Berkeley Research Computing Town Hall Meeting: Savio Overview (2024)

(1)

Berkeley Research Computing

Town Hall Meeting

(2)

SAVIO - The Need Has Been Stated

Inception and design were based on a specific need articulated by Eliot Quataert and nine other faculty:

Dear Graham,

We are writing to propose that UC Berkeley adopt a condominium computing model, i.e., a more centralized ...

(3)

SAVIO - Condo Service Offering

Purchase into Savio by contributing standardized compute hardware

● An alternative to running a cluster in a closet administered by grad students and postdocs
● The condo trade-off:
○ Idle resources are made available to others
○ There are no (ZERO) operational costs for administration, colocation, base storage, optimized networking and access methods, and user services
● The scheduler gives contributors priority access to their resources

(4)

SAVIO - Faculty Computing Allowance

Provides allocations to run on Savio, as well as support, to researchers who have not purchased Condo nodes

● 200k Service Units (core hours) annually
● More than just compute:
○ File systems
○ Training/support
○ User services
● PIs request their allocation via survey
● Early user access (based on readiness) now
● General availability planned for fall semester

(5)

SAVIO - System Overview

● Similar in design to a typical research cluster
○ The Master Node role has been broken out (management, scheduling, logins, file system, etc.)
● Home storage: enterprise-level, backed up, quota-enforced
● Scratch space: large and fast (Lustre)
● Multiple login/interactive nodes
● DTN: Data Transfer Node

(6)(7)

SAVIO - Specification

● Hardware
○ Compute Nodes: 20-core, 64 GB, InfiniBand
○ BigMem Nodes: 20-core, 512 GB, InfiniBand
● Software Stack
○ Scientific Linux 6 (equivalent to Red Hat Enterprise Linux 6)
○ Parallelization: OpenMPI, OpenMP, POSIX threads (compile sketch below)
○ Intel compiler suite
○ SLURM job scheduler
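As a minimal sketch of how this stack fits together (the module names are assumptions, not taken from the slides), compiling and smoke-testing an MPI program might look like:

    # Load the Intel compiler and OpenMPI (module names assumed;
    # run "module avail" on the cluster for the real names/versions)
    module load intel openmpi
    # Build an MPI hello-world with the MPI compiler wrapper
    mpicc -O2 -o hello_mpi hello_mpi.c
    # Quick 4-rank test run
    mpirun -np 4 ./hello_mpi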

(8)

SAVIO - OTP

● The biggest security threat that we encounter: STOLEN CREDENTIALS
● Credentials are stolen via keyboard sniffers installed on researchers' laptops or workstations, which are incorrectly assumed to be secure
● OTP (One-Time Passwords) offers mitigation
● Easy to learn, simple to use, and works on both computers and smartphones!

(9)

SAVIO - Future Services

● Serial/HTC Jobs
○ Expanding the initial architecture beyond just HPC
○ Specialized node hardware (12-core, 128 GB, PCI flash storage)
○ Designed for jobs that use <= 1 node
○ Nodes are shared between jobs
● GPU nodes
○ GPUs are optimal for massively parallel algorithms
○ Specialized node hardware (8-core, 64 GB, 2x Nvidia ...

(10)(11)

Berkeley Research Computing

Town Hall Meeting

(12)

SAVIO - Faculty Computing Allowance

● Eligibility requirements
○ Ladder-rank faculty or PI on the UCB campus
○ In need of compute power to solve a research problem
● Allowance Request Procedure
○ First fill out the Online Requirements Survey
○ Allowance can be used either by the faculty member or by immediate group members
○ For additional cluster accounts, fill out the Additional User Account Request Form
● Allowances
○ New allowances start on June 1st of every year
○ Mid-year requests are granted a prorated allocation
○ A cluster-specific project (fc_projectname) with all user accounts is set up
○ A scheduler account (fc_projectname) with 200K core hours is set up

(13)

SAVIO - Access

● Cluster access
○ Connect using SSH (server name: hpc.brc.berkeley.edu); a login sketch follows this list
○ Uses OTP - One-Time Passwords (multifactor authentication)
○ Multiple login nodes (users are randomly distributed across them)
● Coming in future
○ NERSC's NEWT REST API for web portal development
○ IPython notebooks & JupyterHub integration
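A minimal login sketch (replace the username placeholder with your own account name; the server name is from the slide above):

    # Connect to one of the login nodes (you are assigned one at random)
    ssh myusername@hpc.brc.berkeley.edu
    # At the password prompt, enter the one-time password from your
    # OTP token/app instead of a static password.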

(14)

SAVIO - Data Storage Options

● Storage
○ No local storage on compute nodes
○ All storage is accessed over the network
○ Either NFS or Lustre protocol
● Multiple file systems
○ HOME - NFS, 10GB quota, backed up, no purge
○ SCRATCH - Lustre, no quota, no backups, can be purged (staging sketch below)
○ Project (GROUP) space - NFS, 200GB quota, no backups, no purge
○ No long-term archive
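Because compute nodes have no local disk, large job inputs generally belong on the fast Lustre scratch file system rather than the 10GB HOME. A minimal staging sketch (the scratch mount point is an assumption; check the user guide for the actual path):

    # Stage input data from quota-limited HOME to scratch before a run
    # (the /global/scratch/$USER path is assumed, not from the slide)
    SCRATCH=/global/scratch/$USER
    mkdir -p "$SCRATCH/myproject"
    cp -r "$HOME/myproject/input" "$SCRATCH/myproject/"
    # Scratch is not backed up and can be purged: copy results you
    # want to keep back to HOME or GROUP space after the job.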

(15)

SAVIO - Data Transfers

● Use only the dedicated Data Transfer Node (DTN)
● Server name: dtn.brc.berkeley.edu
● Highly recommend using Globus (web interface) for managed transfers
● Many other traditional tools are also supported on the DTN (see the sketch after this list):
○ SCP/SFTP
○ Rsync
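For the traditional tools, transfers from a local machine might look like the sketch below (file and directory names are placeholders; the DTN hostname is from this slide):

    # Copy a single file into your home directory via the DTN
    scp results.tar.gz myusername@dtn.brc.berkeley.edu:~/
    # Mirror a local directory with rsync (resumable; sends only changes)
    rsync -avP ./dataset/ myusername@dtn.brc.berkeley.edu:~/dataset/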

(16)

SAVIO - Software Support

● Software module farm
○ Many of the most commonly used packages are already available
○ In most cases packages are compiled from source
○ Easy command line tools to browse and access packages ($ module cmd); see the example after this list
● Supported package list
○ Open Source
■ Tools - octave, gnuplot, imagemagick, visit, qt, ncl, paraview, lz4, git, valgrind, etc.
■ Languages - GNU C/C++/Fortran compilers, Java (JRE), Python, R, etc.
○ Commercial
■ Intel C/C++/Fortran compiler suite, Matlab with an 80-core license for MDCS
● User applications
○ Individual user/group-specific packages can be built from source by users
○ Recommend using GROUP storage space for sharing with others in the group
○ SAVIO consultants are available to answer your questions
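Typical module commands for browsing and loading packages (the package name in the load line is illustrative):

    # List all packages available in the module farm
    module avail
    # Load a package into your environment
    module load python
    # Show what is loaded; unload when finished
    module list
    module unload python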

(17)

SAVIO - Job Scheduler

● SLURM
● Multiple node options (partitions)
● Interaction with the scheduler
○ Only with command line tools and utilities; a job script sketch follows the tables below
○ Online web interfaces for job management may be supported in the future via NERSC's NEWT REST API, IPython/Jupyter, or both

Quality of Service   Max allowed running time/job   Max number of nodes/job
savio_debug          30 minutes                     4
savio_normal         72 hours (i.e., 3 days)        24

Partition      # of nodes   # of cores/node   Memory/node   Local Storage
savio          160          20                64 GB         No local storage
savio_bigmem   4            20                512 GB        No local storage
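A minimal SLURM batch script sketch for the settings above (the account name follows the fc_projectname pattern from the allowance slide; the module names and program are placeholders):

    #!/bin/bash
    #SBATCH --job-name=example
    #SBATCH --partition=savio          # standard 20-core / 64 GB nodes
    #SBATCH --qos=savio_normal         # up to 72 hours and 24 nodes
    #SBATCH --account=fc_projectname   # your Faculty Computing Allowance account
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=20       # nodes are exclusive, so use all cores
    #SBATCH --time=05:00:00
    module load intel openmpi          # module names assumed, as earlier
    mpirun ./hello_mpi

Submit with "sbatch jobscript.sh" and check status with "squeue -u $USER".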

(18)

SAVIO - Job Accounting

● Jobs gain exclusive access to assigned compute nodes.
● Jobs are expected to be highly parallel and capable of using all the resources on assigned nodes.

For example:

● Running on one standard node for 5 hours uses 1 (node) * 20 (cores) * 5 (hours) = 100 core-hours (or Service Units).
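The same accounting arithmetic as a quick shell check, using the numbers from the example above:

    # Service Units = nodes * cores per node * wall-clock hours
    nodes=1; cores_per_node=20; hours=5
    echo $(( nodes * cores_per_node * hours ))   # prints 100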

(19)

● Online User Documentation

○ User Guide - http://research-it.berkeley.edu/services/high-performance-computing/user-guide

○ New User Information - http://research-it.berkeley.edu/services/high-performance-computing/new-user-information

● Helpdesk
○ Email: brc-hpc-help@lists.berkeley.edu
○ Monday - Friday, 9:00 am to 5:00 pm
○ Best effort outside working hours

(20)

Thank you

Questions
