Berkeley Research Computing
Town Hall Meeting
SAVIO - The Need Has Been Stated
Inception and design were based on a specific need articulated by Eliot Quataert and nine other faculty:
"Dear Graham, we are writing to propose that UC Berkeley adopt a condominium computing model, i.e., a more centralized ..."
SAVIO - Condo Service Offering
● Purchase into Savio by contributing standardized compute hardware
● An alternative to running a cluster in a closet with grad students and postdocs
● The condo trade-off:
○ Idle resources are made available to others
○ There are no (ZERO) operational costs for administration, colocation, base storage, optimized networking and access methods, and user services
● The scheduler gives contributors priority access to the resources they purchased
SAVIO - Faculty Computing Allowance
● Provides allocations to run on Savio, as well as support, to researchers who have not purchased Condo nodes
● 200k Service Units (core-hours) annually
● More than just compute:
○ File systems
○ Training/support
○ User services
● PIs request their allocation via survey
● Early user access (based on readiness) now
● General availability planned for fall semester
SAVIO - System Overview
● Similar in design to a typical research cluster
○ The Master Node role has been broken out (management, scheduling, logins, file systems, etc.)
● Home storage: enterprise-level, backed up, quota-enforced
● Scratch space: Large and fast (Lustre)
● Multiple login/interactive nodes
● DTN: Data Transfer Node
SAVIO - Specification
● Hardware
○ Compute Nodes: 20-core, 64GB, InfiniBand
○ BigMem Nodes: 20-core, 512GB, InfiniBand
● Software Stack
○ Scientific Linux 6 (equivalent to Red Hat Enterprise Linux 6)
○ Parallelization: OpenMPI, OpenMP, POSIX threads
○ Intel compiler suite (see the compile sketch below)
○ SLURM job scheduler
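A hedged compile sketch for the stack above (the module names "intel" and "openmpi" and the source file hello_mpi.c are assumptions; actual module names on the cluster may differ):

  $ module load intel openmpi             # Intel compiler plus OpenMPI (module names are assumptions)
  $ mpicc -O2 hello_mpi.c -o hello_mpi    # build an MPI program with the wrapper compiler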
SAVIO - OTP
● The biggest security threat that we encounter ...
STOLEN CREDENTIALS
● Credentials are stolen via keystroke loggers ("keyboard sniffers") installed on researchers' laptops or workstations that are incorrectly assumed to be secure
● OTP (One-Time Passwords) offers mitigation
● Easy to learn, simple to use, and works on both
computers and smartphones!
SAVIO - Future Services
● Serial/HTC Jobs
○ Expanding the initial architecture beyond just HPC
○ Specialized node hardware (12-core, 128GB, PCI
flash storage)
○ Designed for jobs that use <= 1 node
○ Nodes are shared between jobs
● GPU nodes
○ GPUs are optimal for massively parallel algorithms
○ Specialized node hardware (8-core, 64GB, 2x Nvidia GPUs)
Berkeley Research Computing
Town Hall Meeting
SAVIO - Faculty Computing Allowance
● Eligibility requirements
○ Ladder-rank faculty or PIs on the UC Berkeley campus
○ In need of compute power to solve a research problem
● Allowance request procedure
○ First, fill out the Online Requirements Survey
○ The allowance can be used by the faculty member or by immediate group members
○ For additional cluster accounts, fill out the Additional User Account Request Form
● Allowances
○ New allowances start on June 1st of every year
○ Mid-year requests are granted a prorated allocation
○ A cluster-specific project (fc_projectname) with all user accounts is set up
○ A scheduler account (fc_projectname) with 200K core-hours is set up
SAVIO - Access
● Cluster access
○ Connect using SSH (server name: hpc.brc.berkeley.edu); see the example below
○ Uses OTP (One-Time Passwords) for multifactor authentication
○ Multiple login nodes (users are randomly distributed across them)
● Coming in the future
○ NERSC's NEWT REST API for web portal development
○ iPython notebooks & JupyterHub integration
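A minimal login sketch using the server name above ("myusername" is a placeholder):

  $ ssh myusername@hpc.brc.berkeley.edu
  # At the password prompt, enter your one-time password (OTP), not a static password.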
SAVIO - Data Storage Options
● Storage
○ No local storage on compute nodes
○ All storage is accessed over the network
○ Either NFS or Lustre protocol
● Multiple file systems (usage-check sketch below)
○ HOME - NFS, 10 GB quota, backed up, no purge
○ SCRATCH - Lustre, no quota, no backups, can be purged
○ Project (GROUP) space - NFS, 200 GB quota, no backups, no purge
○ No long-term archive
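A hedged sketch for checking how much space you are using in each area (the scratch and group paths are assumptions for illustration; use the paths given in the user guide):

  $ du -sh $HOME                                  # home usage against the 10 GB quota
  $ du -sh /global/scratch/$USER                  # scratch usage (no quota, but purgeable)
  $ du -sh /global/home/groups/fc_projectname     # group space against the 200 GB quota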
SAVIO - Data Transfers
● Use only the dedicated Data Transfer Node (DTN)
● Server name: dtn.brc.berkeley.edu
● We highly recommend using Globus (web interface) for managing transfers
● Many other traditional tools are also supported on the DTN (sketch below)
○ SCP/SFTP
○ Rsync
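A hedged transfer sketch using the DTN server name above ("myusername", the file names, and the remote scratch path are placeholders/assumptions):

  $ scp results.tar.gz myusername@dtn.brc.berkeley.edu:/global/scratch/myusername/
  $ rsync -avP data/ myusername@dtn.brc.berkeley.edu:/global/scratch/myusername/data/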
SAVIO - Software Support
● Software module farm
○ Many of the most commonly used packages are already available
○ In most cases, packages are compiled from source
○ Easy command-line tools to browse and access packages (the $ module command; sketch at the end of this slide)
● Supported package list
○ Open Source
■ Tools - octave, gnuplot, imagemagick, visit, qt, ncl, paraview, lz4, git, valgrind, etc.
■ Languages - GNU C/C++/Fortran compilers, Java (JRE), Python, R, etc..
○ Commercial
■ Intel C/C++/Fortran compiler suite, MATLAB with an 80-core license for MDCS (MATLAB Distributed Computing Server)
● User applications
○ Individual user/group-specific packages can be built from source by users
○ We recommend using the GROUP storage space for sharing with others in your group
○ SAVIO consultants are available to answer your questions
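A minimal sketch of the module command-line workflow mentioned above (the package names in the load line are assumptions; run the avail command to see what is actually installed):

  $ module avail              # browse all available packages
  $ module load gcc openmpi   # load packages into your environment (names are assumptions)
  $ module list               # show currently loaded packages
  $ module unload openmpi     # remove a package from your environment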
SAVIO - Job Scheduler
● SLURM
● Multiple Node Options (partitions)
● Interaction with Scheduler
○ Only via command-line tools and utilities (batch script sketch below)
○ Web interfaces for job management may be supported in the future via NERSC's NEWT REST API, iPython/Jupyter, or both
Quality of Service    Max allowed running time/job    Max number of nodes/job
savio_debug           30 minutes                      4
savio_normal          72 hours (i.e. 3 days)          24

Partition       # of nodes    # of cores/node    Memory/node    Local storage
savio           160           20                 64 GB          No local storage
savio_bigmem    4             20                 512 GB         No local storage
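A minimal batch-script sketch for submitting to the partitions and QoS levels in the tables above (the job name, node count, walltime, executable, and script file name are illustrative placeholders; fc_projectname follows the allowance slide):

  #!/bin/bash
  #SBATCH --job-name=example_job        # placeholder name
  #SBATCH --account=fc_projectname      # your scheduler account from the allowance setup
  #SBATCH --partition=savio             # standard 20-core / 64 GB nodes
  #SBATCH --qos=savio_normal            # up to 72 hours and 24 nodes per job
  #SBATCH --nodes=2                     # whole nodes are allocated exclusively
  #SBATCH --time=01:00:00               # walltime (hh:mm:ss)
  mpirun ./my_mpi_program               # placeholder executable

Submit and monitor with:

  $ sbatch example_job.sh
  $ squeue -u $USER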
SAVIO - Job Accounting
● Jobs gain exclusive access to assigned compute nodes.
● Jobs are expected to be highly parallel and capable of using all
the resources on assigned nodes.
For example:
● Running on one standard node for 5 hours uses 1 (nodes) * 20
(cores) * 5 (hours) = 100 core-hours (or Service Units).
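● A 4-node job running for 3 hours is charged 4 (nodes) * 20 (cores) * 3 (hours) = 240 core-hours, since jobs get exclusive access to whole nodes.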
● Online User Documentation
○ User Guide - http://research-it.berkeley.edu/services/high-performance-computing/user-guide
○ New User Information - http://research-it.berkeley.edu/services/high-performance-computing/new-user-information
● Helpdesk
○ Email : brc-hpc-help@lists.berkeley.edu
○ Monday - Friday, 9:00 am to 5:00 pm
○ Best-effort support outside of working hours
Thank you
Questions?