- Bachelor's degree
- Master's degree
The San Diego Supercomputer Center (SDSC) is a world leader in using, innovating and providing cyberinfrastructure to enable advances and new discovery in science and engineering. Focusing on data-oriented and computational science and engineering applications, SDSC serves as an international resource for data cyberinfrastructure through the provision of software, hardware and human resources in multi-disciplinary science and engineering, and is a leading national cyberinfrastructure center to the National Science Foundation (NSF) and broader community.
SDSC’s High-Performance Systems Group is responsible for and operates SDSC’s high-performance computing clusters and related systems. The group operates large-scale compute and storage systems funded by the National Science Foundation (currently the XSEDE program), the UCSD campus (e.g., the Triton Shared Compute Cluster) and other entities; these systems support users from campus, national, and international communities across a broad range of scientific disciplines. The group is part of SDSC’s Data-Enabled Scientific Computing (DESC) Division.
The incumbent will apply skills as a seasoned, experienced systems integration professional with a full understanding of systems and software integration concepts to evaluate, resolve and implement medium-sized projects or portions of large projects with moderate scope and complexity. S/he will resolve a wide range of business processes, system functionality, implementation issues and system and software integration issues. The incumbent will need to demonstrate competency in selecting tools, methods and techniques to obtain results, give technical presentations to associated teams and other technical units, and evaluate new technologies including performing moderate to complex cost/benefit analyses. S/he may lead a team of systems/infrastructure professionals.
Additionally, the incumbent will be responsible for the management of national and campus HPC clusters and their related storage systems, such as large parallel file systems, NFS file servers, and the underlying storage technologies. Responsibilities include but are not limited to systems administration (primarily Linux) with on-call duties, including management of hardware, OS, I/O, and software environment installation and maintenance. The incumbent will support resource managers, schedulers and client access to parallel and distributed file systems, conduct multi-faceted analysis, testing, scripting and benchmarking, work with very complex, advanced systems, data and networks in a research and performance evaluation environment, and provide technical expertise in parallel and high-performance filesystems (Lustre, Ceph, GPFS, etc.) and storage. Also, s/he will be responsible for system internals, data and storage, network and operating systems, emerging technologies, hardware, and architectures and the interrelationship of all the foregoing and contribute to the design, installation, management and upgrade of very large HPC clusters, filesystems, data and storage resources.
The incumbent will work closely with other groups to integrate the HPC systems and storage into the SDSC networking, cloud, and user environments, collaborate on security procedure development and implementation, and provide support to the user services and scientific applications group. S/he will present at national meetings as necessary, work with the Operations group in training their staff, and serve as liaison to the computational scientists. S/he will also work on multiple problems or tasks that are not necessarily well defined, make recommendations that have an impact on an entire project or system, and provide advanced technical guidance to others at the same or lower level on an ongoing basis. The incumbent needs to work well in group and collaborative settings, such as national projects like XSEDE and its constituent working groups, and to exhibit effective communication skills in a professional manner.
For more information, please visit www.sdsc.edu
BA/BS degree in math or computer science, or a comparable combination of education and experience. Professional technical engineering or technical programming experience or Master's Degree preferred.
Advanced knowledge of systems integration and of deploying moderately complex systems integration solutions, specifically proven through experience administering large-scale HPC clusters and their related filesystems.
Demonstrated experience with large data storage arrays (more than 100 TB), and the skills necessary to administer, maintain, monitor and upgrade them.
Ability to install, maintain, upgrade, and troubleshoot large (petabyte-scale) high-performance parallel and distributed filesystems such as Lustre, GPFS and Ceph.
Strong knowledge of administering Linux systems, primarily Red Hat and its derivatives (e.g., CentOS).
Proven understanding of high-speed interconnects used in HPC systems and storage, such as Ethernet and InfiniBand, including knowledge of TCP/IP, VLANs, Pkeys, subnets and routing. Ability to use this knowledge to integrate HPC resources into the data center network.
Job offer is contingent upon a satisfactory clearance based on background check results.
Occasional evenings, weekends, and overtime may be required.