High Performance Computing (HPC) System Engineer

28 Mar 2024

Vacancy expired!

Company Description

Join us and make YOUR mark on the World!

Are you interested in joining some of the brightest talent in the world to strengthen the United States' security? Come join Lawrence Livermore National Laboratory (LLNL) where our employees apply their expertise to create solutions for BIG ideas that make our world a better place.

We are committed to a diverse and equitable workforce with an inclusive culture that values and celebrates the diversity of our people, talents, ideas, experiences, and perspectives. This is essential to innovation and creativity for continued success of the Laboratory's mission.

Pay Range

$123,960 - $166,992 Annually for the SES.2 level

$148,650 - $200,328 Annually for the SES.3 level

Please note that the pay range information is a general guideline only. Many factors are taken into consideration when setting starting pay including education, experience, the external labor market, and internal equity.

Job Description

We have an opening for a High Performance Computing (HPC) System Engineer to support one of the largest supercomputer centers in the world. The selected candidate will work in a challenging and team-oriented environment supporting Livermore Computing's (LC) high performance computing clusters. You will apply fundamental knowledge of HPC systems and contribute to technical projects using creativity and imagination. The position requires the ability to serve periodically on a rotating off-hours on-call list. This position is in the Livermore Computing Division within the Computation Directorate.

This position will be filled at either the SES.2 or SES.3 level based on knowledge and related experience as assessed by the hiring team. Additional job responsibilities (outlined below) will be assigned if hired at the higher level.

In this role you will
  • Provide system administration support for Linux-based HPC, Network Attached Storage (NAS), infrastructure, and parallel file system servers and clusters.
  • Participate in the design and implementation of multiple Linux-based HPC, Infrastructure and Parallel file system servers and clusters.
  • Build, configure, and maintain multiple RAID controllers and disk enclosure systems.
  • Deploy and maintain high-speed cluster fabrics for compute and storage networks.
  • Monitor and conduct installations of software releases, patches of the operating system, and third-party utilities with emphasis on overall system security.
  • Improve the quality of service for end users, working with system engineers, Hotline, and Operations staff.
  • Troubleshoot and determine root cause of moderately complex system issues.
  • Respond to system problems and user questions in person, via email, and via a trouble ticket system.
  • Perform other duties as assigned.

Additional job responsibilities at the SES.3 level
  • Analyze and tune performance of complex computer, network, file system and disk sub-systems.
  • Investigate, evaluate, test, and recommend technical solutions for future systems.
  • Develop tools and procedures to monitor and automate system tasks on servers and clusters.

Qualifications
  • Ability to secure and maintain a U.S. DOE Q-level security clearance, which requires U.S. citizenship.
  • Bachelor's degree in computer science or related field or the equivalent combination of education and related experience.
  • Broad experience with Linux systems including installation, configuration, networking, backups, updates and patching, and system security.
  • Broad experience with or knowledge of HPC environments and technologies such as high-speed cluster fabrics (InfiniBand), job scheduling (Slurm), and parallel file systems (Lustre and GPFS).
  • Comprehensive knowledge of scripting and programming languages, such as Perl, Python, and bash/csh/ksh.
  • Proficient with disk and storage systems, such as host-based RAID controllers, software RAID and vendor RAID systems.
  • Comprehensive experience with version control and configuration management systems, such as Git, Ansible, and CFEngine.
  • Demonstrated ability to work with limited direction in a dynamic environment with competing priorities.
  • Ability to work off-hours and on-call (intermittently either as needed or as part of a rotation).
  • Proficient communication and interpersonal skills, and the ability to work and communicate with other technical staff and end users.

Additional qualifications at the SES.3 level
  • Significant experience with Linux system administration in support of several independent but inter-related systems and software packages, and knowledge of container technologies, Kubernetes, and other virtual machine software environments.
  • Advanced knowledge of and significant experience providing innovative solutions to broadly defined tasks and problems.
  • Advanced communication and interpersonal skills, and the ability to interact effectively with system developers and vendors with minimal direction.

Qualifications We Desire
  • Master's degree in computer science or related field.
  • Experience with local, parallel, and distributed file systems, such as XFS, ZFS, GPFS, and Lustre, and with NAS platforms, such as NetApp FAS systems running ONTAP 9.x.
  • Design and deployment experience with container technologies (Singularity, Docker, Podman), Kubernetes (OpenShift), and other virtualization environments, such as KVM and VMware ESXi 6.7/7.x.

Additional Information

All your information will be kept confidential according to EEO guidelines.

Position Information

This is a Career Indefinite position, open to Lab employees and external candidates.

Why Lawrence Livermore National Laboratory?
  • Flexible Benefits Package
  • 401(k)
  • Relocation Assistance
  • Education Reimbursement Program
  • Flexible schedules (depending on project needs)
  • Inclusion, Diversity, Equity and Accountability (IDEA) - visit https://www.llnl.gov/diversity
  • Our core beliefs - visit https://www.llnl.gov/diversity/our-values
  • Employee engagement - visit https://www.llnl.gov/diversity/employee-engagement

Security Clearance

This position requires a Department of Energy (DOE) Q-level clearance. If you are selected, we will initiate a Federal background investigation to determine if you meet eligibility requirements for access to classified information or matter. Also, all L or Q cleared employees are subject to random drug testing. Q-level clearance requires U.S. citizenship.

Pre-Employment Drug Test

External applicant(s) selected for this position must pass a post-offer, pre-employment drug test. This includes testing for use of marijuana as Federal Law applies to us as a Federal Contractor.

Equal Employment Opportunity

We are an equal opportunity employer that is committed to providing all with a work environment free of discrimination and harassment. All qualified applicants will receive consideration for employment without regard to race, color, religion, marital status, national origin, ancestry, sex, sexual orientation, gender identity, disability, medical condition, pregnancy, protected veteran status, age, citizenship, or any other characteristic protected by applicable laws.

We invite you to review the Equal Employment Opportunity posters, which include EEO is the Law and Pay Transparency Nondiscrimination Provision.

Reasonable Accommodation

Our goal is to create an accessible and inclusive experience for all candidates applying and interviewing at the Laboratory. If you need a reasonable accommodation during the application or the recruiting process, please use our online form to submit a request.

California Privacy Notice

The California Consumer Privacy Act (CCPA) grants privacy rights to all California residents. The law also entitles job applicants, employees, and non-employee workers to be notified of what personal information LLNL collects and for what purpose. The Employee Privacy Notice can be accessed here.