Vacancy expired!
Company Description
Join us and make YOUR mark on the World!
Are you interested in joining some of the brightest talent in the world to strengthen the United States' security? Come join Lawrence Livermore National Laboratory (LLNL) where our employees apply their expertise to create solutions for BIG ideas that make our world a better place.
We are committed to a diverse and equitable workforce with an inclusive culture that values and celebrates the diversity of our people, talents, ideas, experiences, and perspectives. This is essential to innovation and creativity for continued success of the Laboratory's mission.
Pay Range
$123,960 - $166,992 Annually for the SES.2 level
$148,650 - $200,328 Annually for the SES.3 level
Please note that the pay range information is a general guideline only. Many factors are taken into consideration when setting starting pay including education, experience, the external labor market, and internal equity.
Job Description
We have an opening for a High Performance Computing (HPC) System Engineer to support one of the largest supercomputer centers in the world. The selected candidate will work in a challenging and team-oriented environment supporting Livermore Computing's (LC) high performance computing clusters. You will apply fundamental knowledge of HPC systems and contribute to technical projects using creativity and imagination. The position requires the ability to serve periodically on a rotating off-hours on-call list. This position is in the Livermore Computing Division within the Computation Directorate.
This position will be filled at either the SES.2 or SES.3 level based on knowledge and related experience as assessed by the hiring team. Additional job responsibilities (outlined below) will be assigned if hired at the higher level.
In this role you will
- Provide system administration support for Linux-based HPC, Network Attached Storage (NAS) systems, Infrastructure and Parallel file system servers and clusters.
- Participate in the design and implementation of multiple Linux-based HPC, Infrastructure and Parallel file system servers and clusters.
- Build, configure, and maintain multiple RAID controllers and disk enclosure systems.
- Deploy and maintain high-speed cluster fabrics for compute and storage networks.
- Monitor and conduct installations of software releases, patches of the operating system, and third-party utilities with emphasis on overall system security.
- Improve the quality of service for end users, working with system engineers, Hotline, and Operations staff.
- Troubleshoot and determine root cause of moderately complex system issues.
- Respond to system problems and user questions in person, via email, and via a trouble ticket system.
- Perform other duties as assigned.
Additional job responsibilities at the SES.3 level
- Analyze and tune performance of complex computer, network, file system and disk sub-systems.
- Investigate, evaluate, test, and recommend technical solutions for future systems.
- Develop tools and procedures to monitor and automate system tasks on servers and clusters.
Qualifications
- Ability to secure and maintain a U.S. DOE Q-level security clearance, which requires U.S. citizenship.
- Bachelor's degree in computer science or related field or the equivalent combination of education and related experience.
- Broad experience with Linux systems including installation, configuration, networking, backups, updates and patching, and system security.
- Broad experience with or knowledge of HPC environments and technologies such as high-speed cluster fabrics (InfiniBand), job scheduling (Slurm), and parallel file systems (Lustre and GPFS).
- Comprehensive knowledge of scripting and programming languages, such as Perl, Python, and bash/csh/ksh.
- Proficient with disk and storage systems, such as host-based RAID controllers, software RAID, and vendor RAID systems.
- Comprehensive experience with version control and configuration management systems, such as Git, Ansible, and CFEngine.
- Demonstrated ability to work with limited direction in a dynamic environment with competing priorities.
- Ability to work off-hours and on-call (intermittently either as needed or as part of a rotation).
- Proficient communication and interpersonal skills, and the ability to work and communicate with other technical staff and end users.
Additional qualifications at the SES.3 level
- Significant experience with Linux system administration in support of several independent but inter-related systems and software packages, and knowledge of container technologies, Kubernetes, and other virtual machine software environments.
- Advanced knowledge of and significant experience providing innovative solutions to broadly defined tasks and problems.
- Advanced communication and interpersonal skills, and the ability to effectively interact with system developers and vendors with minimal direction.
Desired qualifications
- Master's degree in computer science or related field.
- Experience with local, parallel, and distributed file systems, such as XFS, ZFS, GPFS, and Lustre, and with NAS platforms, such as NetApp FAS systems running ONTAP 9.x.
- Design and deployment experience with container technologies (Singularity, Docker, Podman), Kubernetes (OpenShift), and other virtualization environments, such as KVM and VMware ESXi 6.7/7.x.
- Flexible Benefits Package
- 401(k)
- Relocation Assistance
- Education Reimbursement Program
- Flexible schedules (depending on project needs)
- Inclusion, Diversity, Equity and Accountability (IDEA) - visit https://www.llnl.gov/diversity
- Our core beliefs - visit https://www.llnl.gov/diversity/our-values
- Employee engagement - visit https://www.llnl.gov/diversity/employee-engagement
- ID: #49552600
- State: California (Livermore, CA 94550, USA)
- City: Livermore
- Salary: USD TBD TBD
- Job type: Permanent
- Showed: 2023-03-26
- Deadline: 2023-05-24
- Category: Architect/engineer/CAD