Vacancy expired!
- Primary responsibility: support and administer on-premises Linux resources, including HPC Linux clusters and scientific Linux desktops. We have two on-premises HPC clusters that we are looking for someone to assist with: one with 50 nodes, and another with 8 nodes that will be refreshed at the end of the year. We are looking for HPC expertise: investigating issues that people report, assisting with application or software installs on a cluster, and monitoring the clusters.
- Specific responsibilities include:
- Troubleshoot and resolve hardware, operating system, and application issues in HPC and desktop environments
- Work with systems staff to enhance our configuration management infrastructure
- Evaluate performance impacts of planned operating system changes
- Update and expand existing systems monitoring capabilities
- Develop automation tools for systems administration
- Maintain account management procedures to support a growing number of researchers and scientists
- Provide technical support to researchers using HPC resources, troubleshoot problems, and develop appropriate computational strategies
- Consult and collaborate with scientist co-workers to determine best system configurations for applications
- Minimum of 5 years of Red Hat, CentOS, and/or Ubuntu Linux system administration experience
- Minimum of 3 years of scripting experience with Bash, Perl, or Python
- Minimum of 3 years of HPC Linux cluster administration experience, including job queueing and resource management, parallel filesystems, and network management
- Prior experience with configuration management tools such as Ansible, Chef, or Puppet
- Prior experience with physical Linux hardware
- NFS experience
- Experience with HPC job queueing systems such as Slurm or PBS
- Experience with HPC management software such as Bright Cluster Manager
- Experience managing parallel and cluster file systems such as NFS, BeeGFS, GPFS, Weka, or Lustre
- Network management experience, especially InfiniBand
- Experience with GPU compute in HPC and desktop environments
- Experience with container technologies such as Docker and Singularity
- Environment Modules experience with Tcl or Lua modules
- Experience with physical Linux hardware, whether servers or workstations
- The smaller cluster runs Bright Cluster Manager, which the replacement cluster will also use, so experience with it is a plus
- Parallel file system knowledge
- BeeGFS experience
- GPFS experience
- Lustre or Weka experience
- GPU experience
- Docker or Singularity experience
- Environment Modules experience
- Provide technical expertise to improve HPC cluster management, performance, and resiliency
- Analytical thinking and problem-solving skills for effective troubleshooting and problem resolution
- Ability to work both independently and as part of a team; flexibility in dealing with assignments and in working on several projects simultaneously
- Ability to effectively communicate with people of diverse backgrounds and computer knowledge
- ID: #22910795
- State: Illinois, North Chicago 60064, USA
- City: North Chicago
- Salary: $55 - $85
- Job type: Contract
- Showed: 2021-11-17
- Deadline: 2021-12-17
- Category: Et cetera