System admin

25 Jan 2025

Vacancy expired!

Role: System Administrator

Duration: 6 months likely to extend

Location: Santa Clara, CA. Must work onsite daily.

We are looking for System Administrators (Lab Admins) with experience working in a data center who are comfortable with both the programming side and the physical aspects of the job, such as swapping components and reseating GPUs. These roles will be responsible for operating a lab of DGX servers that are in the initial development phase, so the servers are unstable and need a lot of manual intervention.

Skillset
  • Unix administration for data center server systems (understanding of how to remotely operate and triage issues using iLO/IPMI/BMC)
  • Understanding of network configuration for systems in a data center
  • Exposure to updating server firmware and performing maintenance tasks
  • Scripting knowledge (shell scripting)
  • Collection of logs, basic troubleshooting knowledge for servers

Nice to have
  • Basic VM configuration knowledge
  • Ansible, Python exposure
  • Configuration / maintenance of DHCP server

Job Requirements
  • Install OS on DGX machines
  • Update firmware
  • Recover firmware with special flashing tools/instructions
  • Write/maintain monitoring scripts to make sure all machines are up and running
  • Basic triaging of issues with systems, reseating CPU and GPU trays
  • Swapping of main components; for more involved reworks or updates, we will enlist the help of the HW team
  • Keep track of machine configurations, current state, and maintenance downtimes
  • Work with the hardware team in coordinating and resolving system issues, upgrades, or reworks
  • Work with the data center team in planning and execution of system installation and removing systems from the rack when required.

  • ID: #48810654
  • State: California Santa Clara 95050 Santa Clara USA
  • City: Santa Clara
  • Salary: Depends on Experience
  • Job type: Contract
  • Showed: 2023-01-25
  • Deadline: 2023-03-12
  • Category: Et cetera