Site Reliability Engineer

01 Jul 2024

Vacancy expired!

Job Number: 238251

Job Description

IDEMIA is at the forefront of providing next-generation identification and authentication products and solutions. These include mobile driver's licenses, mobile IDs and identity proofing. Our products and solutions are delivered as SaaS running on AWS's GovCloud IaaS.

Site Reliability Engineering (SRE) is a discipline that incorporates aspects of software engineering and applies them to infrastructure and operations problems. Its main goal is to create scalable and highly reliable software systems. According to Ben Treynor, founder of Google's Site Reliability Team, SRE is "what happens when a software engineer is tasked with what used to be called operations."

A Site Reliability Engineer (SRE) will spend up to 50% of their time on "ops"-related work such as investigating and troubleshooting issues, responding to incidents, and maintaining playbooks and other relevant documentation. Since the system that an SRE oversees is expected to be highly available and self-healing, the SRE should spend the other 50% of their time on development tasks such as improving CI and deployment pipelines, enhancing monitoring capabilities and keeping systems updated. The ideal Site Reliability Engineer candidate is either a software engineer with a good administration background or a highly skilled system administrator with knowledge of deployment automation, coding and DevOps.

You will be responsible for the following:

Ownership of product KPIs and SLA reporting.

Availability and performance of production services.

Deployment of upgrades and installation of new patches.

Troubleshooting, error log analysis, report generation, capacity planning, etc.

Management of automated deployments into production and lower environments.

Required Skills:

Log aggregation, reporting and monitoring with the ELK Stack and Grafana.

CI/CD automation and orchestration with Kubernetes, EKS, Helm, and Ansible.

Experience with Unix/Linux operating systems, CLI and administration.

Experience in production environments supporting mission-critical applications.

Working knowledge of Java, JVM management and configuration.

Strong communication skills with the ability to articulate technical details to different audiences.

Pluses:

Knowledge and experience designing and developing applications that take into account scalability, reliability, extensibility, etc.

Test automation experience with either unit/integration or functional API testing integrated into a continuous delivery tool.

Experience

Minimum 3 years of experience supporting cloud-based, highly available solutions.

Minimum 5 years of experience working in SRE, DevOps or software engineering.

BS/MS in Computer Science, Mathematics, Engineering or equivalent experience.

THIRD-PARTY AGENCIES, SUBCONTRACTORS, AND RECRUITERS NEED NOT APPLY. Applicants submitted by firms will not be considered. Subcontracting is not available for this position.

  • ID: #43740872
  • State: Virginia Reston 20190 Reston USA
  • City: Reston
  • Salary: USD TBD
  • Job type: Permanent
  • Showed: 2022-07-01
  • Deadline: 2022-08-30
  • Category: Et cetera