Site Reliability Engineer job vacancy

Vacancy expired!

RESPONSIBILITIES

Establish monitoring, tracing, logging, and alerting for shared platforms
Define SLAs and SLOs and set up monitoring to ensure availability targets are being met
Develop tools and workflows that apply engineering best practices, such as infrastructure as code and CI/CD, to improve reliability and availability
Collaborate with platform engineers and developers to improve operational stability and reliability

REQUIREMENTS

Bachelor's degree in computer science or a related field, or equivalent experience
Proven work experience as a Site Reliability Engineer or in a similar role
Expert in infrastructure as code (Terraform, Docker, Helm)
Expert in monitoring tools such as Datadog or Dynatrace
Cloud experience, preferably with Azure
Experience with container technologies such as Docker and Kubernetes
Experience with configuration and administration of CI/CD pipelines, preferably using GitHub Actions
Able to write comprehensive technical documentation and create clear diagrams
Working knowledge of Bash and shell scripting
Understanding of the end-to-end application development lifecycle, from code commit to production deployment
A DevOps, reliability, and security mindset, with an understanding of production controls and change processes