Site Reliability Engineer (SRE)

20 Jun 2024

Vacancy expired!

(Multiple openings)

Location: Plano, TX (must be onsite from day 1 or within 1 to 2 months)

Mandatory Skills: Jenkins, Puppet, Dynatrace, AppDynamics, Kubernetes, monitoring tools, cloud, AWS, Java, microservices, Ubuntu (Linux), Maven, Grafana

Responsibilities:
  • Responsible for how code is deployed, configured, and monitored, as well as the availability, latency, change management, emergency response, and capacity management of services already in or going to production
  • Design, code, test, and deliver software to automate manual operational work, and develop self-service, auto-detection, and self-healing capabilities
  • Develop software for reliability and scale, ensuring minimal refactoring or changes
  • Define, monitor and defend SLOs
  • Deploy closed-loop remediation (continuous testing and remediation) to fix problems in pre-production before software is released to production.
  • Build custom tooling from scratch to meet specific needs in the incident management workflow.
  • Resolve complex incidents across public cloud, private cloud, third-party, and on-premises technology.
  • Leverage Chaos Engineering to find and prevent future problems and to confirm that fixes from past incidents function as intended.
  • Focus on end-user experiences and partner with development teams to implement changes to increase uptime and performance based on empirical evidence.
  • Troubleshoot priority incidents, facilitate blameless post-incident evaluations, and ensure permanent closure of incidents
  • Identify application patterns and analytics in support of better service level objectives
  • Design performance tests, identify bottlenecks and opportunities for optimization and capacity demands, and present solutions for continuous improvements
  • Design best-in-class monitoring frameworks to accomplish end-to-end flow monitoring and noiseless alerting
  • Design automated software and product upgrades, change management and release management solutions

Skills/Qualifications:
  • Bachelor’s degree or equivalent experience in a software engineering discipline
  • 2-3 years of SRE or Systems Engineering experience.
  • Expert in designing, coding, testing, and delivering software in at least one technology stack, e.g., Java, Python, C, Go, etc.
  • Deep knowledge of Internet protocols and web services technologies, e.g., HTTP, DNS, TCP/UDP, SOAP, JSON, Apache, Tomcat, and REST
  • Experience working with containers, e.g., Docker, Kubernetes, Cloud Foundry, etc.
  • Experience working with automation tools, e.g., Ansible, Puppet, Selenium, etc.
  • In-depth OS experience, e.g., RHEL, Ubuntu, Windows Server, with strong debugging, troubleshooting, and problem-solving skills
  • Testing and build automation with a continuous integration/continuous delivery (CI/CD) pipeline, e.g., Travis CI, Maven, Gradle, Groovy, Git, Terraform, Jenkins, etc.
  • Experience deploying and managing services on modern platforms, e.g., AWS, Google Cloud Platform, Azure.
  • Strong experience using industry-standard monitoring tools, e.g., AppDynamics, Dynatrace, APICA, Splunk, ELK, FluentD, Prometheus, Kibana, Elasticsearch, Grafana, Nagios, Datadog, New Relic, etc.
  • Advanced understanding of the application monitoring stack (logs, events, metrics, and alerts) and the ability to visualize and set up end-to-end observability
  • Certification in one or more cloud technologies, e.g., AWS, Azure, Google Cloud Platform, or Red Hat, is a big plus.

  • ID: #43358115
  • State: Texas Plano 75023 Plano USA
  • City: Plano
  • Salary: Depends on Experience
  • Job type: Contract
  • Showed: 2022-06-20
  • Deadline: 2022-08-16
  • Category: Et cetera