Job Title: Site Reliability Engineer
Client: HCL America
Location: Denver, CO (Hybrid Day 1)
Duration: 6+ Months

Job Summary:
Objectives of this role:
- Run the production environment by monitoring availability and taking a holistic view of system health
- Build software and systems to manage platform infrastructure and applications
- Improve reliability, quality, and time-to-market of our suite of software solutions
- Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continual improvement
- Provide primary operational support and engineering for multiple large-scale distributed software applications
- Monitoring
  - Ensure that the underlying infrastructure is running smoothly and that systems and tools are working as expected.
  - Monitor critical applications and services to minimize downtime and ensure their availability.
- Issue resolution
  - Work closely with developers, especially when issues arise: help with troubleshooting and provide consultation when alerts are raised.
  - Investigate and resolve issues when a developer runs into a problem.
  - Perform root cause analysis (RCA) to prevent the same issue from recurring.
- Cross-team collaboration
  - Work across different teams, mainly operations and development.
- Automation
- Monitoring
- Incident response
- Alerts
- Experience working with AWS and Kubernetes
- Understanding of DevOps practices
- Experience with CI/CD using Git, Jenkins, Ansible, JIRA, etc.
- Experience handling servers such as WebLogic, JBoss, Tomcat, and Apache, and databases such as MySQL
- Understanding of Agile methodologies and the software development lifecycle (SDLC)
- Experience with monitoring tools such as New Relic, Splunk, ServiceNow, etc.