Your Opportunity
Our team is looking for an experienced Site Reliability Engineer who can lead multiple scrum teams at the same time. Ideal candidates must thrive in a fast-paced team environment and have a strong passion for technology and innovation.
What you are good at
- Develop and maintain tooling used for environment monitoring and task automation
- Identify application reliability and availability improvements and build solutions to drive an improved experience
- Analyze and establish efficient configurations for software and servers, DB connections, indexes, drivers, etc.
- Coordinate with development teams, technical and non-technical partners, and clients to maintain broad knowledge of the dependencies of critical business transactions, including platforms, services, and tools
- Monitor internal and vendor service level objectives (SLOs) and agreements (SLAs); identify and resolve SLO / SLA gaps
- Serve as technical subject matter expert (SME) for cross-functional engineering teams
- Assist with and troubleshoot systems-related issues and maintenance
- Collaborate on maintaining services once they are live; measure and monitor availability, latency, and overall system health
- Develop runbooks and build automation
- Develop and maintain E2E monitoring dashboards to support critical business transactions
- Develop and maintain synthetic monitoring for critical business transactions using tools such as ThousandEyes
- Practice sustainable incident response and blameless postmortems
- Document and promote SRE standards and procedures
- Develop and assist in deployment and rollback automation
- Review release and deployment requirements
- Build and set up automated tests
- Communicate incidents to impacted stakeholders
- Coach and mentor junior engineers and fellow practitioners
What you have
- 5+ years of professional engineering experience developing, managing, or supporting distributed systems
- 2+ years of SRE experience managing multi-cloud platforms preferred
- Enterprise cloud infrastructure experience, e.g., AWS, Azure, Google Cloud Platform, Cloud Foundry
- Experience with microservices architecture patterns
- Proven track record of researching, understanding, and effectively applying scalability and high-availability principles
- Experience in developing and managing operations leveraging key event streaming, messaging, and DB services, e.g., Cassandra, MQ/JMS/Kafka, Aurora, RDS, Cloud SQL, Bigtable, DynamoDB, Cloud Spanner, Kinesis, Cloud Pub/Sub, etc.
- Experience working with containers e.g., Docker, Kubernetes, Cloud Foundry, etc.
- Strong experience in using industry-standard monitoring tools, e.g., AppDynamics, Dynatrace, Apica, Splunk, ELK, Fluentd, Prometheus, Kibana, Elasticsearch, Grafana, Nagios, Datadog, New Relic, Tempo, Loki, etc.
- Schwab systems experience
- Strong working knowledge of modern development practices and tools, e.g., Agile, CI/CD, Git, Jira, and Confluence