Vacancy expired!
- Support operations by providing a resilient, highly available, high-performing platform that exceeds established metrics, customer requirements, and financial objectives.
- Partner with Product and Business team members to ensure a high-quality product is developed and released into production.
- Work closely with Architecture, Customers, Operations, and Product to specify and document solutions and practices.
- Promote a Site Reliability Engineering (SRE) and DevOps culture to enhance and operate cloud platform offerings for the enterprise while working toward innovation, automation, and operational excellence.
- Participate in design reviews of architecture patterns for service/application deployment in the cloud (AWS).
- Collaborate with platform pillar leads in core platform, container orchestration, monitoring, and databases to build out the platform.
- Automate for efficiency and reliability and utilize CI/CD tooling as appropriate.
- Participate in platform administration and operations, including troubleshooting complex issues within the cloud environment.
- Participate in on-call support as part of Operations Support.
- Work with Product Owners to understand the desired capability, to define and prioritize work, determine deliverables, and manage workloads.
- Deliver and maintain standard operating procedures and assist in troubleshooting issues.
- Participate in on-call activities as needed for major incident resolution and problem management.
- 8+ years of overall information technology experience, including at least 2 years focused on public cloud integration and delivery.
- 3+ years of experience with AWS core services such as EC2, S3, Lambda, EMR, and IAM, as well as CloudFormation, Terraform, or similar tools.
- 4+ years of experience with Kubernetes and containerization technologies (e.g., OpenShift, Kubernetes, AWS ECS/EKS) at scale is required.
- Experience with Python, Ansible, and shell scripting to automate routine operational tasks.
- Experience with data and ETL tools such as Snowflake, Dremio, Talend, or Attunity is a plus.
- Ability to provide 24x7 operational support on a periodic basis and to triage complex issues to restore availability is required.
- Must be proactive, with customer satisfaction as the primary goal.
- Ability to document engineering solutions and designs.
- A keen focus on automation is required.
- Ability to communicate clearly, effectively, and persuasively with technology and business partners.
- Excellent collaborator and fantastic teammate.
- Able to apply a risk-based approach to prioritizing work.
- Ability to quickly comprehend the functions and capabilities of new technologies and to identify opportunities for process improvements and efficiencies.
- Ability to adapt to change while continuing to deliver on assigned objectives.
- Able to work with minimal supervision.
- Able to provide technical guidance to the team.
- Strong written and oral communication skills.