- Responsible for operational support and on-call rotation shifts for supported systems and products. Accountable for security protections and implementing the least privilege model. Develop, implement, and maintain monitoring solutions for our cloud infrastructure. Identify, evaluate, and recommend monitoring tools and diagnostic techniques to improve system observability.
- Create new environments and manage existing ones. Maintain IaC (Infrastructure as Code) solutions for teams using CloudFormation, Terraform, and/or CDK stacks. Work with the Modeling, Development, and Data organizations to recommend appropriate system configurations and architectures. Collaborate with Developers, Business Analysts, Quality Assurance, and System Administrators at various stages of the software development lifecycle.
- Participate in system design consulting, platform management, capacity planning, and launch reviews. Streamline the software development lifecycle by identifying pain points and productivity barriers and determining ways to resolve them.
- Research security standards and tools; review or conduct system security and vulnerability assessments of cloud environments. Identify best practices to harden and secure our platform and networks, including containers and Kubernetes clusters at scale.
- Participate in communities of practice to share knowledge and foster continuous improvement.
- Perform other duties as assigned or apparent.
- Minimum of 5 years’ experience in a Cloud Operations and management role or related position
- Minimum of 3 years of experience with Python and shell scripting
- 3+ years of hands-on experience across multiple technology areas, such as APIs, microservices, event streaming, logging and monitoring, databases (SQL, NoSQL), containers, serverless frameworks, AI and ML, API platforms (Apigee), Kafka, and AMQ
- Perform analytics on previous incidents to understand root causes and better predict and prevent future issues
- Collaborate closely with development teams to understand their current build and release processes and make recommendations for improvement
- Ability to design and implement a hybrid architecture using key Azure (required) and AWS (preferred) technologies, as well as a continuous integration and deployment process
- Extensive experience with open-source technology, software development, and system engineering.
- Experience with Agile/Scrum development
- MLOps experience is preferred
- Experience in communicating with users, other technical teams, and senior management to collect requirements, describe software product features, technical designs, and product strategy
- Proven track record of strong analytical and troubleshooting skills
- Experience working in cloud ecosystems.
- Experience with incident management and response
- Excellent interpersonal, verbal, and written communication skills
- Able to accept and adapt to changing priorities.
- Ability to work independently or as part of a team, manage multiple tasks concurrently and meet deadlines.
- Work closely with business and development teams globally to resolve issues