This position is eligible to work in a hybrid work model (combination of in-office and remote days).
JOB SCOPE
We are currently looking for a Site Reliability Engineer IV to join our Observability as a Service team. The team manages applications and platforms focused on data routing, data collection, data transformation and data distribution/delivery that provide log analysis and observability capabilities to the larger organization. The Observability as a Service team also provides infrastructure, implementation and upgrade support for the platforms, data transport and data stream management/deliveries, and all associated integrations for these platforms. The team responds to and troubleshoots all tickets and issues originating from telemetry, support tools and internal customers for all products that it maintains. The team closely monitors these services through several means to troubleshoot and resolve issues and outages in an operations environment.
DUTIES AND RESPONSIBILITIES
- Design, implement and maintain complex data collection, data transformation, data visualization and observability platforms at Enterprise scale to be used by internal operations teams.
- Integrate platforms with a wide variety of data sources and repositories that use various protocols.
- Manage issues and tasks using Kanban/Scrum and Agile methodologies in the JIRA/Confluence platforms.
- Develop and maintain CI/CD, automation and development technologies using Ansible, Terraform, Jenkins, Git, Bitbucket, Docker, Kubernetes, Rancher and other similar configuration management, automation or deployment tools.
- Manage bare metal, VM and containerized infrastructures.
- Provide regular KPI and metrics data on platform health and governance to leadership.
- Leverage automation and scripting for daily tasks, reporting, monitoring, data collection, data analysis and break/fix actions.
- Write highly detailed procedures and other technical documentation to be referenced by technical peers and senior leadership in a knowledge base.
- Work in a team environment and work effectively with people with diverse technical skills and backgrounds.
- Work with the team to support, architect and improve the performance of the overall Observability platform.
- Advanced experience working with Unix and Linux operating systems; specifically troubleshooting and providing administration for both virtualized and bare metal implementations.
- Ability to allocate 40% of time to development and scripting.
- A passion for analytics and for helping users/clients share the stories and meaning in their data.
- Experience in translating business requirements into concrete data analytics and observability solutions.
- Be a mentor and source of advanced knowledge for more junior team members.
- Very advanced problem-solving, critical-thinking and analytical skills.
- In-depth knowledge of and experience with development tools, application frameworks, and testing tools.
- Bachelor's degree (BA/BS) from a four-year college or university, or equivalent training, education, and experience.
- Minimum of six (6) years of experience in Systems Engineering.
- Minimum of five (5) years of experience in code and script development.
- Minimum of five (5) years of experience working with data analytics or data analytics platforms.
- Telecommunication industry experience.
- Experience with network and Internet/Web technologies.
- Working knowledge of scripting languages (e.g. Python, Perl, Bash).