Manager, Site Reliability Engineering

25 May 2024

Vacancy expired!

Our Direct-to-Consumer (DTC) portfolio is a powerhouse collection of consumer-first brands, supported by media industry leaders, Comcast, NBCUniversal and Sky. When you join our team, you’ll work across our dynamic portfolio including Peacock, NOW, Fandango, SkyShowtime, Showmax, and TV Everywhere, powering streaming across more than 70 countries globally. And the evolution doesn’t stop there. With unequalled scale, our teams make the most out of every opportunity to collaborate and learn from one another. We’re always looking for ways to innovate faster, accelerate our growth and consistently offer the very best in consumer experience. But most of all, we’re backed by a culture of respect. We embrace authenticity and inspire people to thrive.

This ambition is a group effort. As challengers at heart, our secret weapon is our talented team of big thinkers, data-driven drivers of growth and innovation. We start by putting people first, embracing empathy and compassion to create a more dynamic, more fulfilling workplace and a better, more enjoyable product. As a company, we embrace the power of transparency and inclusion. We know the best idea can come from anywhere, so we’re committed to creating an organization where we act as one and put ego aside. We are determined to forge the next frontier of streaming through trust, teamwork, and talent.

The Site Reliability Engineering Manager will lead a team of Site Reliability Engineers as part of our greater SRE organization. The SRE manager will provide hands-on technical leadership and be responsible for maintaining the cloud systems utilized to operate NBC’s Direct-to-Consumer platforms. The SRE manager will provide a software-driven approach to operations, managing infrastructure as code, leveraging deployment pipelines, with a focus on automation, observability, and resiliency.

Responsibilities

  • Lead a team of Site Reliability Engineers, following agile methodologies
  • Provide technical consultation to, and collaborate with, product delivery teams
  • Manage requirements gathering and task prioritization
  • Collaborate with Site Reliability Engineers and Software Delivery teams to define and implement software deployments, monitoring, and infrastructure requirements
  • Ensure platforms are highly available, resilient, fault tolerant, performant, and observable
  • Promote SRE and DevOps principles, including automation and self-service
  • Ensure Service Level Objectives and Service Level Indicators are defined and measured
  • Provision and manage infrastructure
  • Lead development of custom software for automation, observability, or other requirements
  • Develop methodologies to safely deploy and test network and infrastructure changes, including customized tests and chaos engineering
  • Ensure operational documentation, wikis, and readmes are maintained
  • Troubleshoot and solve problems
  • Participate in and lead code reviews
  • Mentor and provide feedback to engineers
  • Provide support for operations and delivery teams to remediate production issues as appropriate
  • Build cloud-agnostic solutions that can be quickly deployed against a wide variety of cloud computing providers
  • Build an effective and efficient remote-working team that spans different time zones
  • Manage a 24/7 on-call rotation

  • ID: #49995158
  • State: New Hampshire New York 00000 New York USA
  • City: New York
  • Salary: USD TBD TBD
  • Job type: Full-time
  • Posted: 2023-05-25
  • Deadline: 2023-07-24
  • Category: Et cetera