Network Reliability Engineer

15 Aug 2024

Vacancy expired!

Job Description:

The Network Reliability Engineer (also known as SRE) role will partner directly with Software Engineering, Core Technology Infrastructure (CTI) Engineering, and Technology Services to define objective reliability goals for the services they support, and to gain operational visibility into whether those goals are being met through instrumentation, tooling, dashboards, and automation.

The Network Reliability Engineer role will focus on increasing service stability through automation, tools, and processes. This individual will take part in major production triage efforts and, as required, work with problem management to identify the root cause of highly impactful or complex issues. This individual will apply the knowledge gained from those efforts by partnering closely with software developers, production services, architects, and infrastructure teams to deliver automated solutions that eliminate operational inefficiencies and improve stability.

This position will interface directly with internal stakeholders and external suppliers/providers, as well as with architecture, product engineering, product management, and senior and business management. Strong communication and problem-solving skills are a must. The candidate must be able to work independently and also succeed in teams of various sizes and locations. Adherence to and use of standards, product sets, templates, systems, and artifacts are important to the success of the individual, the department, and the firm at large. The Network Reliability Engineer (also known as SRE) will be considered a subject matter expert in their field and is expected to stay current with relevant technologies, organizational goals, and industry trends to drive end-to-end value.

Key Responsibilities:
  • Collaborate with Engineering and Production Services teams to understand technical solutions and define strategies for network automation to reduce operational inefficiencies
  • Develop and maintain a catalog of reliability scripts, tools and libraries that can be leveraged for common instrumentation, automation, and operational needs to identify and remediate network events
  • Provide next-level escalation support for production triage efforts
  • Manage a continuous integration / continuous delivery (CI/CD) pipeline for network development and testing
  • Participate in the documentation of application/network flows for various support needs
  • Provide technical guidance and mentorship to junior members of the team

Required Skills:
  • Expert in networking principles and protocols
  • Experience in software development supporting production networks
  • Experience with automation tools and technologies such as Python, Ansible, YAML, or Django; API calls (to ticketing systems and network devices); and frontend web development
  • Experience with Linux/Unix and system management
  • Experience with observability data platforms (e.g. Cribl)
  • Experience with message buses (e.g. Apache Kafka and NiFi)
  • Understanding of Git workflows and continuous integration / continuous delivery (CI/CD) concepts, and how they can be applied to network automation and testing frameworks
  • Experience with JIRA and Confluence
  • Understanding of configuration management with tools such as Forward Networks and HPNA
  • Knowledge of and experience using advanced tooling, both proactively and reactively, including but not limited to NetScout, Wireshark, Splunk, SevOne, HPNA, NNMI, OBM, and IBM Watson
  • Experience with Agile and Lean philosophies
  • Strong verbal and written communication skills and the ability to work with all levels of management
  • Leadership skills; a self-starter who is self-motivated
  • Organized and detail-oriented
  • Strong technical acumen
  • Strong analytical skills
  • Experience operating with colleagues across different time zones with a flexible approach to working hours to successfully interact and communicate on a global level

Desired Skills:
  • Experience in networking-related disciplines in a design, implementation, or operations role
  • Relevant industry certifications in network technologies
  • Experience working in financial services (insurance, banking, investment banking)
  • Experience with network vendors such as Cisco, Arista, F5, VMware, McAfee, Bluecat, Aruba

Shift: 1st shift (United States of America)

Hours Per Week: 40

Learn more about this role

  • ID: #44890192
  • State: Texas (Richardson, TX 75080, USA)
  • City: Richardson
  • Salary: USD, TBD
  • Job type: Permanent
  • Showed: 2022-08-15
  • Deadline: 2022-10-13
  • Category: Et cetera