Site Reliability Engineer job vacancy

Vacancy expired!

Job Title : Site Reliability Engineer (SRE)

Location : Pleasanton, CA (remote until Covid restrictions are lifted)

Duration : Long term contract

Required Skills & Experience

Strong hands-on SRE experience
Experience supporting web-based (Java) applications
APM tool experience

Exposure to application production support

Experience with application operations, cloud platforms, system uptime, system recovery, performance, latency, monitoring, and root cause analysis.
4-6+ years of experience as an automation and tooling engineer.
Solid knowledge of and experience with scripting (Python / Bash) for Java/Node.js runtime environments.
Deep understanding of and experience with microservices, APIs, and web services.
Strong hands-on experience developing applications using Java, Node.js / AngularJS, Python, Go, etc.
Experience with cloud-native applications, Docker, Kubernetes, etc.
Experience writing clean, modular TypeScript code using external libraries or custom code.
Experience with CI/CD pipelines using Jenkins and GitHub.
Good to have: experience with tools such as BlueTriangle, writing Splunk queries, and monitoring tools such as Dynatrace.
Excellent verbal and written communication skills.
Prior experience in supporting web and mobile apps
Basic knowledge of CDN (Akamai)
Exposure to monitoring tools (APM, synthetic and log monitoring, etc.)
Azure exposure (or any other cloud)
Unix & scripting for automation
eCommerce experience (supporting web applications, etc.)

Roles & Responsibilities

Responsible for toil reduction, implementing identified improvement opportunities, and handling minor enhancements and non-ticketed activities.
Define and monitor service level metrics, including incident management KPIs such as MTTD, MTTR, MTBF, MTTF, unavailability rate, and incident count.
Create rules to optimize incident response using metrics, streamline alert flows, and improve collaboration and communication across squads.
Proactively identify issues that might disrupt the service in production
Route incoming service requests to the appropriate support groups via Jira
Create and maintain alerts
Handle change validation and change planning requests
Assist business stakeholders in determining SLOs or adjusting threshold limits
Manage demand and capacity, and make corrections to SLI/SLO threshold limits
Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding
Partner with development teams to improve services through rigorous testing and release procedures
Participate in system design consulting, platform management, and capacity planning
Create sustainable systems and services through automation and uplifts
Balance feature development speed and reliability using well-defined service level objectives and indicators (SLOs, SLIs)
Debug production issues across services and levels of the stack.

ID: #43687038
State: California (Pleasanton, CA 94566, USA)
City: Pleasanton
Salary: Depends on Experience
Job type: Contract
Posted: 2022-06-29
Deadline: 2022-08-27
Category: Et cetera