Vacancy expired!
Role: Reliability Engineer
Location: New York, NY (Onsite from day 1)
Duration: Long Term
Experience: 7+ Years

Job Description

Responsibilities:
- You will spend 50% of your time on production support, including handling user tickets, incidents, and problem management.
- You will identify and build automation to eliminate manual day-to-day support activities, and scope and build automation for the deployment, management, and visibility of our services.
- Drive efficiency through automation by designing autonomous systems.
- Manage service reliability by managing risk.
- Define service level indicators (SLIs), objectives (SLOs), and agreements (SLAs).
- Implement best practices for building effective monitoring and alerting systems.
- You will use your expertise to tune our systems and push them beyond their normal limits.
- You will work closely with engineering/development teams to design, build, and maintain systems and help them decide on products to use, schema design and query tuning.
- You will troubleshoot issues across the entire stack: hardware, software, application and network.
- You will mentor other SREs on standard methodology for everything from monitoring to troubleshooting complex code and database issues.
- Represent the SRE organization in design reviews and operational readiness exercises for new and existing services.
- Participate in the on-call rotation and in periodic conference calls with specialists in other time zones.
Qualifications:
- Bachelor's degree or background in Computer Science
- Experience in software development; automation-related experience is valued in particular. Scripting languages such as Bash, Python, and Ruby, or compiled languages such as C, C#, Java, Scala, and Go, are most relevant, but others are acceptable.
- Proficiency in at least one higher-level language is desired.
- Hands-on experience with enterprise tools such as AppDynamics, Grafana, Splunk, and Dynatrace
- Three-tier support experience with databases such as IBM DB2, Sybase, MongoDB, Greenplum, and KDB
- Professional ownership of issues
- Deep understanding of operating-system-level concepts such as processes, memory allocation, and the network stack; an understanding of how applications are affected by them, and the ability to debug the resulting issues.
- Practical experience running large-scale online systems is an advantage.
- Awareness of, and ability to reason about, modern software and systems architectures, including load balancing, queueing, caching, distributed-systems failure modes, microservices, and cloud.
- Knowledge of the messaging layer: MQ / CPS / XML
- Knowledge of SFTP/Comet
- ServiceNow
- Prior experience in a developer or support role at a large-scale financial firm
- ID: #42081734
- State: New York, New York City 10001, USA
- City: New York City
- Salary: TBD (USD)
- Job type: Contract
- Showed: 2022-06-01
- Deadline: 2022-07-31
- Category: Et cetera