Site Reliability Engineer Job Description
OVERVIEW:
Founded in 1987, CoStar Group is the leading provider of commercial real estate information, analytics, and online marketplaces. Our suite of online services enables clients to analyze, interpret and gain unmatched insight on commercial property values, market conditions and current availability. Behind some of the most well-known brands in the industry, CoStar Group includes CoStar, the largest provider of CRE research and real-time data; LoopNet, the most heavily trafficked mobile and online real estate marketplace; Apartments.com, the premier rental home resource for renters, property managers and owners; STR, the leading provider of performance benchmarking and comparative analytics to the hotel industry; BizBuySell, the largest online marketplace for businesses for sale; and Lands of America, the leading operator of online marketplaces for rural real estate.

Headquartered in Washington, DC, CoStar Group maintains offices throughout the U.S. and in Europe, Canada, and Asia with a staff of over 4,300 worldwide. This position is in our Washington, DC office and has the opportunity for hybrid work with up to two days remote per week. Four-day work weeks are also an option for applicants who are interested.

THE ROLE:
A Site Reliability Engineer is responsible for improving the availability, performance, capacity, latency, and efficiency of the application through software engineering best practices.

Your job is to make sure everything that goes into production is awesome. You will help set coding standards, perform code review, coach other team members on how to develop high-performing software, and spend time writing great code to solve complex problems. You will help with system design, architectural diagrams, and management of service dependencies. You will own the monitoring and logging systems and be responsible for the data recovery process and high availability mechanisms.
You will have access to operational and analysis tools to help detect bottlenecks and performance issues throughout the enterprise. You will work closely with developers, DevOps, networking and security teams to achieve your goals.

Site Reliability Engineers are involved in all aspects of the software development life cycle: Conceptual Design, Development, Testing, Deployment, and Production Ops. We have a flat hierarchy, and we spend much of our time writing code. We love agile, continuous integration, and continuous deployment.

Our effectiveness will be measured as a function of mean time to recover (MTTR) and mean time to failure (MTTF). In other words, we must have our services up and running again as quickly as possible, and we must avoid any subsequent failure for as long as possible.

RESPONSIBILITIES:
- Scale our infrastructure (on-premises and in the cloud) to support our growing ecosystem
- Improve system performance, reliability, and maintainability
- Adhere to industry standard security best practices
- Write automation, monitoring, diagnostic, and debugging tools

QUALIFICATIONS:
- Bachelor's degree in Computer Science or related technical field.
- Experience with Node.js, Docker, and at least one of the following technologies: Kubernetes, OpenTelemetry, Prometheus.
- Proficiency in at least one programming language, preferably JavaScript/TypeScript.
- Experience designing systems/infrastructure and leading iterative development and deployment of new services and capabilities
- Experience debugging, profiling, optimizing code, and automating routine tasks.
- Self-motivated, systematic problem-solver and great communicator with a sense of ownership and drive.
- Expertise in designing, analyzing, troubleshooting, and capacity planning large-scale distributed systems.
- Experience with the following: React/Angular, Node.js, SQL (any RDBMS), Bash, PowerShell, TFS/Azure DevOps, and Git.
- Experience with operating systems (e.g., Windows, Linux distributions such as Debian or Kali), administration (e.g., filesystems, system RPC calls), and networking (e.g., TCP/IP, routing, network topologies and hardware).
- Experience in one or more of the following: AWS, Google Cloud Platform, Python, Serverless, Docker, container orchestration (e.g., Kubernetes).
- System load testing and baseline measurement.
- Familiarity with log management and analytics tools: OpenTelemetry, Jaeger, Elasticsearch, Kibana, Logstash, Grafana, Prometheus, DataDog, Splunk.

BENEFITS:
- Comprehensive healthcare coverage: Medical / Vision / Dental / Prescription Drug
- Life, legal, and supplementary insurance
- Commuter and parking benefits
- 401(K) retirement plan with matching contributions
- Employee stock purchase plan
- Paid time off
- Paid parental leave (up to 12 weeks)
- Tuition reimbursement
- On-site fitness center and/or reimbursed fitness center membership costs (location dependent), with yoga studio, Pelotons, personal training, group exercise classes, as well as Segways and bikes available for use during the day
- Complimentary gourmet coffee, tea, hot chocolate, prepared foods, fresh fruit, and other healthy snacks
- ID: #43776471
- State: District of Columbia (Washington, DC 20001, USA)
- City: Washington
- Salary: TBD (USD)
- Job type: Permanent
- Posted: 2022-07-02
- Deadline: 2022-08-30
- Category: Et cetera