Principal AIOps Engineer, Enterprise AI Platform

11 Jun 2025

Your Career

As a Principal AIOps Engineer for the Enterprise AI Platform, you will be a pivotal technical leader responsible for designing, developing, and implementing AI-driven solutions to enhance the reliability, performance, and efficiency of our critical IT and business systems. You will leverage the core AI platform to build sophisticated AIOps capabilities, transforming how we monitor, manage, and optimize our digital infrastructure and applications. This role requires a deep understanding of IT operations, machine learning, and scalable system design to proactively identify issues, automate remediation, and drive continuous improvement across the enterprise.

Your Impact

  • AIOps Platform Development: Design, develop, and implement advanced AIOps solutions, leveraging machine learning algorithms and data analytics to automate and enhance IT operations. This includes developing real-time processing solutions for observational data (e.g., logs, metrics, events, traces).
  • Anomaly Detection & Predictive Analytics: Lead the implementation of AI/ML models for proactive anomaly detection, root cause analysis, and predictive insights into system health and performance across applications and infrastructure at enterprise scale.
  • Intelligent Automation & Orchestration: Drive the automation of routine operational tasks, incident response, and remediation workflows using AI-driven agents and orchestration tools, minimizing manual intervention and improving operational efficiency.
  • Observability & Data Integration: Collaborate with observability teams to ensure the efficient collection, processing, and transformation of high-volume, cross-domain data from diverse sources (events, logs, metrics, tickets, monitoring tools) into actionable intelligence for the AIOps platform.
  • Incident Management & Remediation: Integrate AIOps insights with existing incident management systems, providing real-time intelligence to rapidly identify, diagnose, and resolve IT issues, leading to proactive issue resolution and reduced mean time to recovery (MTTR).
  • Performance Optimization: Utilize AI insights to continuously monitor, analyze, and fine-tune IT systems for peak operational efficiency, capacity planning, and resource optimization.
  • Technical Leadership & Mentorship: Provide technical leadership and mentorship to other engineers, promoting architectural excellence, innovation, and best practices in AIOps development and operations.
  • Cross-Functional Collaboration: Partner with data scientists, ML engineers, software engineers, SREs, and IT operations teams to integrate AI/ML agents into the platform and ensure AIOps solutions align with business needs and deliver measurable ROI.
  • Innovation & Research: Actively research and evaluate emerging AIOps technologies, generative AI, LLM models, ChatOps AI, and advanced RAGs, bringing promising innovations into production through POCs and long-term architectural evolution.

  • ID: #53987325
  • State: California (Santa Clara, 95050, USA)
  • City: Santa Clara
  • Salary: USD TBD
  • Job type: Full-time
  • Posted: 2025-06-11
  • Deadline: 2025-08-10
  • Category: Et cetera