ML Ops Engineer

Bangalore
Full-Time
Mid-Level: 3 to 7 years
Posted on Mar 21 2025

About the Job

Skills

Python
Jenkins
AWS
Docker
Kubernetes
GitLab CI/CD
Prometheus
Terraform

Job Summary:


We are seeking a skilled ML Ops Engineer to support and enhance our machine learning operations infrastructure. In this role, you will be responsible for monitoring production services, troubleshooting issues, and collaborating with teams to improve automation and system reliability. You will play a critical role in ensuring seamless model deployment, performance, and integration within our ML platform.


Key Responsibilities:


  • Monitor support channels and incident queues to proactively identify and address operational issues.
  • Investigate and resolve issues reported by automation systems, alerts, or customer feedback.
  • Maintain and support online production services for serving ML models, ensuring high availability and performance.
  • Collaborate with engineering teams to automate processes and improve operational efficiency.
  • Gain a deep understanding of ML platform capabilities and integrations, providing technical insights to enhance system reliability.
  • Identify recurring issues and provide feedback to ML platform engineers for continuous improvements.
  • Contribute to documentation efforts, ensuring clarity and accuracy for internal teams and stakeholders.


Required Skills & Qualifications:


  • Bachelor's or master's degree in computer science or a related field.
  • 3 years of relevant experience in Python programming.
  • Programming: Proficiency in Python (Mandatory).
  • ML Infrastructure: Hands-on experience with Databricks, Tecton, and ML Concepts (Model Deployment, Feature Engineering, Monitoring).
  • DevOps & Automation: Strong knowledge of Kubernetes, Jenkins, and GitHub for CI/CD pipelines and infrastructure automation.
  • Cloud Computing: Expertise in AWS services related to ML Ops.
  • Version Control & Monitoring: Experience with GitHub Actions, observability tools, and system monitoring frameworks.
  • Problem-Solving & Communication: Strong analytical skills and the ability to debug production issues and communicate effectively with cross-functional teams.


Preferred Qualifications:


  • Experience working with large-scale distributed ML systems.
  • Knowledge of Terraform for infrastructure as code.
  • Experience with logging and observability best practices for ML models in production.

If you are excited about building scalable ML infrastructure and driving automation, we’d love to hear from you! Apply now to be part of our team.


About the company

Saarthee is a global IT consulting firm unlike any other, where our passion for helping others fuels our approach and our products and solutions. We are a one-stop shop for all things data and analytics. Unlike other analytics consulting firms that are technology- or platform-specific, Saarthee’s holistic and tool-agnostic approach is unique in the marketplace. Our Consulting Value Chain framework me ...

Industry

Management Consulting

Company Size

51-200 Employees

Headquarters

Philadelphia, USA
