We have an open role for an ML Ops Engineer with a leading group in Bahrain.
Job Title: ML Ops Engineer
Location: Bahrain
Experience: 5-7 Years
*** Kindly share CVs to
Responsibilities:
- Design and implement data pipelines and engineering infrastructure to support enterprise machine learning systems at scale.
- Work closely with data scientists and engineering teams to deploy, monitor, and optimize machine learning models in production.
- Identify, evaluate, and integrate new technologies to enhance performance, maintainability, and reliability of machine learning solutions.
- Apply software engineering best practices to machine learning pipelines, including CI/CD, automation, monitoring, and version control.
- Manage cloud infrastructure (AWS, Azure, GCP) and containerization (Docker, Kubernetes) to ensure scalable and efficient ML workloads.
- Implement and maintain highly available and scalable machine learning environments.
- Ensure the security and compliance of machine learning systems, adhering to governance and industry regulations.
- Troubleshoot and optimize machine learning models and infrastructure for performance improvements.
- Collaborate with IT and OT teams to ensure seamless integration of machine learning systems.
- Use Infrastructure as Code (Terraform, CloudFormation) to automate the management and provisioning of infrastructure.
- Implement automated processes for deployment, monitoring, logging, and performance tracking.
Required Skillsets:
- ML Model Deployment & Containerization: Strong experience with Docker and Kubernetes.
- Cloud Platforms: Expertise in AWS, Azure, or Google Cloud Platform (GCP).
- DevOps Practices: In-depth knowledge of DevOps, CI/CD pipelines, and automation techniques.
- Monitoring & Logging: Proficiency in setting up monitoring and logging for ML models and infrastructure.
- Version Control: Expertise in Git or other version control systems.
- IT-OT Integration: Experience integrating IT and OT systems.
- Scalability & High Availability: Proven track record of designing scalable, highly available machine learning infrastructure.
- Security & Compliance: Understanding of security protocols, compliance frameworks, and governance.
- Infrastructure as Code (IaC): Proficiency with Terraform or CloudFormation for automating infrastructure management.
- Scripting: Strong skills in Python or Bash scripting for automation.
- Data Engineering: Familiarity with data engineering workflows and handling large datasets.
- Troubleshooting: Excellent problem-solving and troubleshooting abilities in distributed systems.
Job Types: Full-time, Permanent