We are growing fast, and we invite you to grow with us. At Innowise, you can not only develop as an expert in your field, solve complex problems, and influence the result, but also see how the finished project affects the world around you. We are a close-knit team of professionals who have already implemented 1600+ cases for clients from the USA, Denmark, Germany, and other countries. We need someone who will strengthen our team and become part of the community!

You need proven experience in the following:

Infrastructure & IaC: Minimum 250 person-days managing server/cloud infrastructure (Public/Private), IaC tools (Terraform, Bicep, etc.), Docker, and Kubernetes
CI/CD: Minimum 300 person-days designing and maintaining CI/CD solutions in production environments
MLOps / ALM: Minimum 200 person-days deploying and utilizing MLflow, Kubeflow, ClearML, or similar platforms
Incident Management: Minimum 150 person-days in root cause analysis and stabilizing critical systems
Strong Linux systems engineering background (RHEL/Rocky/SLES)
Proficiency in Python and Bash for automation
Experience with relational databases (PostgreSQL), NFS, and Object stores (S3-compatible)
Experience with Data Analytics & Data Analysis (working with Databricks)
Strong analytical mindset with an implementation-oriented approach
Ability to translate business requirements into scalable technical solutions
Excellent cross-functional collaboration (with Data Scientists, Developers, and IT operations)

The following will be a plus:

Deep experience with HPC schedulers (PBS Professional, Torque, Slurm) and building integrations (hooks, prolog/epilog scripts)
Experience bridging traditional HPC schedulers with modern cloud-native platforms (Kubernetes, MLOps stacks) and configuring dynamic scaling (cloud bursting)
Scripting abilities in Go or Rust
Proficiency in SQL and PowerShell
Familiarity with MPI workloads (OpenMPI, MPICH) and GPU scheduling (NVIDIA stack, MIG/MPS)
Experience with parallel file systems (Lustre strongly preferred)
Configuration management experience (Ansible, Puppet, or similar)

Key Responsibilities:

Design, deploy, and support resilient infrastructure for machine learning platforms and data pipelines using Python and SQL
Implement Application Lifecycle Management (ALM) for machine learning, automating training, versioning, and deployment processes (MLflow, Kubeflow, ClearML, or similar enterprise ML platforms)
Ensure reliability, scalability, and high availability of the MLOps infrastructure and backend services
Design and manage distributed compute environments (bare metal, VM, private/public cloud)
Containerize ML services and applications using Docker and Kubernetes, orchestrating smooth production rollouts
Automate infrastructure provisioning, cluster lifecycle, and configuration using Infrastructure as Code (Terraform, Bicep, ARM, etc.)
Build, integrate, and maintain robust CI/CD pipelines (GitLab CI, GitHub Actions, Jenkins)
Implement comprehensive observability (logging, metrics, dashboards) for overall cluster health
Diagnose bottlenecks, resolve node/network failures, and conduct Root Cause Analysis (RCA) as part of proactive incident management

We offer:

Flexible work schedule

Experience working with clients all over the world

Financial assistance

MLOps / DevOps Engineer

Poland, Senior
