Senior MLOps Engineer (GPU Platform)


We're hiring a Senior MLOps Engineer to own the reliability and scale of our GPU compute platform. Vecura runs 300+ scientific AI tools (protein structure prediction, molecular dynamics, docking, and more) across a wide range of GPU types, on both serverless and self-hosted infrastructure spanning cloud and on-prem. You'll own the platform layer between infrastructure and models: how GPU jobs are scheduled, queued, isolated, observed, and recovered. You think in SLOs and design for repeatability, building standard systems that scale across hundreds of models rather than one-off deployments. You'll work alongside our DevOps engineer (infrastructure and cluster) and our AI engineers (model onboarding), owning the orchestration and reliability surface that connects them.

Ha Noi

Onsite

Full time

Experienced

Competitive Salary

Bachelor Degree, College Degree

Responsibilities

Own the success rate of GPU workload runs as a measurable SLO; drive it up and keep it there.
Build and operate the GPU job scheduling and queueing layer — fair-share allocation, prioritization, backpressure, and recovery across a heterogeneous fleet.
Implement GPU partitioning and sharing (MIG, MPS, time-slicing) to raise utilization without destabilizing runs.
Profile and right-size workloads: per-model GPU memory, runtime, and failure characteristics; eliminate OOMs and silent failures.
Define a standard packaging/deployment contract for new models so onboarding is repeatable, not bespoke.
Build observability for the run lifecycle — metrics, logs, traces, alerting — so failures are caught and diagnosed fast.
Harden the orchestration stack (workflow engine, durable execution, retries/failover) against real failure modes.
Partner with the DevOps engineer on cluster/networking and with AI engineers to make their models production-ready.

Qualifications

Must have:

5+ years in MLOps / ML platform / GPU systems engineering, with direct ownership of production reliability.
Deep experience operating GPU workloads at scale (NVIDIA stack: CUDA, drivers, GPU Operator, MIG/MPS).
Strong background in workload orchestration and scheduling — Kubernetes (Jobs/batch), Ray, Slurm, or equivalent.
Hands-on experience with managed ML platforms on at least one of the following clouds, and working familiarity with the other:

- GCP — Cloud Run, Vertex AI

- AWS — SageMaker

Solid understanding of cloud architecture (compute, networking, storage, IAM) across hybrid cloud and on-prem environments.
Proven track record raising reliability/utilization of a heterogeneous GPU fleet.
Solid software engineering (Python and one systems language) — you build platform tooling, not just configure it.
Observability and SRE fundamentals: SLOs, metrics, tracing, incident response.

Benefits

We provide a dynamic, fast-paced, and collaborative environment where problem-solving and agility are at the heart of what we do. Along with a competitive salary, we foster a culture that values ambition, confidence, and humility, consistently pushing the boundaries of innovation. If you're excited about working in a young, talented tech company and want to explore the world of AI and pharmaceuticals, we encourage you to apply.

Competitive salary (negotiable based on experience)
Workplace: No. 45-57 Tran Xuan Soan, Hai Ba Trung, Ha Noi (Monday to Friday, 9:00 to 17:00)
Build a professional network through collaborations with pharmaceutical companies, industry leaders, and academic experts.
Work on impactful projects that address critical challenges in drug discovery and healthcare.
Employees are entitled to 2 work-from-home days per month, along with daily lunch provided by the company.
Holiday & Tet bonuses; performance-based bonus
Social insurance contribution on full salary

How to Apply

If you think we're a good match, send your CV to:

Email: office@nyb.group

Subject: [NYB] Senior MLOps Engineer_Your name

We’ll get in touch to let you know what the next steps are.

Contact office@nyb.group for more information.
