Senior MLOps Engineer (GPU Platform)

We're hiring a Senior MLOps Engineer to own the reliability and scale of our GPU compute platform. Vecura runs 300+ scientific AI tools (protein structure prediction, molecular dynamics, docking, and more) across a wide range of GPU types, in both serverless and self-hosted deployments spanning cloud and on-prem. You'll own the platform layer between infrastructure and models: how GPU jobs are scheduled, queued, isolated, observed, and recovered. You think in SLOs and design for repeatability, building standard systems that scale across hundreds of models rather than one-off deployments. You'll work alongside our DevOps engineer (infra/cluster) and our AI engineers (model onboarding), owning the orchestration and reliability surface that connects them.
Ha Noi

Onsite

Full time

Experienced

Competitive Salary

Bachelor Degree, College Degree

Responsibilities

  • Own the success-run ratio of GPU workloads as a measurable SLO; drive it up and keep it there.
  • Build and operate the GPU job scheduling and queueing layer — fair-share allocation, prioritization, backpressure, and recovery across a heterogeneous fleet.
  • Implement GPU partitioning and sharing (MIG, MPS, time-slicing) to raise utilization without destabilizing runs.
  • Profile and right-size workloads: per-model GPU memory, runtime, and failure characteristics; eliminate OOMs and silent failures.
  • Define a standard packaging/deployment contract for new models so onboarding is repeatable, not bespoke.
  • Build observability for the run lifecycle — metrics, logs, traces, alerting — so failures are caught and diagnosed fast.
  • Harden the orchestration stack (workflow engine, durable execution, retries/failover) against real failure modes.
  • Partner with the DevOps engineer on cluster/networking and with AI engineers to make their models production-ready.

Qualifications

Must have:

  • 5+ years in MLOps / ML platform / GPU systems engineering, with direct ownership of production reliability.
  • Deep experience operating GPU workloads at scale (NVIDIA stack: CUDA, drivers, GPU Operator, MIG/MPS).
  • Strong background in workload orchestration and scheduling — Kubernetes (Jobs/batch), Ray, Slurm, or equivalent.
  • Hands-on managed-ML platform experience on at least one major cloud, with working familiarity with the other:
      - GCP: Cloud Run, Vertex AI
      - AWS: SageMaker
  • Solid understanding of cloud architecture (compute, networking, storage, IAM) across hybrid cloud + on-prem.
  • Proven track record of raising the reliability and utilization of a heterogeneous GPU fleet.
  • Solid software engineering skills (Python and one systems language): you build platform tooling, not just configure it.
  • Observability and SRE fundamentals: SLOs, metrics, tracing, incident response.

Benefits

We provide a dynamic, fast-paced, and collaborative environment where problem-solving and agility are at the heart of what we do. Along with a competitive salary, we foster a culture that values ambition, confidence, and humility, consistently pushing the boundaries of innovation. If you're excited about working in a young, talented tech company and want to explore the world of AI and pharmaceuticals, we encourage you to apply.

  • Competitive salary (negotiable based on experience) 
  • Workplace: No. 45-57 Tran Xuan Soan, Hai Ba Trung, Ha Noi (Monday to Friday, 9:00 to 17:00)
  • Build a professional network through collaborations with pharmaceutical companies, industry leaders, and academic experts.
  • Work on impactful projects that address critical challenges in drug discovery and healthcare.
  • Employees are entitled to 2 work-from-home days per month, along with daily lunch provided by the company.
  • Holiday & Tet bonuses; performance-based bonus
  • Social insurance contribution on full salary

How to Apply

If you think we're a good match, send your CV to:

Email: office@nyb.group

Subject: [NYB] Senior MLOps Engineer_Your name

We’ll get in touch to let you know what the next steps are.

Contact office@nyb.group for more information.
