Job Description: Collaborate with AI researchers and engineers from the Search Platform and Voyage AI teams to productionize state-of-the-art embedding models and rerankers, enabling high-scale, low-latency inference for both real-time and batch workloads. Lead key projects focused on performance optimization, GPU utilization, autoscaling, and observability for the inference platform. Design and implement components of a multi-tenant inference platform, deeply integrated with Atlas Vector Search, to power semantic search, hybrid retrieval, and AI-native features for MongoDB customers at global scale. Build core platform capabilities, including model versioning, safe and automated deployment pipelines, latency-aware request routing, and model health monitoring, ensuring continuous delivery and system resilience. Make high-leverage architectural decisions and define long-term technical direction for the inference infrastructure, balancing performance, reliability, and developer ergonomics. Tools include vLLM, ONNX Runtime, and Kubernetes-based orchestration. Collaborate across engineering, infrastructure, ML, and product teams to define shared architectural patterns and operational best practices that support high availability and low-latency performance at scale. Influence strategic direction and planning, contributing to quarterly and annual roadmap development, evaluating trade-offs, and helping leadership balance short-term execution with long-term goals.
Requirements: Master’s degree or foreign degree equivalent in Computer Science or a related field, and 5 years of experience in ML inference serving and optimization, in the job offered, or in a related role.
Experience and/or education must include:
5 years of experience designing and developing large-scale distributed systems in production, including microservices architectures supporting tens of thousands to millions of requests per second;
Proficiency in programming languages including Python, Go, and Java, with emphasis on developing high-performance, reliable, and maintainable systems, backend infrastructure, ML platforms, distributed systems, and systems-level optimization;
5 years of experience operating and managing Linux-based systems across cloud-native environments (AWS, GCP) including experience with infrastructure-as-code using Terraform, container orchestration with Kubernetes, multi-region deployment strategies, and high-availability service delivery at scale;
5 years of experience designing and maintaining high-throughput data ingestion and processing pipelines using Kafka or Pub/Sub, capable of reliably processing tens of millions of events per day;
5 years of experience building and optimizing large-scale data infrastructure using BigQuery, Presto, or Spark, including expertise in distributed schema design, analytical query optimization, and cost-performance tradeoffs in production;
1 year of experience developing machine learning infrastructure, including distributed feature stores, real-time and batch feature retrieval systems, and scalable model serving platforms; and
1 year of hands-on experience with large language models (LLMs) using frameworks such as PyTorch and LlamaFactory, including fine-tuning and model deployment.
JOB SITE: 499 Hamilton Avenue, Palo Alto, CA 94301; Must appear in office 3 days per week; WFH permissible 2 days per week.
CONTACT: Please email resume to Apply-Careers@mongodb.com and reference Job ID 9670965
EMPLOYER: MongoDB, Inc.
Job ID: 9670965
Salary Range: $270,000/yr. - $351,000/yr.
TITLE: Staff Software Engineer