SF Bay Area Software Jobs and IT Jobs

Software Engineer

Location: Peninsula (Palo Alto) posted: 04.02.26

EMPLOYER: MongoDB, Inc.

Job ID: 9509431

Salary Range: $198,000 – $257,000/year

TITLE: Software Engineer

Job Description: Design, implement, and maintain highly scalable, low-latency backend systems that serve AI models and run inference on them, using Python and other programming languages (such as Go, C++, or Rust), ensuring millisecond-level latency while handling tens of thousands to millions of requests per second. Engineer complex distributed systems that operate on datasets with billions of records, integrating advanced GPU autoscaling algorithms to dynamically allocate GPU resources for AI workloads and employing sophisticated load balancing strategies to optimize throughput and minimize latency. Deploy and manage applications seamlessly across multiple cloud environments (AWS, GCP, and Azure), utilizing Docker, Kubernetes, and Helm for containerization and orchestration. Implement robust CI/CD pipelines and employ observability tools like Prometheus and Grafana to continuously monitor performance, reliability, and resource utilization of large-scale, production-grade inference platforms. Must appear in office 3 days per week; WFH permissible 2 days per week.

Requirements: Master’s degree or foreign degree equivalent in Computer Science or a related field, and two (2) years of experience in the job offered or a related role.

Experience and/or education must include:

2 years of experience with Python specifically applied to building large-scale distributed backend systems handling tens of thousands to millions of requests per second and maintaining millisecond-level latency for AI model inference;
2 years of experience with Docker and Kubernetes, including advanced creation and configuration of Helm charts to deploy and manage large-scale, GPU-accelerated inference servers in multi-cloud environments;
2 years of experience with Linux systems and multi-cloud infrastructures (AWS, GCP, and Azure), including expertise in provisioning and scaling resources across multiple regions and platforms to ensure consistent, low-latency AI service delivery;
2 years of experience implementing and optimizing distributed scheduling algorithms, including GPU autoscaling logic to dynamically allocate compute resources, address race conditions, mitigate deadlocks, and ensure multi-server consistency in high-throughput AI inference pipelines;
2 years of experience designing and maintaining gRPC and RESTful APIs, ensuring efficient, secure, and backward-compatible service contracts that meet strict latency and availability requirements at scale;
2 years of experience with streaming and messaging platforms including Kafka and RabbitMQ, architecting ingestion pipelines to handle billions of data points, enabling rapid data access and real-time model updates;
2 years of experience employing large-scale NoSQL and SQL data stores (DynamoDB or BigQuery) to manage, query, and analyze billions of records supporting AI models, ensuring optimal performance and cost-efficiency under sustained heavy load;
2 years of experience optimizing GPU-accelerated model inference using frameworks including PyTorch, CUDA, and TensorFlow, reducing inference time and improving throughput by tuning GPU kernel operations, optimizing memory transfers, and streamlining data pipelines for large-scale production; and
2 years of experience implementing production-grade feature stores and batch job scheduling frameworks, creating reliable feature retrieval endpoints and orchestrating large-scale batch data processing tasks to support continuous improvement of machine learning model inputs.

JOB SITE: 499 Hamilton Avenue, Palo Alto, CA 94301; Must appear in office 3 days per week; WFH permissible 2 days per week.

CONTACT: Please email resume to Apply-Careers@mongodb.com and reference Job ID 9509431
