[Remote] Senior Software Engineer, Distributed Systems - NIM Factory

Job summary

United States
Software Developer

Work model

Fully remote
Only United States
Job description

NVIDIA: Remote Senior Software Engineer, Distributed Systems - NIM Factory

NVIDIA is seeking a senior engineer to design and build the factory infrastructure and automation behind NVIDIA Inference Microservices (NIMs). This remote role, open to candidates in the United States, involves developing a factory pipeline for AI models, collaborating with AI model teams, and mentoring colleagues to improve productivity and performance.

Responsibilities

  • Develop a factory pipeline that takes an AI model as input and produces a deployable service validated across cloud, on-prem, and Kubernetes environments
  • With the team, define the group's technical strategies and roadmaps, and iterate on them rapidly to deliver and improve the NIM factory
  • Design interfaces, data models, and schemas, and expand observability across the factory pipeline and its compute infrastructure
  • Work with technical leaders to design and develop scalable, reliable factory components
  • Collaborate with multiple AI model teams to understand their requirements and build efficient infrastructure that improves every team's productivity
  • Define metrics and drive improvements based on user feedback
  • Mentor and collaborate within the team and across other teams to grow your colleagues and yourself
  • Demonstrate a history of learning and of growing both your own skills and those of the people around you

Skills

  • Advanced programming skills for building distributed systems, compute systems, backend services, microservices, and cloud technologies
  • Experience working with multi-functional teams, principals, and architects across organizational boundaries
  • Experience mentoring and growing teams and team members, with the flexibility to adjust direction and expectations based on customer needs
  • Deep technical expertise in distributed containerized applications using technologies such as Docker, K8s, Cloud Endpoints, Helm, and Prometheus
  • Passion for building rich microservice applications and build/test automation pipelines
  • Excellent interpersonal skills and the ability to lead multi-functional efforts
  • Proven experience debugging and analyzing the performance of distributed microservices or cloud systems
  • BS or MS in Computer Science, Computer Engineering, or related field (or equivalent experience)
  • 8+ years of demonstrated experience developing performant microservices, cloud software, and/or tooling
  • Experience delivering event-driven applications using services like Temporal, Kafka, Redis, or others, with a demonstrable ability to discuss their pros and cons
  • Experience building and deploying containers for microservice, cloud, and on-prem deployments, along with their associated CI/CD pipelines
  • Prior experience in large-scale full-stack development

Benefits

  • Equity
  • Benefits

Company Overview

NVIDIA is a computing platform company operating at the intersection of graphics, HPC, and AI. Founded in 1993 and headquartered in Santa Clara, California, USA, NVIDIA has a workforce of more than 10,000 employees. Visit https://www.nvidia.com.

Company H-1B Sponsorship

NVIDIA has a track record of offering H-1B sponsorship. Please note that this does not guarantee sponsorship for this specific role.