[Remote] Senior ML Evaluation Engineer - Autonomous Vehicles

Job summary

United States
Software Developer

Work model

Fully remote (United States only)
2 days ago
Job description

Job Summary

NVIDIA, a leader in AI and autonomous vehicles, is seeking a Senior ML Evaluation Engineer for its US-based remote team. The role focuses on designing and implementing evaluation pipelines for autonomous driving systems, replacing traditional rule-based methods with learned evaluators built on machine learning models.

Responsibilities

  • Design and build learned evaluation pipelines that use LLMs, VLMs, and multimodal models to assess driving behavior.
  • Develop agentic workflows that integrate model inference, retrieval, and structured reasoning for evaluating complex driving scenarios.
  • Define and implement an "evaluation-of-evaluation" methodology to ensure the accuracy of learned evaluators.
  • Construct golden-set frameworks and calibration loops for learned metrics.
  • Collaborate with AML teams to address model-specific evaluation needs, such as COT prediction quality and AML regression coverage.
  • Instrument evaluation systems with robust experiment tracking, A/B comparison tools, and model versioning.
  • Drive the team's transition from rule-based to learned evaluation by identifying which metrics and analyzers are suitable for replacement with ML and developing the learned alternatives.

Skills and Qualifications

  • Education & Experience: PhD, MS, or BS (or equivalent experience) in Computer Science, Computer Engineering, or a related technical field, with 4+, 6+, or 8+ years of relevant experience, respectively.
  • ML Pipeline Experience: Hands-on experience building LLM/VLM-based pipelines, including fine-tuning, prompt engineering, retrieval-augmented generation, and chain-of-thought.
  • Production ML: Proven track record of shipping ML systems to production environments.
  • Software Engineering: Strong fundamentals in Python and C++, with experience writing clean, tested, and reviewable code.
  • Evaluation Methodology: Experience with precision/recall, inter-rater reliability, calibration, and annotation pipelines.
  • Data Processing: Comfort with large-scale data processing using tools like Spark or Dask.
  • ML Frameworks: Strong Python skills applied to ML work, with experience in PyTorch or JAX.
  • Training Workflows: Familiarity with GPU-based training workflows.
  • Domain Knowledge: Experience in autonomous driving, robotics, or safety-critical domains is a plus.
  • Driving Behavior: Familiarity with driving behavior taxonomies (e.g., cut-ins, hard braking, lane-keeping metrics, scenario-based evaluation).
  • Multimodal Understanding: Experience with video understanding models or multimodal evaluation.
  • Agentic AI: Knowledge of agentic AI frameworks such as LangChain, DSPy, CrewAI, or custom solutions.
  • Leadership: Demonstrated ability to influence technical direction across teams.
  • LLM/VLM Development: Experience with LLM/VLM fine-tuning or application development.

Benefits

  • Equity
  • Comprehensive Benefits Package

Company Overview

NVIDIA is a global computing platform company at the forefront of graphics, HPC, and AI. Founded in 1993 and headquartered in Santa Clara, California, NVIDIA employs over 10,000 people worldwide. Learn more at https://www.nvidia.com.

H-1B Sponsorship

NVIDIA has a history of sponsoring H-1B visas. Please note that sponsorship is not guaranteed for every role.

Location

  • Remote (USA)