[Remote] Staff Backend Software Engineer: Inference

Job summary

Location: United States
Role: Software Developer
Work model: Fully remote (United States only)
Job description

About Archetype AI

Archetype AI develops Physical AI agents that harness real-world sensor data to enhance decision-making and automate processes. Founded in 2023 and headquartered in Palo Alto, California, USA, Archetype AI has a workforce of 11-50 employees. Its website is https://www.archetypeai.io.

Archetype AI has a track record of offering H-1B sponsorship. Please note that this does not guarantee sponsorship for this specific role.

Job Overview

Archetype AI is seeking a highly motivated backend engineer to design and develop performant, scalable, and resilient inference services. This is a remote role open to candidates in the USA. You will work closely with researchers and product teams to bring AI capabilities into production.

Responsibilities

  • Architect, implement, and maintain distributed inference serving systems that support high-throughput, low-latency model serving across multiple AI accelerator families and cloud platforms.
  • Enable breakthrough research by providing scientists with high-performance inference infrastructure to develop next-generation models.
  • Continuously optimize inference performance, including batching, caching, and request routing strategies, to maximize compute efficiency as customer demand grows rapidly.
  • Build tooling and observability to monitor system health, identify bottlenecks, and proactively resolve instability.
  • Introduce new techniques, architectures, and best practices to push the limits of scalability, efficiency, and reliability.
  • Own problems end-to-end, from design to deployment, with a strong bias toward quality, automation, and continuous improvement.
  • Balance rapid iteration on early-stage systems with long-term maintainability and architectural soundness.
  • Contribute to a culture of engineering excellence, mentorship, and team-first collaboration.
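To make the batching responsibility above concrete, here is a minimal, illustrative sketch of dynamic request batching, the pattern behind high-throughput model serving: requests are pulled from a queue until either a maximum batch size is reached or a wait deadline expires. This is a generic sketch, not Archetype AI's actual implementation; the function name and parameters are hypothetical.

```python
import queue
import time

def collect_batch(q, max_batch_size, max_wait_s):
    """Collect up to max_batch_size requests from q, waiting at most
    max_wait_s after the first request arrives (dynamic batching)."""
    batch = [q.get()]  # block until at least one request arrives
    deadline = time.monotonic() + max_wait_s
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # deadline hit: ship a partial batch rather than wait
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break  # no more requests arrived before the deadline
    return batch
```

The trade-off this exposes is latency versus throughput: a larger `max_wait_s` fills batches (better accelerator utilization) at the cost of tail latency for the first request in each batch.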

Skills and Qualifications

  • 7+ years of professional software engineering experience, with a focus on inference systems.
  • Deep understanding of machine learning systems at scale, including load balancing, request routing, and traffic management.
  • Experience with inference optimization, batching, and caching strategies.
  • Ability to design APIs and service interfaces for real-time and latency-sensitive use cases.
  • Experience building and operating production-grade systems at scale in cloud environments (e.g., Azure, AWS, GCP).
  • Strong debugging, instrumentation, and observability skills across distributed systems.
  • Demonstrated ownership of complex technical problems and ability to learn and adapt quickly.
  • Proven track record of scaling systems through rapid growth and rebuilding or refactoring for new demands.
  • Experience building systems that degrade gracefully under load: backpressure, rate limiting, circuit breaking, bulkheading, and queuing.
  • Strong understanding of failure modes in distributed systems and mitigation techniques.
  • Proven experience owning high-availability services (e.g., SLOs, incident response, on-call), including capacity planning and load testing.
  • Proficiency in multiple programming languages (e.g., Rust, C++, Python).
  • Experience designing internal tools or platforms to support developer productivity and experimentation.
  • Strong product intuition and ability to collaborate closely with cross-functional teams, including research and design.
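As an illustration of the graceful-degradation patterns the qualifications name (backpressure, rate limiting, circuit breaking), here is a minimal circuit-breaker sketch: after a run of consecutive failures the breaker opens and rejects calls immediately, then lets a probe through after a reset timeout. This is a generic teaching example, not the company's stack; the class name and thresholds are hypothetical.

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: open after `failure_threshold` consecutive
    failures, reject calls while open, probe again after `reset_timeout_s`."""

    def __init__(self, failure_threshold=3, reset_timeout_s=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self):
        if self.opened_at is None:
            return True  # closed: calls flow normally
        if time.monotonic() - self.opened_at >= self.reset_timeout_s:
            self.opened_at = None  # half-open: let one probe call through
            return True
        return False  # open: fail fast instead of piling onto a sick backend

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()
```

Failing fast while open is the point: it sheds load from an unhealthy dependency so it can recover, instead of letting queued retries amplify the outage.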