[Remote] Staff Backend Software Engineer: Inference
About Archetype AI
Archetype AI develops Physical AI agents that harness real-world sensor data to enhance decision-making and automate processes. Founded in 2023 and headquartered in Palo Alto, California, USA, the company employs 11-50 people. Its website is https://www.archetypeai.io.
Archetype AI has a track record of offering H-1B sponsorship. Please note that this does not guarantee sponsorship for this specific role.
Job Overview
Archetype AI is seeking a highly motivated backend engineer to design and develop performant, scalable, and resilient inference services. This is a remote role open to candidates in the USA. You will work closely with researchers and product teams to bring AI capabilities into production.
Responsibilities
- Architect, implement, and maintain distributed inference serving systems that support high-throughput, low-latency model serving across multiple AI accelerator families and cloud platforms.
- Enable breakthrough research by providing scientists with high-performance inference infrastructure to develop next-generation models.
- Continuously optimize inference performance, including batching, caching, and request routing strategies, to maximize compute efficiency under rapid customer growth.
- Build tooling and observability to monitor system health, identify bottlenecks, and proactively resolve instability.
- Introduce new techniques, architectures, and best practices to push the limits of scalability, efficiency, and reliability.
- Own problems end-to-end, from design to deployment, with a strong bias toward quality, automation, and continuous improvement.
- Balance rapid iteration on early-stage systems with long-term maintainability and architectural soundness.
- Contribute to a culture of engineering excellence, mentorship, and team-first collaboration.
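As an illustration of the batching strategies mentioned above, here is a minimal dynamic-batching sketch in Python. All names (`DynamicBatcher`, `model_fn`) are hypothetical placeholders, not part of Archetype AI's actual stack; a production serving system would add error handling, backpressure, and accelerator-aware scheduling.

```python
import queue
import threading
import time

class DynamicBatcher:
    """Collects individual requests into batches for a single model call.

    A batch is dispatched when it reaches max_batch_size, or when
    max_wait_s has elapsed since the first queued request, whichever
    comes first. `model_fn` stands in for any batched inference call.
    """

    def __init__(self, model_fn, max_batch_size=8, max_wait_s=0.01):
        self.model_fn = model_fn
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, item):
        """Enqueue one request and block until its result is ready."""
        done = threading.Event()
        slot = {"input": item, "done": done, "output": None}
        self._queue.put(slot)
        done.wait()
        return slot["output"]

    def _run(self):
        while True:
            first = self._queue.get()  # block until a request arrives
            batch = [first]
            deadline = time.monotonic() + self.max_wait_s
            # Fill the batch until it is full or the deadline passes.
            while len(batch) < self.max_batch_size:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(self._queue.get(timeout=remaining))
                except queue.Empty:
                    break
            outputs = self.model_fn([s["input"] for s in batch])
            for slot, out in zip(batch, outputs):
                slot["output"] = out
                slot["done"].set()
```

The deadline-based flush is the key trade-off: a larger `max_wait_s` improves batch utilization (throughput) at the cost of added tail latency for the first request in each batch.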
Skills and Qualifications
- 7+ years of professional software engineering experience, with a focus on inference.
- Deep understanding of machine learning systems at scale, including load balancing, request routing, and traffic management.
- Experience with inference optimization, batching, and caching strategies.
- Ability to design APIs and service interfaces for real-time and latency-sensitive use cases.
- Experience building and operating production-grade systems at scale in cloud environments (e.g., Azure, AWS, GCP).
- Strong debugging, instrumentation, and observability skills across distributed systems.
- Demonstrated ownership of complex technical problems and ability to learn and adapt quickly.
- Proven track record of scaling systems through rapid growth and rebuilding or refactoring for new demands.
- Experience building systems that degrade gracefully under load: backpressure, rate limiting, circuit breaking, bulkheading, and queuing.
- Strong understanding of failure modes in distributed systems and mitigation techniques.
- Proven experience owning high-availability services (e.g., SLOs, incident response, on-call), including capacity planning and load testing.
- Proficiency in multiple programming languages (e.g., Rust, C++, Python).
- Experience designing internal tools or platforms to support developer productivity and experimentation.
- Strong product intuition and ability to collaborate closely with cross-functional teams, including research and design.
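Of the graceful-degradation techniques listed above, rate limiting is the simplest to sketch. The token bucket below is a generic illustration, assuming nothing about Archetype AI's systems; the rate and capacity values are arbitrary examples.

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: one way to shed excess load gracefully.

    Tokens refill continuously at `rate_per_s`, capped at `capacity`
    (the allowed burst size). Requests that find the bucket empty are
    rejected rather than queued, keeping latency bounded under overload.
    """

    def __init__(self, rate_per_s, capacity, clock=time.monotonic):
        self.rate = rate_per_s        # tokens added per second
        self.capacity = capacity      # maximum burst size
        self.tokens = float(capacity)
        self.clock = clock            # injectable for testing
        self.last = clock()

    def allow(self, cost=1.0):
        """Return True if the request may proceed, False to shed it."""
        now = self.clock()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

In a serving stack this would typically sit in front of the batching layer, so that shed requests fail fast with a retryable status instead of consuming accelerator time.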