Python Infrastructure Engineer - Model Evaluation
Job summary
- Location: Seattle
- Work model: Fully remote
Job description
About The Role
What if your Python expertise could directly shape the systems that power next-generation AI models? We're looking for a senior Python engineer to design and build the data pipelines, evaluation harnesses, and annotation infrastructure that leading AI labs depend on to train and benchmark their models.
This is a high-impact, fully remote contract role working on real production systems --- not toy projects. You'll collaborate directly with data, research, and engineering teams at the frontier of AI development.
- Organization: Alignerr
- Type: Hourly Contract
- Location: Remote
- Commitment: 20--40 hours/week
What You'll Do
- Design, build, and optimize high-performance Python systems supporting AI data pipelines and model evaluation workflows
- Develop full-stack tooling and backend services for large-scale data annotation, validation, and quality control
- Build and maintain evaluation harnesses that integrate with inference frameworks and benchmark AI model performance
- Improve reliability, performance, and safety across existing Python codebases
- Instrument systems with observability tooling --- metrics, logging, and monitoring to track system reliability and model performance
- Identify bottlenecks and edge cases in data and system behavior, and implement scalable fixes
- Collaborate in synchronous design reviews to iterate on architecture and implementation decisions
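At its core, the evaluation-harness work described above is a loop that feeds benchmark examples to a pluggable inference callable and scores the outputs. A minimal sketch, assuming exact-match scoring and an `infer` callable that wraps whatever inference framework is in use (all names here are illustrative, not part of the role):

```python
# Minimal evaluation-harness sketch: run a benchmark of (prompt, expected)
# pairs through a pluggable inference callable and score by exact match.
# The infer callable stands in for an inference-framework integration.
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class EvalResult:
    total: int
    correct: int

    @property
    def accuracy(self) -> float:
        # Guard against an empty benchmark.
        return self.correct / self.total if self.total else 0.0


def run_eval(
    infer: Callable[[str], str],
    benchmark: Iterable[tuple[str, str]],
) -> EvalResult:
    total = correct = 0
    for prompt, expected in benchmark:
        total += 1
        if infer(prompt).strip() == expected.strip():
            correct += 1
    return EvalResult(total=total, correct=correct)
```

Real harnesses layer on batching, retries, metrics emission, and richer scoring functions, but the separation shown here between the inference backend and the scoring loop is what makes a harness reusable across models and frameworks.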
Who You Are
- Native or fluent English speaker with strong written and verbal communication skills
- Full-stack developer with a solid systems programming background in Python
- 3--5+ years of professional experience writing production-grade Python
- Experienced building evaluation harnesses for ML models and integrating with inference frameworks
- Strong understanding of observability and metrics collection for monitoring system and model performance
- Able to commit 20--40 hours per week with reliability and focus
Nice to Have
- Prior experience with data annotation platforms, data quality systems, or evaluation pipelines
- Familiarity with AI/ML workflows, model training, or benchmarking infrastructure
- Experience with distributed systems or developer tooling at scale
- Background in MLOps, data engineering, or research engineering environments
Why Join Us
- Work on cutting-edge AI projects alongside leading research labs at the frontier of the field
- Fully remote and async-friendly --- work from wherever you do your best work
- Freelance autonomy with the substance of meaningful, high-impact engineering work
- Make a direct, tangible contribution to the systems that shape how AI models are built and evaluated
- Potential for ongoing work and contract extension as new projects launch