Python Infrastructure Engineer - Model Evaluation

Job summary

  • Location: Seattle
  • Work model: Fully remote
Job description

About The Role

What if your Python expertise could directly shape the systems that power next-generation AI models? We're looking for a senior Python engineer to design and build the data pipelines, evaluation harnesses, and annotation infrastructure that leading AI labs depend on to train and benchmark their models.

This is a high-impact, fully remote contract role working on real production systems, not toy projects. You'll collaborate directly with data, research, and engineering teams at the frontier of AI development.

  • Organization: Alignerr
  • Type: Hourly Contract
  • Location: Remote
  • Commitment: 20--40 hours/week

What You'll Do

  • Design, build, and optimize high-performance Python systems supporting AI data pipelines and model evaluation workflows
  • Develop full-stack tooling and backend services for large-scale data annotation, validation, and quality control
  • Build and maintain evaluation harnesses that integrate with inference frameworks and benchmark AI model performance (see the sketch after this list)
  • Improve reliability, performance, and safety across existing Python codebases
  • Instrument systems with observability tooling: metrics, logging, and monitoring to track system reliability and model performance
  • Identify bottlenecks and edge cases in data and system behavior, and implement scalable fixes
  • Collaborate in synchronous design reviews to iterate on architecture and implementation decisions
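
To give candidates a concrete flavor of this work, here is a minimal, illustrative sketch of an evaluation harness with basic metrics instrumentation. All names here (EvalCase, run_eval, score_exact_match) are hypothetical, and the stub inference function stands in for a real inference-framework client:

```python
import logging
import time
from dataclasses import dataclass
from typing import Callable

logging.basicConfig(level=logging.INFO, format="%(name)s: %(message)s")
log = logging.getLogger("eval_harness")


@dataclass
class EvalCase:
    prompt: str
    expected: str


def score_exact_match(output: str, expected: str) -> float:
    # Simplest possible scorer; a real harness swaps in task-specific metrics.
    return 1.0 if output.strip() == expected.strip() else 0.0


def run_eval(infer: Callable[[str], str], cases: list[EvalCase]) -> dict:
    """Run each case through an inference callable, score it, and log basic metrics."""
    if not cases:
        raise ValueError("no evaluation cases provided")
    scores, latencies = [], []
    for case in cases:
        start = time.perf_counter()
        output = infer(case.prompt)  # call into the inference framework
        latencies.append(time.perf_counter() - start)
        scores.append(score_exact_match(output, case.expected))
        log.info("scored %.2f in %.3fs", scores[-1], latencies[-1])
    return {
        "accuracy": sum(scores) / len(scores),
        "avg_latency_s": sum(latencies) / len(latencies),
        "n_cases": len(cases),
    }


if __name__ == "__main__":
    # Stub inference function standing in for a real model client.
    demo = [EvalCase("2 + 2 = ?", "4"), EvalCase("Capital of France?", "Paris")]
    print(run_eval(lambda p: "4" if "2 + 2" in p else "Paris", demo))
```

A production version would push these metrics to a monitoring backend rather than stdout, which is exactly the observability work described above.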

Who You Are

  • Native or fluent English speaker with strong written and verbal communication skills
  • Full-stack developer with a solid systems programming background in Python
  • 3--5+ years of professional experience writing production-grade Python
  • Experienced in building evaluation harnesses for ML models and integrating them with inference frameworks
  • Strong understanding of observability and metrics collection for monitoring system and model performance
  • Able to commit 20--40 hours per week with reliability and focus

Nice to Have

  • Prior experience with data annotation platforms, data quality systems, or evaluation pipelines
  • Familiarity with AI/ML workflows, model training, or benchmarking infrastructure
  • Experience with distributed systems or developer tooling at scale
  • Background in MLOps, data engineering, or research engineering environments

Why Join Us

  • Work on cutting-edge AI projects alongside leading research labs at the frontier of the field
  • Fully remote and async-friendly: work from wherever you do your best work
  • Freelance autonomy paired with meaningful, high-impact engineering work
  • Make a direct, tangible contribution to the systems that shape how AI models are built and evaluated
  • Potential for ongoing work and contract extension as new projects launch