QA Automation at TEKsystems c/o Allegis Group

LLM Evaluation Analyst

About the Role

MUST HAVE 3-4 YEARS OF PLAYWRIGHT EXPERIENCE AND A STRONG UNDERSTANDING OF LLMS

We are seeking 3 Evaluation Analysts to assess the performance of AI models tasked with implementing web features. Your work directly informs whether AI-generated code is correct, whether the instructions given to the models are clear, and whether the testing frameworks used to evaluate them are fair and reliable.

Core Responsibilities

You will analyze the quality of the entire evaluation pipeline, from the instructions given to the AI through to the final score, across the following categories:

Model Capability - Assess how well the AI performed each task by reviewing generated transcripts and results.
Bug Discovery Value - Identify patterns in AI code failures to understand why the model is making specific mistakes.
Score Health - Ensure tests are properly configured to apply correct scores during evaluation.
Task Specification Quality - Verify that prompts given to the AI are clear, correct, and technically precise (e.g., consistent variable names).
Test-by-Test Analysis - Evaluate the quality of automated tests using metrics like precision and recall to ensure they accurately measure AI performance.
Platform Issues - Report bugs or problems within the evaluation system itself, particularly with automated browser testing services.
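
As an illustration of the Test-by-Test Analysis category, the sketch below shows one way precision and recall might be computed for a single automated test. It assumes a test failure is read as a prediction that the AI-generated code is defective, and that a human reviewer supplies the ground truth. The names TestOutcome, TestQuality, and scoreTest are hypothetical and not part of any evaluation platform named in this posting.

```typescript
// Hypothetical record: one automated test run against one AI-generated submission.
interface TestOutcome {
  testFailed: boolean;        // the automated test reported a failure
  actuallyDefective: boolean; // a human reviewer judged the submission incorrect
}

interface TestQuality {
  precision: number; // share of reported failures that were real defects
  recall: number;    // share of real defects that the test reported
}

// Treats "test failed" as the positive prediction (assumption for this sketch).
function scoreTest(outcomes: TestOutcome[]): TestQuality {
  let truePositives = 0;
  let falsePositives = 0;
  let falseNegatives = 0;

  for (const o of outcomes) {
    if (o.testFailed && o.actuallyDefective) truePositives++;
    else if (o.testFailed && !o.actuallyDefective) falsePositives++;
    else if (!o.testFailed && o.actuallyDefective) falseNegatives++;
    // Passing tests on correct code (true negatives) affect neither metric.
  }

  const flagged = truePositives + falsePositives;
  const defective = truePositives + falseNegatives;

  return {
    // NaN marks a metric that is undefined because its denominator is zero.
    precision: flagged === 0 ? NaN : truePositives / flagged,
    recall: defective === 0 ? NaN : truePositives / defective,
  };
}

// Made-up sample data: 3 true positives, 1 false positive,
// 2 false negatives, 1 true negative.
const sample: TestOutcome[] = [
  { testFailed: true, actuallyDefective: true },
  { testFailed: true, actuallyDefective: true },
  { testFailed: true, actuallyDefective: true },
  { testFailed: true, actuallyDefective: false },
  { testFailed: false, actuallyDefective: true },
  { testFailed: false, actuallyDefective: true },
  { testFailed: false, actuallyDefective: false },
];

const quality = scoreTest(sample);
console.log(quality); // precision 0.75, recall 0.6
```

In this reading, low precision suggests a test that penalizes correct code (a fairness problem), while low recall suggests a test that lets defective code pass (a reliability problem).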

Required Qualifications

Expertise in finding patterns and issues in generative AI/LLM outputs
Direct experience with labeling and scoring frameworks
Experience writing Playwright tests
Strong analytical and problem-solving skills
Ability to interpret model behavior and articulate failure modes clearly

Preferred Qualifications

Advanced analytical credentials - highly relevant for interpreting model behavior
Familiarity with web development concepts (HTML, CSS, JavaScript)
Experience with automated testing and grading systems
Background in quality assurance or evaluation methodology

Skills

Playwright

Job Details

Experience Level: Intermediate Level
Job Type & Location: Contract position. This is a fully remote position.
Pay Range: $35.00 - $50.00/hr.
Application Deadline: May 29, 2026.

Benefits

Eligibility requirements apply to some benefits and may depend on your job classification and length of employment. If eligible, the benefits available for this temporary role may include:

Medical, dental & vision
Critical Illness, Accident, and Hospital
401(k) Retirement Plan
Life Insurance
Short- and long-term disability
Health Savings Account (HSA)
Transportation benefits
Employee Assistance Program
Time Off/Leave (PTO, Vacation or Sick Leave)

About TEKsystems

We're partners in transformation. We help clients activate ideas and solutions to take advantage of a new world of opportunity. We are a team of 80,000 strong, working with over 6,000 clients, including 80% of the Fortune 500, across North America, Europe and Asia.

Equal Opportunity Employer: The company will consider all applications without regard to race, sex, age, color, religion, national origin, veteran status, disability, sexual orientation, gender identity, genetic information or any characteristic protected by law.

Additional Information

San Francisco Fair Chance Ordinance: We will consider for employment qualified applicants with arrest and conviction records.
Massachusetts Lie Detector: It is unlawful in Massachusetts to require or administer a lie detector test as a condition of employment.
Use of Artificial Intelligence (AI): We may use AI to support parts of our hiring process. By applying, you acknowledge and agree that your application may be reviewed using AI tools.
