Already filled

MRP-Global

Senior Machine Learning Scientist – Bioinformatics, Python, ML – USA, Remote

Job summary

North Carolina
Engineering

Work model

Fully remote
Only United States
1 month ago

Job description

Key Skills/Responsibilities:

The top priority is publication‑backed experience, specifically in:

  • Predictive ML on biological data (classification modeling)
  • Genotype/feature → phenotype modeling
  • Method development or algorithmic contributions
  • Model interpretability and generation of biological insight

Technical Expertise

  • Strong background in machine learning or statistical learning with substantial hands‑on experience developing classification models
  • Experience working with high‑dimensional, sparse biological or omics datasets
  • Strong proficiency in Python for end‑to‑end machine‑learning workflows
  • Demonstrated experience designing validation strategies and assessing performance under significant class imbalance and limited sample sizes

Scientific Rigor

  • Clear understanding of model limitations, uncertainty, and overfitting risks in real‑world biological datasets
  • Experience delivering machine‑learning analyses intended to inform research and internal decision‑making
  • Experience making principled methodological recommendations in the face of incomplete or noisy data

Preferred Qualifications

  • Experience working with biological sequence data or genotype–phenotype analyses
  • Experience with interpretability or explainability approaches applied to biological machine‑learning models
  • Background in pharmaceutical, biotech, or regulated research environments

Machine Learning Model Development

  • Develop classification models to analyze curated genotype–phenotype datasets
  • Apply appropriate modeling strategies to predict viral sensitivity or resistance based on sequence‑derived features
  • Implement training, validation, and hyperparameter‑tuning workflows using predefined datasets
  • Evaluate alternative feature representations provided by the bioinformatics team and assess their suitability

Model Evaluation and Robustness

  • Assess model performance using metrics appropriate for imbalanced biological datasets
  • Evaluate robustness across data splits, phenotype definitions, and successive data releases
  • Identify failure modes, instability, and limitations, and document their implications
  • Document modeling assumptions, trade‑offs, uncertainty, and limitations in a reproducible and transparent manner
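
As an illustration of why imbalanced datasets call for metrics other than plain accuracy, here is a minimal sketch in plain Python. The toy labels (95 sensitive, 5 resistant) and the helper names are assumptions made for this example and do not come from the role's actual data or codebase.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels; 1 is the positive (e.g. resistant) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity; 0.5 means no better than chance."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return (sensitivity + specificity) / 2


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient; defined as 0.0 when a margin is empty."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical toy data: 95 sensitive (0) and 5 resistant (1) isolates,
# scored against a trivial model that predicts "sensitive" for everything.
y_true = [0] * 95 + [1] * 5
y_majority = [0] * 100

print(f"accuracy:          {accuracy(y_true, y_majority):.2f}")           # 0.95
print(f"balanced accuracy: {balanced_accuracy(y_true, y_majority):.2f}")  # 0.50
print(f"MCC:               {matthews_corrcoef(y_true, y_majority):.2f}")  # 0.00
```

The trivial model looks strong on accuracy yet scores at chance level on the other two metrics, which is the kind of failure mode the bullets above ask candidates to detect and document.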

Interpretability and Insight Generation

  • Provide interpretable summaries of model behavior, including feature importance and consistency of signals
  • Identify amino‑acid positions or features that recur across models or resampling strategies, while highlighting where signals are not reproducible
  • Clearly document and communicate findings, assumptions, and caveats within the bioinformatics team
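
One simple way to check whether a feature recurs across resampling strategies is to count how often it ranks among the top features over many bootstrap resamples. The sketch below is illustrative only: the scoring rule (absolute difference in class means), the stratified bootstrap, the toy binary features, and all function names are assumptions for this example, not a method specified by the posting.

```python
import random
from collections import Counter


def class_separation_scores(X, y):
    """Score each feature by the absolute difference of its mean between the two classes."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        mean_pos = sum(row[j] for row in pos) / len(pos)
        mean_neg = sum(row[j] for row in neg) / len(neg)
        scores.append(abs(mean_pos - mean_neg))
    return scores


def selection_frequency(X, y, top_k=2, n_resamples=200, seed=0):
    """Fraction of stratified bootstrap resamples in which each feature ranks in the top_k."""
    rng = random.Random(seed)
    pos_idx = [i for i, label in enumerate(y) if label == 1]
    neg_idx = [i for i, label in enumerate(y) if label == 0]
    counts = Counter()
    for _ in range(n_resamples):
        # Resample within each class so both classes are always present.
        sample = (rng.choices(pos_idx, k=len(pos_idx))
                  + rng.choices(neg_idx, k=len(neg_idx)))
        Xb = [X[i] for i in sample]
        yb = [y[i] for i in sample]
        scores = class_separation_scores(Xb, yb)
        # Rank by descending score; ties are broken by the lower feature index.
        ranked = sorted(range(len(scores)), key=lambda j: (-scores[j], j))
        counts.update(ranked[:top_k])
    return {j: counts[j] / n_resamples for j in range(len(X[0]))}


# Hypothetical toy data: feature 0 mirrors the label exactly (a planted signal),
# features 1-5 are random noise standing in for uninformative positions.
data_rng = random.Random(1)
y = [1] * 10 + [0] * 30
X = [[label] + [data_rng.randint(0, 1) for _ in range(5)] for label in y]

freq = selection_frequency(X, y)
for feature, fraction in sorted(freq.items(), key=lambda kv: -kv[1]):
    print(f"feature {feature}: selected in {fraction:.0%} of resamples")
```

The planted feature is selected in every resample, while the noise features split the remaining slot among themselves. Low, scattered selection frequencies of that kind are the signals one would flag as not reproducible.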