MRP-Global
Senior Machine Learning Scientist – Bioinformatics, Python, ML – USA, Remote
Job summary
- North Carolina
- Engineering
- Work model: Fully remote, United States only
- Posted 1 month ago
Job description
Key Skills/Responsibilities:
The top priority is publication‑backed experience, specifically in:
- Predictive ML on biological data (classification modeling)
- Genotype/feature → phenotype modeling
- Method development or algorithmic contributions
- Model interpretability and generation of biological insight
Technical Expertise
- Strong background in machine learning or statistical learning with substantial hands‑on experience developing classification models
- Experience working with high‑dimensional, sparse biological or omics datasets
- Strong proficiency in Python for end‑to‑end machine‑learning workflows
- Demonstrated experience designing validation strategies and assessing performance under significant class imbalance and limited sample sizes
Scientific Rigor
- Clear understanding of model limitations, uncertainty, and overfitting risks in real‑world biological datasets
- Experience delivering machine‑learning analyses intended to inform research and internal decision‑making
- Experience making principled methodological recommendations in the face of incomplete or noisy data
Preferred Qualifications
- Experience working with biological sequence data or genotype–phenotype analyses
- Experience with interpretability or explainability approaches applied to biological machine‑learning models
- Background in pharmaceutical, biotech, or regulated research environments
Machine Learning Model Development
- Develop classification models to analyze curated genotype–phenotype datasets
- Apply appropriate modeling strategies to predict viral sensitivity or resistance based on sequence‑derived features
- Implement training, validation, and hyperparameter‑tuning workflows using predefined datasets
- Evaluate alternative feature representations provided by the bioinformatics team and assess their suitability
Model Evaluation and Robustness
- Assess model performance using metrics appropriate for imbalanced biological datasets
- Evaluate robustness across data splits, phenotype definitions, and successive data releases
- Identify failure modes, instability, and limitations, and document their implications
- Document modeling assumptions, trade‑offs, uncertainty, and limitations in a reproducible and transparent manner
Interpretability and Insight Generation
- Provide interpretable summaries of model behavior, including feature importance and consistency of signals
- Identify amino‑acid positions or features that recur across models or resampling strategies, while highlighting where signals are not reproducible
- Clearly document and communicate findings, assumptions, and caveats within the bioinformatics team