- Data Architect - Health Care/Life Sciences | GCP · Matillion · dbt - Remote Contract Position
Role Overview
We are seeking an experienced Data Architect with deep, hands-on expertise in GCP BigQuery, Matillion ETL, and dbt (Core/Cloud). You will lead the design and delivery of cloud-native, scalable data platforms for clinical and life-sciences clients, establishing architecture patterns that span ingestion, governance, transformation, and analytics.
This is a senior individual-contributor and technical-lead role with high visibility into client delivery and pre-sales solutioning.
Core Technology Stack
Cloud Platform
- GCP
- BigQuery
- Cloud Storage
- Cloud Composer (Airflow)
- Cloud Build
ETL / Orchestration
- Matillion ETL
- Cloud Composer
- GitLab CI / GitHub Actions
Transformation & Modeling
- dbt Core / Cloud
- Staging
- Intermediate
- Marts
- Macros
- Tests
Languages
- SQL (Advanced)
- Python
- Jinja2
Clinical Standards
- CDISC SDTM/ADaM
- OMOP
- HL7/FHIR
- EHR/EMR
- EDC/CTMS
DevOps & Governance
- Git
- Code Review
- CI/CD
- Data Quality
- Audit Logging
Key Responsibilities
GCP BigQuery Architecture
- Design multi-project BigQuery environments aligned to medallion architecture principles
- Optimize query performance using partitioning, clustering, and materialization strategies
- Enforce dataset-level IAM, column-level security, and VPC Service Controls for regulated data
- Integrate BigQuery with Cloud Storage, Pub/Sub, and Vertex AI for end-to-end analytics pipelines
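As a rough illustration of the partitioning, clustering, and access-pruning strategies listed above, a BigQuery DDL statement for a claims table might look like the sketch below (dataset, table, and column names are hypothetical):

```python
# Sketch of a partitioned, clustered BigQuery table definition.
# Dataset, table, and column names are hypothetical examples.
ddl = """
CREATE TABLE IF NOT EXISTS `clinical_marts.patient_claims`
(
  claim_id      STRING NOT NULL,
  patient_id    STRING NOT NULL,
  service_date  DATE,
  payer_code    STRING,
  amount        NUMERIC
)
PARTITION BY service_date           -- prune scans to the queried date range
CLUSTER BY patient_id, payer_code   -- co-locate rows for common filter columns
OPTIONS (partition_expiration_days = 3650);
"""
print(ddl)
```

Partitioning limits how much data a date-filtered query scans, while clustering orders rows within each partition so filters on patient or payer read fewer blocks.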
Matillion ETL: Orchestration & Ingestion
- Architect Matillion job hierarchies including orchestration jobs, transformation jobs, and reusable profiles
- Implement parameterization patterns for scalable multi-source loading
- Design error-handling frameworks with audit logging, retry logic, and alerting via Cloud Monitoring
- Migrate and refactor legacy ETL pipelines (on-prem or Informatica) into Matillion on GCP
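The error-handling pattern described above (audit logging plus retry logic) can be sketched in plain Python; in practice this logic would live in a Matillion Python component or shared job, and the flaky load step here is purely hypothetical:

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ingest_audit")

def run_with_retry(job, *, attempts=3, backoff_s=0.0):
    """Run a job callable, retrying on failure and audit-logging each attempt."""
    for attempt in range(1, attempts + 1):
        try:
            result = job()
            log.info("job succeeded on attempt %d", attempt)
            return result
        except Exception as exc:
            log.warning("attempt %d failed: %s", attempt, exc)
            if attempt == attempts:
                raise  # exhausted retries: surface the error for alerting
            time.sleep(backoff_s)

# Hypothetical flaky load step: fails twice, then succeeds.
calls = {"n": 0}
def flaky_load():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient source error")
    return "loaded 42 rows"

result = run_with_retry(flaky_load, attempts=3)
```

In a real deployment the log records would feed an audit table and Cloud Monitoring alerts rather than stdout.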
dbt Transformation Layer
- Build and govern dbt project structures: staging → intermediate → mart model layers
- Author reusable macros (Jinja2), generic and singular tests, and schema.yml documentation blocks
- Enforce CI/CD pipelines for dbt via GitLab CI / GitHub Actions
- Connect dbt docs to DataHub, Collibra, or Alation for enterprise data governance
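To give a feel for the generic tests mentioned above, here is a plain-Python approximation of dbt's `not_null` and `accepted_values` checks, where an empty result means the test passes (the staging rows and column names are hypothetical):

```python
def not_null(rows, column):
    """Return rows violating a NOT NULL expectation; empty list means pass."""
    return [r for r in rows if r.get(column) is None]

def accepted_values(rows, column, values):
    """Return rows whose column value falls outside the accepted set."""
    allowed = set(values)
    return [r for r in rows if r.get(column) not in allowed]

# Hypothetical output of a staging model.
stg_subjects = [
    {"subject_id": "S001", "arm": "TREATMENT"},
    {"subject_id": "S002", "arm": "PLACEBO"},
    {"subject_id": None,   "arm": "TREATMENT"},
]

null_failures = not_null(stg_subjects, "subject_id")
arm_failures = accepted_values(stg_subjects, "arm", ["TREATMENT", "PLACEBO"])
```

In dbt itself these checks are declared in `schema.yml` and compiled to SQL, but the pass/fail semantics are the same.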
Cloud Composer & Workflow Orchestration
- Design and maintain Cloud Composer (Airflow) DAGs to coordinate Matillion jobs, dbt runs, and downstream consumers
- Implement SLA monitoring, alerting, and dependency management across cross-domain pipelines
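The SLA-monitoring responsibility above reduces to comparing task run durations against a threshold; a minimal standalone sketch (task names and timestamps are hypothetical, and in Airflow this is handled by task-level `sla` settings and callbacks):

```python
from datetime import datetime, timedelta

def sla_breaches(task_runs, sla):
    """Return names of task runs whose duration exceeded the SLA timedelta."""
    return [name for name, (start, end) in task_runs.items() if end - start > sla]

# Hypothetical run times for one pipeline execution.
runs = {
    "matillion_ingest": (datetime(2024, 1, 1, 2, 0),  datetime(2024, 1, 1, 2, 40)),
    "dbt_run":          (datetime(2024, 1, 1, 2, 40), datetime(2024, 1, 1, 4, 10)),
}
breaches = sla_breaches(runs, sla=timedelta(hours=1))  # only dbt_run exceeds 1h
```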
DevOps & CI/CD
- Manage Git branching strategies across data engineering repositories
- Configure Cloud Build / GitHub Actions / GitLab CI pipelines for automated testing, linting, and deployment
- Apply infrastructure-as-code (Terraform) for BigQuery datasets, Matillion instances, and IAM resources
Clinical Data & Compliance
- Map source EHR/EMR, EDC, CTMS, and claims data to CDISC SDTM/ADaM and OMOP CDM standards
- Design HL7/FHIR-compatible ingestion pipelines for real-world evidence and interoperability use cases
- Ensure platform compliance with GxP, HIPAA, 21 CFR Part 11, and SOC 2 requirements
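A field-level sketch of the source-to-OMOP mapping work described above, mapping a hypothetical EHR extract record onto OMOP CDM `person` columns (8507 and 8532 are the standard OMOP gender concept IDs for male and female; real mappings also require full vocabulary lookups and provenance tracking):

```python
# Standard OMOP gender concept IDs; unknown values map to concept 0.
GENDER_CONCEPTS = {"M": 8507, "F": 8532}

def ehr_to_omop_person(record):
    """Map a hypothetical EHR extract record to OMOP CDM person columns."""
    return {
        "person_id": record["patient_id"],
        "gender_concept_id": GENDER_CONCEPTS.get(record["sex"], 0),
        "year_of_birth": int(record["birth_date"][:4]),
    }

row = ehr_to_omop_person({"patient_id": 101, "sex": "F", "birth_date": "1980-06-15"})
```

Equivalent logic typically lives in dbt staging models rather than application code, but the shape of the mapping is the same.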
Required Qualifications
- 8-14 years in data architecture, data engineering, or analytics engineering roles
- 4 years hands-on with GCP BigQuery: DDL/DML, query optimization, IAM, Data Transfer Service, BigQuery ML
- 3 years with Matillion ETL: job design, orchestration patterns, GCP connectors, Python components
- 3 years with dbt Core and/or Cloud: project structure, macros, testing, documentation, CI/CD integration
- Advanced SQL proficiency: window functions, CTEs, performance tuning, and analytical patterns
- Proficient in Python for data validation, automation, and custom Matillion/Airflow components
- Strong Git-based workflow discipline: branching, code review, pull requests, semantic versioning
Preferred Qualifications
- Google Professional Data Engineer or Cloud Architect certification
- Experience in clinical / pharmaceutical / life-sciences data environments (CDISC, OMOP, HL7/FHIR)
- Familiarity with Terraform for GCP infrastructure provisioning
- Exposure to dbt Semantic Layer, MetricFlow, or dbt Cloud Enterprise features
- Experience with DataHub, Collibra, or Alation for metadata management and data cataloging
- Background in MDM solutions (Reltio, Informatica MDM) is a plus