
Avalon Information Technologies L.L.C

Data Architect - Health Care/Life Sciences | GCP · Matillion · dbt - Remote Contract Position

Job summary

United States
Engineering

Work model

Fully remote
United States only
Job description

Role Overview

We are seeking an experienced Data Architect with deep, hands-on expertise in GCP BigQuery, Matillion ETL, and dbt (Core/Cloud). You will lead the design and delivery of cloud-native, scalable data platforms for clinical and life-sciences clients, establishing architecture patterns that span ingestion, governance, transformation, and analytics.

This is a senior individual-contributor and technical-lead role with high visibility into client delivery and pre-sales solutioning.

Core Technology Stack

Cloud Platform

  • GCP
  • BigQuery
  • Cloud Storage
  • Cloud Composer (Airflow)

ETL / Orchestration

  • Matillion ETL
  • Cloud Composer
  • Cloud Build / GitLab CI / GitHub Actions

Transformation & Modeling

  • dbt Core / Cloud
  • Staging
  • Intermediate
  • Marts
  • Macros
  • Tests

Languages

  • SQL (Advanced)
  • Python
  • Jinja2

Clinical Standards

  • CDISC SDTM/ADaM
  • OMOP
  • HL7/FHIR
  • EHR/EMR
  • EDC/CTMS

DevOps & Governance

  • Git
  • Code Review
  • CI/CD
  • Data Quality
  • Audit Logging

Key Responsibilities

GCP BigQuery Architecture

  • Design multi-project BigQuery environments aligned to medallion architecture principles
  • Optimize query performance using partitioning, clustering, and materialization strategies
  • Enforce dataset-level IAM, column-level security, and VPC Service Controls for regulated data
  • Integrate BigQuery with Cloud Storage, Pub/Sub, and Vertex AI for end-to-end analytics pipelines
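As a sketch of the partitioning and clustering strategies above, the snippet below assembles BigQuery DDL for a hypothetical clinical events table (the project, dataset, table, and column names are illustrative assumptions, not from this posting):

```python
def partitioned_table_ddl(project: str, dataset: str, table: str,
                          partition_col: str, cluster_cols: list[str]) -> str:
    """Build BigQuery DDL for a date-partitioned, clustered table.

    Partitioning limits the bytes a query scans; clustering co-locates
    rows sharing the cluster-key values within each partition.
    """
    return (
        f"CREATE TABLE `{project}.{dataset}.{table}` (\n"
        "  patient_id STRING, site_id STRING, event_ts TIMESTAMP\n"  # example schema
        ")\n"
        f"PARTITION BY DATE({partition_col})\n"
        f"CLUSTER BY {', '.join(cluster_cols)}"
    )

# hypothetical project/dataset names for illustration
ddl = partitioned_table_ddl("clinical-prj", "silver", "patient_events",
                            "event_ts", ["patient_id", "site_id"])
```

Partitioning on the event timestamp plus clustering on patient and site keys is one common layout for time-sliced clinical queries; the right keys depend on the dominant query patterns.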

Matillion ETL - Orchestration & Ingestion

  • Architect Matillion job hierarchies including orchestration jobs, transformation jobs, and reusable profiles
  • Implement parameterization patterns for scalable multi-source loading
  • Design error-handling frameworks with audit logging, retry logic, and alerting via Cloud Monitoring
  • Migrate and refactor legacy ETL pipelines (on-prem or Informatica) into Matillion on GCP
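The error-handling framework described above (audit logging plus retry logic) can be sketched generically in Python. The decorator below is an assumption about the pattern, not Matillion's or Cloud Monitoring's own API:

```python
import logging
import time
from functools import wraps

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("etl.audit")  # hypothetical audit logger name

def with_retries(max_attempts: int = 3, backoff_s: float = 1.0):
    """Retry a load step with exponential backoff, auditing every attempt."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    result = fn(*args, **kwargs)
                    audit_log.info("step=%s attempt=%d status=ok", fn.__name__, attempt)
                    return result
                except Exception as exc:
                    audit_log.warning("step=%s attempt=%d error=%s", fn.__name__, attempt, exc)
                    if attempt == max_attempts:
                        raise  # surface to the orchestrator for alerting
                    time.sleep(backoff_s * 2 ** (attempt - 1))
        return wrapper
    return decorator

attempts = []

@with_retries(max_attempts=3, backoff_s=0)
def flaky_load():
    """Simulated load that fails twice, then succeeds."""
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("transient source error")
    return "loaded"

result = flaky_load()
```

In a real deployment the audit records would land in a logging sink and the final raised exception would trigger the alerting path.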

dbt Transformation Layer

  • Build and govern dbt project structures: staging → intermediate → mart model layers
  • Author reusable macros (Jinja2), generic and singular tests, and schema.yml documentation blocks
  • Enforce CI/CD pipelines for dbt via GitLab CI / GitHub Actions
  • Connect dbt docs to DataHub, Collibra, or Alation for enterprise data governance
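One concrete governance check for the staging → intermediate → mart structure is validating model naming and layer references in CI. The stg_/int_ prefixes follow common dbt community convention, an assumption here rather than something the posting prescribes:

```python
def model_layer(model_name: str) -> str:
    """Classify a dbt model into a layer by its name prefix (common convention)."""
    if model_name.startswith("stg_"):
        return "staging"
    if model_name.startswith("int_"):
        return "intermediate"
    return "mart"

def check_refs(model: str, upstream: list[str]) -> list[str]:
    """Flag a common layer violation: mart models ref()-ing staging directly.

    Returns the offending upstream model names (empty list means clean).
    """
    if model_layer(model) != "mart":
        return []
    return [u for u in upstream if model_layer(u) == "staging"]

# hypothetical models: a mart dimension skipping the intermediate layer
violations = check_refs("dim_patient", ["stg_patients", "int_patient_dedup"])
```

A check like this can run alongside dbt's own tests in the GitLab CI / GitHub Actions pipeline so layer discipline fails fast at review time.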

Cloud Composer & Workflow Orchestration

  • Design and maintain Cloud Composer (Airflow) DAGs to coordinate Matillion jobs, dbt runs, and downstream consumers
  • Implement SLA monitoring, alerting, and dependency management across cross-domain pipelines
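Dependency management across Matillion jobs, dbt runs, and downstream consumers reduces to ordering a DAG. The stdlib sketch below (task names are hypothetical) shows the execution order a Composer/Airflow DAG would enforce:

```python
from graphlib import TopologicalSorter

# upstream dependencies per task in a hypothetical pipeline:
# ingest → staging models → mart models → dashboard refresh
deps = {
    "matillion_ingest": set(),
    "dbt_staging": {"matillion_ingest"},
    "dbt_marts": {"dbt_staging"},
    "refresh_dashboards": {"dbt_marts"},
}

# static_order() yields tasks so every dependency runs before its dependents
order = list(TopologicalSorter(deps).static_order())
```

In Airflow the same shape would be expressed with operators and `>>` dependencies; the topological sort is what the scheduler computes under the hood.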

DevOps & CI/CD

  • Manage Git branching strategies across data engineering repositories
  • Configure Cloud Build / GitHub Actions / GitLab CI pipelines for automated testing, linting, and deployment
  • Apply infrastructure-as-code (Terraform) for BigQuery datasets, Matillion instances, and IAM resources

Clinical Data & Compliance

  • Map source EHR/EMR, EDC, CTMS, and claims data to CDISC SDTM/ADaM and OMOP CDM standards
  • Design HL7/FHIR-compatible ingestion pipelines for real-world evidence and interoperability use cases
  • Ensure platform compliance with GxP, HIPAA, 21 CFR Part 11, and SOC 2 requirements
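At its simplest, mapping source EHR/EMR fields to a target CDM is a governed column-mapping exercise. The field names below are illustrative assumptions loosely modeled on the OMOP person table, not an authoritative specification:

```python
# illustrative source-column → target-column mapping (hypothetical names)
EHR_TO_OMOP_PERSON = {
    "patient_id": "person_source_value",
    "birth_date": "birth_datetime",
    "sex": "gender_source_value",
}

def map_record(source_row: dict, mapping: dict) -> dict:
    """Rename mapped columns; unmapped source columns are dropped here
    (a production pipeline would log them for mapping-coverage audits)."""
    return {target: source_row[src]
            for src, target in mapping.items() if src in source_row}

row = map_record({"patient_id": "P001", "sex": "F", "mrn": "X9"},
                 EHR_TO_OMOP_PERSON)
```

Real SDTM/ADaM and OMOP mappings also involve vocabulary lookups and concept-ID resolution, which is where the governance tooling mentioned above earns its keep.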

Required Qualifications

  • 8-14 years in data architecture, data engineering, or analytics engineering roles
  • 4 years hands-on with GCP BigQuery: DDL/DML, query optimization, IAM, Data Transfer Service, BigQuery ML
  • 3 years with Matillion ETL: job design, orchestration patterns, GCP connectors, Python components
  • 3 years with dbt Core and/or Cloud: project structure, macros, testing, documentation, CI/CD integration
  • Advanced SQL proficiency: window functions, CTEs, performance tuning, and analytical patterns
  • Proficient in Python for data validation, automation, and custom Matillion/Airflow components
  • Strong Git-based workflow discipline: branching, code review, pull requests, semantic versioning

Preferred Qualifications

  • Google Professional Data Engineer or Cloud Architect certification
  • Experience in clinical / pharmaceutical / life-sciences data environments (CDISC, OMOP, HL7/FHIR)
  • Familiarity with Terraform for GCP infrastructure provisioning
  • Exposure to dbt Semantic Layer, MetricFlow, or dbt Cloud Enterprise features
  • Experience with DataHub, Collibra, or Alation for metadata management and data cataloging

Background in MDM solutions (Reltio, Informatica MDM) is a plus.