Already filled

Site Reliability Engineer

Job summary

Orlando
Software Developer

Work model

Fully remote
Only US
1 month ago

Job description

Overview

A fast-growing healthcare technology organization is seeking a Site Reliability Engineer (SRE) to help scale and support a high-impact cloud platform focused on improving healthcare delivery nationwide. This role will play a critical part in strengthening platform reliability, operational efficiency, observability, and automation across production environments.

The ideal candidate is passionate about infrastructure stability, incident response, automation, and continuous improvement within modern cloud-native environments.

Key Responsibilities

  • Ensure the reliability, scalability, performance, and security of cloud-based infrastructure and applications
  • Monitor, troubleshoot, and resolve production platform and application issues across distributed systems
  • Lead incident response efforts, root cause analysis, and blameless post-mortems
  • Build and maintain operational runbooks and automated remediation workflows
  • Develop and enhance observability and telemetry solutions for proactive monitoring and alerting
  • Collaborate closely with engineering, DevOps, QA, security, and operations teams to improve platform health and deployment processes
  • Support infrastructure automation and configuration management initiatives
  • Contribute to infrastructure-as-code (IaC) practices and CI/CD operational improvements
  • Promote best practices around reliability engineering, incident management, and operational excellence
  • Participate in an on-call rotation supporting production systems, including occasional off-hours support for West Coast operations

Required Qualifications

  • 5+ years of experience in Site Reliability Engineering, DevOps, Cloud Infrastructure, or related disciplines
  • Strong experience troubleshooting and supporting production environments
  • Hands-on experience with observability and monitoring platforms such as Datadog, New Relic, or similar tools
  • Experience working within Azure-based cloud environments and modern containerized infrastructure
  • Knowledge of Docker, Kubernetes, and cloud-native application hosting environments
  • Experience with infrastructure-as-code tools such as Terraform, Terragrunt, or OpenTofu
  • Strong scripting and automation experience using PowerShell, Python, JavaScript, or similar languages
  • Experience with source control and CI/CD tooling (Git, Azure DevOps, etc.)
  • Understanding of cloud security principles, compliance frameworks, and operational best practices
  • Strong collaboration and communication skills within Agile engineering environments

Preferred Qualifications

  • Experience improving operational visibility through telemetry, dashboards, reports, and alerting systems
  • Experience evolving incident response processes and operational tooling
  • Passion for mentoring others and promoting operational excellence across teams
  • Strong problem-solving mindset with a focus on continuous improvement and automation

Additional Details

  • Opportunity to work on mission-driven technology with meaningful real-world impact
  • Collaborative engineering culture focused on innovation, reliability, and continuous learning
  • Flexible environment that supports work-life balance while maintaining operational excellence

If you are interested and qualified, please email [email protected]