
datAvail: job opening in Huila, Colombia

Site Reliability Engineer



Job description

You will own reliability for core services across multiple clouds, drive automation, and mentor more junior engineers.

You will partner with developer teams to embed resilience into feature delivery.

Responsibilities

  • Define and maintain SLIs and SLOs; monitor service performance against them and track error budget consumption
  • Lead incident response and postmortems, and implement corrective actions
  • Automate operational tasks through tooling (e.g., auto-remediation, scaling rules)
  • Build, improve, and maintain CI/CD pipelines, including canary and blue/green deployment strategies
  • Lead technical discussions with customers to align on reliability, scalability, and performance requirements
  • Drive continuous platform improvements across the service lifecycle, including architecture, monitoring, and operational processes
  • Implement and extend observability systems (metrics, tracing, log aggregation)
  • Optimize performance and cost by tuning cloud services, configuring autoscaling, and rightsizing resources
  • Design, deploy, and operate containerized workloads using Docker and Kubernetes in production environments
  • Collaborate with development teams to integrate resilience patterns (circuit breakers, bulkheads)
  • Participate in architecture discussions on high availability and disaster recovery
  • Mentor mid-level and junior SREs; conduct reliability design reviews

Must-have Qualifications

  • 5–8 years of experience in a reliability or operations role
  • Cloud-agnostic certification: Terraform Associate, Certified Kubernetes Administrator (CKA), or SRE Foundation
  • Cloud provider certification: professional-level certification in AWS (Solutions Architect), Azure (Solutions Architect Expert), GCP (Professional Cloud Architect), or Oracle Cloud (Architect Professional)
  • Solid coding skills (Python, Go, or equivalent)
  • Experience with infrastructure as code (IaC) and CI/CD pipelines
  • Hands-on experience with monitoring and observability stacks (Prometheus, Grafana, OpenTelemetry, ELK, Jaeger)
  • Experience with distributed systems and production-scale services

Nice-to-have Skills

  • Exposure to multi-cloud data replication or cross-cloud networking
  • Experience with chaos engineering or fault injection

