Site Reliability Engineer (Middle) ID38916 at AgileEngine

Job Overview

Company

AgileEngine

Location

Bucaramanga

Job Description

AgileEngine is an Inc. 5000 company that creates award-winning software for Fortune 500 brands and startups across 17+ industries.

We rank among the leaders in areas like application development and AI/ML, and our people-first culture has earned us Best Place to Work awards.

WHY JOIN US

If you're looking for a place to grow, make an impact, and work with people who care, we’d love to meet you!

WHAT YOU WILL DO

Shift: Monday – Thursday 8AM – 7PM PST (11AM – 10PM EST) with rotating on-call
On-call shifts: every 6 weeks, for one week as primary responder and the next week as secondary
Manage alerts daily, check systems, and escalate issues as needed
Provide 24×7 on-call support for critical SaaS events
Be available in emergencies when team members are not available or need help
Document issues and remediation steps
Proactively create appropriate monitors in the EKS/K8s ecosystem
Deploy to EKS/K8s clusters using Terraform and Helm
Learn and maintain existing infrastructure running under Docker Swarm
Improve infrastructure health by implementing checks and scripts to correct known issues
Maintain and develop deployment code
Automate manual tasks
Implement/integrate new technologies in Cloud Infrastructure
Collaborate with Support, Customer Success, Migration, and Professional Services to deliver high-quality SaaS service
Apply a customer-focused approach when planning deployments/updates
Work with solutions teams to provide best-in-class service to customers
Perform RCA and take corrective actions to prevent recurrence
Create and assign alert-related actions after investigations
Handle environment-specific support requests
Identify automation opportunities to improve RCA

Must Haves

2+ years of professional experience
Experience working with Datadog
Hands-on experience as an AWS Cloud Engineer
Working knowledge of EKS/Terraform/Helm
Experience with Docker and Docker Swarm
Understanding of AWS IAM roles and policies
Experience logging and monitoring AWS resources with CloudWatch
Experience in a Linux environment
Proficient in Bash and/or Python scripting
Strong understanding of REST APIs
Experience with monitoring solutions such as Grafana and Prometheus
Excellent oral and written communication skills
Customer-facing communication skills to explain issues and RCAs
Experience in Product/Application Support for SaaS products
Understanding of APIs, Databases, Systems Architecture, and Design
Experience designing, implementing, and operating in a DevSecOps environment
Ability to work independently and in a team
Technical aptitude and willingness to learn new technologies
Upper-Intermediate English level

Nice to Have

Experience with GCP or Azure
Certifications: AWS Certified DevOps Engineer – Professional or AWS Certified Advanced Networking – Specialty

Perks and Benefits

Professional growth: Mentorship, TechTalks, and personalized growth roadmaps
Competitive compensation: USD-based compensation with budgets for education, fitness, and team activities
Exciting projects: Modern solutions development for Fortune 500 enterprises and leading brands
Flextime: Flexible schedule with options to work from home or office

Seniority level

Mid-Senior level

Employment type

Full-time

Job function

IT Services and IT Consulting
