Job Overview
Category
computer-and-mathematical
Job Description
AgileEngine is an Inc. 5000 company that creates award-winning software for Fortune 500 brands and trailblazing startups across 17+ industries.
We rank among the leaders in areas like application development and AI/ML, and our people-first culture has earned us multiple Best Place to Work awards.
WHY JOIN US
If you're looking for a place to grow, make an impact, and work with people who care, we'd love to meet you!
WHAT YOU WILL DO
- Shift: Monday to Thursday, 8 AM to 7 PM PST (11 AM to 10 PM EST), with rotating on-call;
- On-call shifts: every 6 weeks, one week as primary responder followed by one week as secondary;
- Manage alerts daily, check systems, and escalate issues as needed;
- Be part of a team that provides 24/7 on-call support for critical SaaS events;
- Be available for emergencies when other team members are unavailable or need help;
- Document issues and remediation steps;
- Proactively create appropriate monitors in the EKS/K8s ecosystem;
- Deploy to EKS/K8s clusters using Terraform and Helm;
- Learn and maintain existing infrastructure running on Docker Swarm;
- Improve the health of existing infrastructure by implementing checks and scripts that correct known issues;
- Maintain and develop deployment code;
- Automate manual tasks;
- Implement and integrate new technologies in our cloud infrastructure;
- Collaborate with other teams and departments to provide the highest level of support and assistance;
- Apply a real customer focus when planning deployments and updates, keeping the customer front of mind and considering the impact on them before making changes;
- Work closely with the Support, Customer Success, Migration, and Professional Services teams on solutions that deliver a best-in-class SaaS service to our customers;
- Perform RCA (root cause analysis) and take the necessary corrective actions to prevent issues from recurring;
- After an investigation, create alert-related actions and assign them to the appropriate team;
- Handle support requests for environment-specific actions;
- Identify and provide automation requirements to improve RCA.
MUST HAVES
- 2+ years of professional experience;
- Experience working with Datadog;
- Hands-on experience as an AWS Cloud Engineer;
- Working knowledge of EKS, Terraform, and Helm;
- Working experience with Docker and Docker Swarm;
- Good understanding of AWS IAM roles and policies;
- Experience logging and monitoring AWS resources using CloudWatch Logs;
- Experience working in a Linux environment;
- Proficiency in Bash and/or Python scripting;
- A strong understanding of web technologies such as REST APIs;
- Working experience with monitoring solutions such as Grafana and Prometheus;
- Excellent oral and written communication skills;
- Customer-facing communication skills to explain issues and RCAs effectively to customers;
- Experience in product/application support for SaaS-based products;
- Understanding of APIs, databases, systems architecture, and design;
- Experience designing, implementing, and operating in a DevSecOps environment;
- Ability to work independently as well as within a collaborative environment;
- A technical aptitude and the desire to learn new and evolving technologies;
- Upper-Intermediate English level.
NICE TO HAVES
- Experience with GCP or Azure;
- Certifications: AWS Certified DevOps Engineer - Professional or AWS Certified Advanced Networking - Specialty.
PERKS AND BENEFITS
- Professional growth: Accelerate your professional journey with mentorship, TechTalks, and personalized growth roadmaps.
- Competitive compensation: We match your ever-growing skills, talent, and contributions with competitive USD-based compensation and budgets for education, fitness, and team activities.
- A selection of exciting projects: Join projects involving modern solution development for top-tier clients, including Fortune 500 enterprises and leading product brands.
- Flextime: Tailor your schedule for an optimal work-life balance, with the option to work from home or from the office, whichever makes you happiest and most productive.
Seniority: Middle
AgileEngine is actively hiring for this Site Reliability Engineer (Middle) ID38916 position