We are seeking a highly skilled Senior Data Platform Operations Engineer to ensure the stability, security, performance, and cost efficiency of our global enterprise data platform.
This role is pivotal in providing 8/5 operational coverage within a follow-the-sun 24x5 support model, ensuring the platform consistently supports business activities worldwide.
The ideal candidate will demonstrate expertise in cloud-based data platforms, a strong operational mindset, and a proactive approach to optimizing performance, enhancing observability, and managing costs.
We accept CVs in English only.
Responsibilities
- Maintain a stable, secure, and performant enterprise data platform (Snowflake, AWS data stack, dbt, orchestration tools, BI/analytics, etc.)
- Provide operational coverage within an 8/5 support model and participate in a 24/7 on-call rotation for critical incidents
- Implement robust monitoring, alerting, and observability solutions to facilitate proactive incident detection and resolution
- Perform platform upgrades, patching, and configuration management in alignment with security and compliance requirements
- Continuously tune system performance to meet evolving business needs
- Monitor infrastructure, data pipelines, and platform services using holistic observability frameworks
- Deliver actionable operational insights through monitoring dashboards and reporting
- Identify and execute process automation to improve efficiency and reduce manual interventions
- Propose and implement continuous improvements to advance platform resilience, scalability, and cost-effectiveness
- Contribute to infrastructure-as-code and configuration-as-code practices for consistent, repeatable operations
Requirements
- Over 3 years of experience managing cloud-native data platforms (e.g., Snowflake, Databricks, BigQuery, or similar)
- Expertise in cloud infrastructure (AWS) with emphasis on operations, automation, and cost governance
- Skills in monitoring and observability tools (Datadog, Prometheus, Grafana, ELK, CloudWatch, etc.)
- Knowledge of Infrastructure as Code (Terraform, Pulumi, Ansible) and configuration management practices
- Understanding of networking, security, and compliance in cloud environments
- Competency in problem-solving with a proactive, service-oriented mindset
- Flexibility to work in a global operations environment with on-call responsibilities
- Clear communication and collaboration skills for working with engineering, data, and business stakeholders
- Commitment to continuous improvement and operational excellence
- English proficiency at an Upper-Intermediate level (B2) or higher
Nice to have
- Experience implementing FinOps frameworks and cost optimization practices
- Background in working within regulated industries (pharma, healthcare, finance) in compliance-driven environments
- Familiarity with modern data stack tools (dbt, Dagster/Airflow, ThoughtSpot, Tableau, Power BI)
- Understanding of SRE (Site Reliability Engineering) principles and practices
We offer
- Learning Culture - We want you to be the best version of yourself, which is why we offer unlimited access to learning platforms, a wide range of internal courses, and all the knowledge you need to grow professionally
- Health Coverage - Health and wellness are important, which is why we cover you and up to four family members under a premier health plan. We have a couple of options, so you can choose what is best for you and your family
- Visual Benefit - Seeing your work for us would be a sight for sore eyes. We want your vision to always be at 100%, which is why we offer up to $ COP for any visual health expenses
- Life Insurance Plan - We have partnered with MetLife to offer a full-coverage life insurance plan, so your family is covered even if you are gone
- Medical Leave Coverage - We are one of the few companies that cover 100% of your medical leave, for up to 90 days. Your health is the most important thing to us
- Professional Growth Opportunities - We have designed a highly competitive and complete development process, where you will have all the tools to get where you have always wanted to be, personally and professionally
- Stock Option Purchase Plan - As an EPAMer you can be more than just an employee, you will also have the opportunity to purchase stock at a reduced price and become a part owner of our organization
- Additional Income - Besides your regular salary, you will also have the chance to earn extra income by referring talent, being a technical interviewer, and many more ways
- Community Benefit - You will be part of a worldwide community of over 50,000 employees, where you can learn, challenge yourself, stand out, and share your knowledge and experience with multicultural teams
Please note that even though you are applying for this position, you may be offered other projects to join within EPAM.
EPAM is a leading global provider of digital platform engineering and development services.
We are committed to having a positive impact on our customers, our employees, and our communities.
We embrace a dynamic and inclusive culture.
Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow.
No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.