Data Engineer (DBT + Spark + Argo)
We are seeking a highly skilled Data Engineer to join a remote‑first, collaborative team driving the modernization of large‑scale data platforms in the healthcare sector.
In this role, you will work on transforming legacy SQL pipelines into modular, scalable, and testable DBT architectures, leveraging Spark for high‑performance processing and Argo for workflow orchestration.
You will implement modern lakehouse solutions, optimize storage and querying strategies, and enable real‑time analytics with Elasticsearch.
This position offers the chance to contribute to a cutting‑edge, cloud‑native data environment, working closely with cross‑functional teams to deliver reliable, impactful data solutions.
Accountabilities
- Translate legacy T‑SQL logic into modular, scalable DBT models powered by Spark SQL
- Build reusable, high‑performance data transformation pipelines
- Develop testing frameworks to ensure data accuracy and integrity within DBT workflows
- Design and orchestrate automated workflows using Argo Workflows and CI/CD pipelines with Argo CD
- Manage reference datasets and mock data (e.g., ICD‑10, CPT), maintaining version control and governance
- Implement efficient storage and query strategies using Apache Hudi, Parquet, and Iceberg
- Integrate Elasticsearch for analytics, building APIs and pipelines that support indexing and querying
- Collaborate with DevOps teams to optimize cloud storage, enforce security, and ensure compliance
- Participate in Agile squads, contributing to planning, estimation, and sprint reviews
Requirements
- Strong experience with DBT for data modeling, testing, and deployment
- Hands‑on proficiency in Spark SQL, including performance tuning
- Solid programming skills in Python for automation and data manipulation
- Familiarity with Jinja templating to build reusable DBT components
- Practical experience with data lake formats: Apache Hudi, Parquet, Iceberg
- Expertise in Argo Workflows and CI/CD integration with Argo CD
- Deep understanding of AWS S3 storage, performance tuning, and cost optimization
- Experience with Elasticsearch for indexing and querying structured and unstructured data
- Knowledge of healthcare data standards (e.g., ICD‑10, CPT)
- Ability to work cross‑functionally in Agile environments
- Nice to have: Experience with Docker, Kubernetes, cloud‑native data tools (AWS Glue, Databricks, EMR), CI/CD automation, data compliance standards (HIPAA, SOC 2), or contributions to open‑source DBT/Spark projects
Benefits
- Contractor agreement with payment in USD
- 100% remote work within LATAM
- Observance of local public holidays
- Access to English classes and professional learning platforms
- Referral program and other growth opportunities
- Exposure to cutting‑edge data engineering projects in a cloud‑native environment
Salary range: $12,000.00–$24,000.00
Location: Bogota, D.C., Capital District, Colombia