Senior Kubernetes Developer OPS00016



Job description

Overview

Dev.Pro (Bogota, D.C., Capital District, Colombia)

We invite a skilled Kubernetes Developer to join our fully remote, international team.

In this role, you'll build and optimize the Kubernetes orchestration platform and develop custom operators to run HPC/AI workloads efficiently on GPU clusters.

You'll enhance infrastructure performance and reliability, create internal tools to improve the developer experience, and ensure multi-tenant HPC workloads remain secure and compliant.

What’s in it for you

  • Work on cutting-edge GPU infrastructure and next-gen HPC/AI workloads
  • Build a Slurm-on-Kubernetes product from scratch and shape its architecture
  • Collaborate with a top-tier international team and grow through continuous learning and conference participation

Key Responsibilities

  • Design, develop, and manage Kubernetes platforms for GPU-intensive AI/HPC workloads
  • Design and build a Slurm-like orchestration layer on Kubernetes for HPC/AI workloads
  • Develop custom operators and controllers for GPU job scheduling and execution
  • Integrate batch schedulers with Kubernetes to provide a hybrid HPC/Cloud product
  • Implement advanced GPU resource management and multi-tenant isolation policies
  • Build internal tools and a self-service platform to simplify AI/HPC job deployment and management
  • Monitor GPU clusters, troubleshoot production issues, and ensure high availability, fault tolerance, and disaster recovery
  • Develop CI/CD pipelines for GPU-intensive workloads
  • Ensure compliance with data sovereignty and international regulations

Qualifications

  • 3+ years of hands-on Kubernetes experience in production
  • Experience with HPC schedulers (Slurm, PBS, LSF, Volcano)
  • Strong background in GPU resource management and distributed systems
  • Experience with cloud/hybrid cloud architectures (AWS, GCP, Azure, on-prem GPU clusters)
  • Knowledge of Kubernetes operators, CRDs, scheduling, networking, and storage
  • Deep knowledge of HPC job scheduling and workload orchestration
  • Expertise in IaC (Terraform, Helm, or GitOps: ArgoCD/Flux) and monitoring & observability (Prometheus, Grafana, Jaeger, ELK)
  • Programming skills in Go, Python, Bash/Shell
  • Familiarity with PyTorch, TensorFlow, distributed training, and model serving
  • Skills in Linux administration, performance tuning, and advanced networking (RDMA, InfiniBand, TCP/IP, DNS, load balancing)
  • Experience in storage management and optimization for large datasets

Note: This role is fully remote and international, with a focus on collaboration across time zones.



Required Skill Profession

Software Development


