Senior Kubernetes Developer - OPS00016 at Dev.Pro

Job Overview

Company

Dev.Pro

Location

Work from home

Job Description

Overview

Dev.Pro, Bogota, D.C., Capital District, Colombia. We invite a skilled Kubernetes Developer to join our fully remote, international team.

In this role, you'll build and optimize the Kubernetes orchestration platform and develop custom operators to run HPC/AI workloads efficiently on GPU clusters.

You'll enhance infrastructure performance and reliability, create internal tools to improve the developer experience, and ensure multi-tenant HPC workloads remain secure and compliant.

What’s in it for you

Work on cutting-edge GPU infrastructure and next-gen HPC/AI workloads
Build a Slurm-on-Kubernetes product from scratch and shape its architecture
Collaborate with a top-tier international team and grow through continuous learning and conference participation

Key Responsibilities

Design, develop, and manage Kubernetes platforms for GPU-intensive AI/HPC workloads
Design and build a Slurm-like orchestration layer on Kubernetes for HPC/AI workloads
Develop custom operators and controllers for GPU job scheduling and execution
Integrate batch schedulers with Kubernetes to provide a hybrid HPC/Cloud product
Implement advanced GPU resource management and multi-tenant isolation policies
Build internal tools and a self-service platform to simplify AI/HPC job deployment and management
Monitor GPU clusters, troubleshoot production issues, and ensure high availability, fault tolerance, and disaster recovery
Develop CI/CD pipelines for GPU-intensive workloads
Ensure compliance with data sovereignty and international regulations

Qualifications

3+ years of hands-on Kubernetes experience in production
Experience with HPC and batch schedulers (Slurm, PBS, LSF, Volcano)
Strong background in GPU resource management and distributed systems
Experience with cloud/hybrid cloud architectures (AWS, GCP, Azure, on-prem GPU clusters)
Knowledge of Kubernetes operators, CRDs, scheduling, networking, and storage
Deep knowledge of HPC job scheduling and workload orchestration
Expertise in IaC and GitOps tooling (Terraform, Helm, Argo CD/Flux) and in monitoring and observability (Prometheus, Grafana, Jaeger, ELK)
Programming skills in Go, Python, Bash/Shell
Familiarity with PyTorch, TensorFlow, distributed training, and model serving
Skills in Linux administration, performance tuning, and advanced networking (RDMA, InfiniBand, TCP/IP, DNS, load balancing)
Experience in storage management and optimization for large datasets

Note: This role is fully remote and international, with a focus on collaboration across time zones.
