Platform engineering
Kubernetes, kubeadm, Helm, Argo CD, Kyverno, OpenTofu/Terraform, Docker Buildx, local registries, Linux, container runtimes, storage, and worker placement.
Senior DevOps / MLOps / Platform Engineer
Contacts: +52 449 217 6833, juvenaldiaz522@gmail.com
Senior infrastructure and reliability engineer with 12+ years of experience across Linux, Kubernetes-based platforms, cloud operations, incident response, automation, and production support. Currently operating a Kubernetes/Terraform PaaS used by 20,000+ internal developers, with hands-on work in maintenance, emergency changes, tooling, documentation, GitOps-style delivery, and continuous improvement. Targeting senior remote DevOps, SRE, platform, and MLOps roles where reliability, automation, and clear operations matter.
20,000+ internal users supported through a Kubernetes/Terraform platform as a service.
10,000+ external Oracle Analytics customers supported through incident response, Linux troubleshooting, SQL tuning, automation, and runbook work.
4M+ requests per minute and 30+ microservices supported on a PCI-compliant platform using containers, orchestration, alerting, and DevOps practices.
Top performer history across Oracle and Rackspace teams, with onboarding, automation epics, and on-call process improvement work.
Incident response, emergency changes, planned maintenance, monitoring, alert tuning, Prometheus, Grafana, Loki, node-exporter, runbooks, RCA, and ITIL process experience.
Bash, Python, Ansible, REST APIs, GitOps delivery loops, Gitea Actions, Bitbucket workflows, secret scanning, image scanning, and repeatable infrastructure scripts.
FastAPI inference service patterns, model metrics, drift detection, canary/rollback workflows, model-serving platform design, MLflow/KServe/Ray learning path, and Kubernetes-based ML operations.
August 2024 → Present
Site Reliability Developer – Oracle | Spectra
Operate a Kubernetes/Terraform platform as a service that lets 20,000+ internal developers build, run, and operate cloud applications. Daily work includes planned maintenance, emergency changes, tooling improvement, documentation, operational guardrails, and reliability-focused platform support.
June 2022 → July 2024
Site Reliability Developer – Oracle | Analytics
Resolved Oracle Analytics Cloud incidents for 10,000+ external customers, including Linux troubleshooting, SQL query tuning, service/job configuration, and usage issues. Built internal automation with Bash, Python, Ansible, and REST APIs, maintained SOPs, led continuous improvement and automation epics, supported new-hire onboarding, and proposed on-call rotation improvements.
July 2021 → June 2022
Linux Support Engineer – Rackspace
Handled multi-client Linux incidents through phone and ticketing channels across MySQL, Apache, NGINX, Varnish, PHP, VMware, DoS events, storage, backups, and firewalls. Ranked as a top performer by case volume across MX and US teams and helped onboard new hires.
March 2020 → July 2021
Linux Support Engineer – Softtek | Electronic Arts
Provided infrastructure support for a PCI-compliant platform handling 4M+ requests per minute across 30+ microservices. Supported container and orchestration technologies, DevOps operating practices, alert creation, and alert tuning.
August 2017 → March 2020
Cross-Functional Manager – Softtek | Electronic Arts
Implemented ITIL-based incident, problem, asset management, and automation processes, and carried out continuous improvement assessments.
September 2015 → August 2017
Linux Support Engineer / Tech Lead – Softtek | General Electric
Handled incident management, change management, and monitoring for internal applications. Promoted to tech lead after one year in the support role.
February 2013 → August 2015
Customer Support Agent – Teleperformance | Comcast
Provided phone-based customer support for the US Southwest region, troubleshooting cable, phone, and internet services.