Manoj Sainadh Devaki

AWS Certified Cloud Architect | Kubernetes Expert | DevOps Professional | Machine Learning Engineer

GPU Infrastructure | Kubernetes at Scale | ML Platform Engineering | Reliability & Observability

About

AWS Certified Cloud Architect and Site Reliability Engineer with 10 years of experience designing and scaling highly available, secure, and cost-efficient cloud infrastructure. Proficient in managing Kubernetes clusters, optimizing databases, and building automation pipelines using Terraform and Python. Experienced in optimizing Machine Learning workloads running on Kubernetes and SageMaker, including in regulated environments aligned with SOC 2, PCI-DSS, HIPAA, HITRUST, and FedRAMP.

10+ Years: Cloud & platform engineering
Multi-Region: AWS & Kubernetes operations
Compliance: SOC 2, PCI-DSS, HIPAA, HITRUST
Automation: Terraform + CI/CD pipelines

Highlights

A few focus areas that summarize the impact and scope of recent work.

Scalable cloud foundations

Designed AWS infrastructure with VPCs, Transit Gateways, and Kubernetes for global workloads.

ML platform delivery

Built NLP and GPU workloads on EKS with observability and reliability baked in.

Automation at scale

Standardized CI/CD and Terraform-driven provisioning for faster, safer releases.

Operational excellence

Implemented PCI-compliant zones, monitoring dashboards, and cost-optimization initiatives.

Work Experience

Qualtrics Inc.

Nov 2019 - Present

Senior Machine Learning Engineer

Aug 2025 - Present

Senior Systems Engineer – ML Infrastructure

Nov 2019 - Aug 2025

  • Led ML platform engineering for NLP and GPU workloads on AWS and Kubernetes (EKS), enabling reliable, cost-efficient scaling.
  • Built multi-region clusters with integrated security, observability (Prometheus/Grafana/Splunk), and SLO-driven reliability.
  • Implemented IaC and CI/CD (Terraform, CDK, CodePipeline) across multi-account environments with strong governance.
  • Designed scalable AWS infrastructures using VPCs, Transit Gateways, and Kubernetes, ensuring high availability for global workloads.
  • Developed and deployed an automatic speech recognition (ASR) service using OpenAI Whisper on Kubernetes, reducing the cost of transcription tasks.
  • Implemented PCI-compliant infrastructure, integrating AWS WAF, Palo Alto firewalls, and SIEM tools for security compliance.
  • Developed CI/CD pipelines using AWS CodePipeline and Terraform to automate infrastructure provisioning and application deployment.
  • Architected and deployed site-to-site VPN tunnels, SFTP services, and Active Directory integration for secure connectivity and authentication.
  • Led the implementation of cost-optimization measures, including EBS volume migration and S3 data lifecycle policies, resulting in significant savings.
  • Established monitoring dashboards with Splunk and AWS CloudWatch to ensure system health and performance.
  • Mentored junior engineers and promoted best practices in Kubernetes management and AWS technologies.
  • Developed standardized naming conventions and tagging protocols for AWS resources to enhance management and visibility.
  • Re-architected AWS environments to resolve CIDR conflicts, enabling multi-account, multi-region scalability.
  • Automated the deployment of serverless architectures for ETL processes, improving efficiency.
  • Integrated Prisma Cloud for vulnerability management across all AWS accounts.
  • Designed PCI-compliant zones to ensure successful PCI renewals.
  • Deployed Kubernetes clusters with integrated security and monitoring features.

AWS DevOps Engineer - Fannie Mae

Aug 2016 - Nov 2019

  • Provisioned infrastructure as part of the Enterprise Data Integration (EDI) Group in the Release Management, Engineering & DevOps (R.E.D) team, managing environments for 30+ application teams with diverse tech stacks (Informatica, Ab Initio, Tibco, Netezza, Oracle, IDR, MDM).
  • Implemented AWS cloud environments using S3, EBS, EMR (Hive, Glue), EC2, Redshift, CodePipeline, CodeDeploy, and RDS, and configured automated CI/CD pipelines integrated with CloudBees Jenkins, SVN, Bitbucket, iCart, and Nexus Repository.
  • Established comprehensive monitoring solutions using Dynatrace for 250+ virtual and physical servers, implementing AutoSys for automated health checks of DB connections, EPV verification, Oracle roles, and disk utilization.
  • Implemented end-to-end disaster recovery solutions including Local Resiliency and Out-of-Region (OOR) recovery, ensuring business continuity through detailed failover procedures and documentation in SharePoint and Confluence.
  • Built and maintained internal infrastructure tools using Node.js, Bootstrap, and HTML: an AWS Cost Forecasting Tool, a License Inventory Management system with DataTables, and an Infrastructure Management Portal for server and application inventory tracking.
  • Managed database operations including space-related issues for Oracle databases and filesystems (GPFS, NFS, Veritas) on SUSE Linux Servers, implemented Flyway deployment for Oracle and PostgreSQL databases with EPV/PAM integration, and successfully migrated to Delphix databases.
  • Created detailed architecture diagrams using HOPEX MEGA Tool and Microsoft Visio.
  • Executed successful migration from EPV to PAM across all applications.
  • Conducted proactive monitoring and capacity planning for Oracle databases, ensuring optimal performance through monthly release cycles while following ITIL processes for Incident and Change Management.

Teaching Assistant - University of Houston-Clear Lake

Aug 2015 - May 2016

  • Developed programs in C, VHDL, and MATLAB for software projects, from design through completion.
  • Provided support to students in understanding complex concepts and troubleshooting technical issues.
  • Contributed to the development of course materials and participated in grading assignments.

System Engineer - Tech Mahindra

Jun 2013 - Nov 2014

  • Supported the Lotus Notes to Microsoft Exchange migration, ensuring seamless data transfer across platforms.
  • Managed server infrastructures and ensured 24/7 uptime for critical systems.
  • Collaborated with cross-functional teams to resolve complex server and network issues efficiently.

Skills

Automation & Development

Python | Ansible | Shell Scripting

CI/CD & DevOps

Jenkins | GitLab | GitHub Actions | AWS CodePipeline

Cloud Platforms

AWS | GCP | Azure | IBM Cloud

Security & Compliance

PCI DSS | ISO 27001 | HIPAA | GDPR | SOC 2

Certifications

AWS Solutions Architect Professional
AWS DevOps Engineer Professional
CKA
CKAD
AWS SysOps Administrator Associate
AWS Solutions Architect Associate
AWS Developer Associate
AWS Cloud Practitioner
HashiCorp Terraform Associate
AWS AI Practitioner

Education

Master of Science in Computer Engineering

University of Houston, Clear Lake, TX

Bachelor of Technology in Electronics and Communication Engineering

JNTU, Hyderabad, India