Rolando Cruz

Senior Software Engineer
Dublin, Ireland | hello@lando.ph | LinkedIn

Summary

Software engineer with 14+ years of experience building and operating large-scale distributed systems. Currently at AWS Redshift, leading fleet-wide infrastructure initiatives across tens of thousands of production hosts — OS migrations, security hardening, and performance optimization. Track record of shipping solutions to ambiguous, cross-team problems with measurable impact, mentoring engineers to promotion, and driving production reliability across organizations.

Technical Skills

Languages: Python, Java, Bash
AWS: EC2, Secrets Manager, IAM, CloudWatch, Redshift
Databases: PostgreSQL, MySQL, Redis
Infrastructure: Terraform, Docker, Puppet, Ansible
Systems: Fleet orchestration, load balancing, sharding, replication strategies, blue-green deployments
Monitoring: Prometheus, ELK, CloudWatch

Experience

Amazon Web Services (AWS) Jun 2020 – Present
Software Development Engineer II / Tech Lead, OS Squad — Redshift
  • Led a fleet-wide OS migration across tens of thousands of live, stateful production hosts before an end-of-support deadline. Wrote the in-place OS replacement system — volume management, orchestration, and bootloader manipulation (grub-legacy to grub2) with rollback capability — when no upgrade path or cloud API existed. Built filesystem and bootloader abstraction layers for forward compatibility and implemented staged rollout across multiple regions.
  • Architected Redshift's Secrets Manager integration from concept to launch, eliminating plaintext credential handling for cluster admin passwords. Pushed back on the original design requiring each service to build its own rotation scheduler, proposing an event-based mechanism owned by Secrets Manager — adopted as the standard approach for all integrating services. After launch, instrumented failure-mode monitoring with differentiated error codes across all rotation paths, turning a single success metric into an operational dashboard that pinpointed exact failure causes.
  • Founded and led the OS squad responsible for the AMI build pipeline, security patching delivery, host bootstrapping, disk and memory setup, and bootloader management for kernel updates — deployed to 40+ regions including air-gapped environments under a 30-day CVE SLA. Reduced pipeline toil from a full-time job to a manageable rotation by implementing deployment alerting, onboarding to multi-stripe releases, and systematically eliminating test flakiness. Defined and maintained a 99.9%+ patching SLO and reduced monthly production incidents by over 85%.
  • Diagnosed and resolved a memory contention issue caused by competing resource requirements during host provisioning: database memory allocation, software installation workflows, and swap constraints. Investigated fragmentation patterns, consulted cross-team SMEs, and designed a resource allocation strategy that achieved up to a 99% reduction in P90 startup latency on affected instance types.
  • Investigated production failures after kernel patches by tracing through cross-service logs outside the team's usual domain. Identified a hardware-level root cause and partnered with EC2 to expose a new health signal. Designed a deferred auto-remediation strategy that replaces at-risk hosts during maintenance windows, eliminating the class of incident.
  • Led cross-org privilege reduction on production hosts — permission management with backwards compatibility, incremental rollout, and safe deployment. Identified and remediated security vulnerabilities including credential exposure, authorization bypasses, and certificate validation gaps. Built AI-powered automation to triage out-of-compliance clusters, reducing weekly manual analysis from 20 tickets to 1–2. Authored operational runbooks for OS migrations and compliance used across on-call rotations.
  • Mentored 5+ engineers across levels: one mentee was promoted to the next level, and another successfully transitioned from an infrastructure role to a software engineering role. Served on multiple concurrent on-call rotations and was recognized as a go-to responder for cross-domain production incidents.
PayMaya Philippines Inc. Aug 2018 – Jun 2020
Engineering Manager — Wallet Platform Group
  • Managed part of the PayMaya Wallet platform group responsible for core banking and card management systems processing all issuing financial movement on the platform.
  • Designed and led a zero-downtime migration of the API serving layer from Apigee to API Gateway, including migrating the OAuth token provider, implementing backwards-compatible authentication across both gateways, and orchestrating DNS-based traffic cutover with no client disruption.
  • Led the migration of 80% of databases from Oracle to PostgreSQL, improving scalability and reducing licensing costs to support the platform's rapid user growth.
Freelancer Pty Ltd Oct 2012 – Aug 2018
Senior Software Engineer
  • Designed and implemented a multi-layer caching architecture to handle heavy bot and crawler traffic: database-backed responses for authenticated users with selective caching of static data, S3-stored pre-rendered content for unauthenticated users and bots, and Varnish at the load balancer layer for full HTML caching — routing different traffic types through different serving strategies to reduce database load.
  • Led development of the company's internationalization/translation service, providing language translation to multiple systems via an RPC interface using Apache Thrift.
  • Served as Interim Director of Engineering for 6 months, managing 40 engineers across multiple teams while continuing technical contributions.
Sourcefit Philippines Jul 2011 – Oct 2012
Web Developer
  • Built and maintained client websites, ranging from WordPress implementations to a custom-built CMS. Managed full server administration, including Apache, MySQL, and DNS configuration.

Education

Self-taught Software Engineer. Studied Computer Science at the University of the Philippines (2016–2018) to strengthen fundamentals before leaving to build a bank from scratch at PayMaya.