Site Reliability Engineer 5 — Candidate

PRASAD MK

Live Streaming & Traffic at Scale  ·  25+ Years Production Systems  ·  Rocklin, CA

25+ Years in Production
8TB Pipeline / Day
99.99% SLA Owned
10K Connections Scaled
TV-MA 25 Seasons  ·  Genre: Infrastructure Thriller  ·  Subtitles: Python, Go, Bash  ·  Available: Immediately
Profile

WHO I AM

Infrastructure and reliability engineer with 25+ years owning large-scale production systems, including on-site at Apple, Samsung, Disney, FedEx, and PG&E.

Expert-level Linux and TCP/IP background: ported full network stacks from IPv4 to IPv6, traced packets in live production, and tuned DNS, TLS, and HTTP-layer behavior under load.

Built Kafka and Pulsar pipelines at 8TB/day with sub-second latency. Experienced in load testing, fault injection, and driving observability end-to-end to solve thundering-herd and traffic-at-scale problems.

Comfortable rotating on-call, writing production code in Python and Go, and debugging complex failures across distributed microservice architectures.
$2.1M annual cost saved via K8s migration
45% reduction in incident resolution time
40% of pipeline incidents cut via automation
100% of Fortune 500 escalations resolved
Global High Impact Award (HP): recognized in the Indian national press as a globally awarded engineer for Total Customer Experience excellence
50+ microservices migrated to Kubernetes
70% reduction in infra setup time via Terraform
95% of R&D roundtrips eliminated via tooling
10K BBC concurrent connections, scaled from 2K
Capabilities

TECHNICAL SKILLS

Languages
Python · Go · Java · C/C++ · Bash · Perl · SQL · Rust (familiar)
Streaming & Analytics
Kafka · Pulsar · Presto/Trino · Spark SQL · Time-series DBs · 8TB/day pipelines
Networking
TCP/IP · DNS · TLS/SSL · HTTP(S) · IPv4/IPv6 · L4/L7 Load Balancing · Reverse Proxy · Nginx
Reliability
Load Testing · Fault Injection · Chaos Engineering · SLO/SLA Ownership · Capacity Planning
Observability
Prometheus · Grafana · Datadog · Honeycomb · OTEL · Distributed Tracing
Infrastructure
Kubernetes · Docker · EKS · OpenShift · Terraform · Helm · AWS · GCP · Azure
Services & OS
Linux (RHEL/Ubuntu/SUSE) · Memcached · Apache · API Gateway · Packet Tracing · Perf Tuning
Databases
MySQL · PostgreSQL · DynamoDB · Oracle · Vertica · MSSQL
Career

EXPERIENCE

Solution Architect & Team Lead, IT Infrastructure  ·  Wipro Ltd. (Client: PNC Bank)  ·  Oct 2025 – Present

Technical lead for microservices architecture and production reliability across a 70,000-user enterprise environment on OpenShift and AWS.

  • Led development of REST APIs for the Alert Management System using Java and Spring Boot; designed API gateway routing, inter-service communication patterns, and end-to-end observability.
  • Executed load testing and capacity modeling for critical API traffic paths, identifying bottlenecks before production events and enabling proactive scaling.
  • Architected CI/CD pipelines with Jenkins and Terraform incorporating real-time security scanning; achieved zero critical vulnerabilities in production.
  • Established runbook-as-code practices, reducing on-call engineers' dependence on escalation.
  • Building agentic AI tooling on AWS Bedrock; submitted Invention Disclosure Form for patent consideration.

Cloud Solution Architect / SRE / DevOps Engineer  ·  OpenText (ex Micro Focus)  ·  May 2018 – May 2025

Sole SRE owner for SaaS AIOps and Network Operations Manager across the Americas. On-call for all production incidents.

  • Maintained 99.99% SLA uptime as single-threaded owner; drove SOC2 compliance and owned DR strategy organization-wide.
  • Built and operated Kafka and Pulsar pipelines at 8TB/day with sub-second latency; used Presto/SQL for real-time analytics during incident response.
  • Designed and executed load testing and fault injection scenarios to validate system behavior under sudden traffic spikes.
  • Administered and monitored Nginx reverse proxies within OpsBridge and Optic Data Lake, including customer-facing health monitoring and HTTP cache layers.
  • Applied deep Linux internals knowledge for perf tuning and packet-level tracing; diagnosed DNS, TLS, and HTTP-layer failures, reducing resolution time by 45%.
  • Led migration of 50+ microservices to Kubernetes, reducing provisioning from days to 30 minutes and saving $2.1M annually.

Product Engineer / Software Designer  ·  Hewlett Packard / HPE  ·  Sep 2004 – Apr 2018

14 years spanning software engineering through on-site technical escalation at Apple, Samsung, Disney, FedEx, and PG&E.

  • Engineered and scaled the BBC (Backbone Communication Component), core async broker for Operations Agent, from 2,000 to 10,000 concurrent connections across all Unix and Windows platforms.
  • Implemented Reverse Channel Proxy for Operations Agent — enabling customers to connect to OpsBridge through a single outbound connection with no inbound firewall rules.
  • Administered and monitored Nginx deployments within OpsBridge and customer environments; diagnosed HTTP-layer and proxy configuration failures in live enterprise production.
  • Led IPv4-to-IPv6 migration of Operations Agent network stack with zero disruption across all supported enterprise platforms.
  • Engineered Embedded Perl memory caching enabling two years of uninterrupted operation in medical equipment and submarines.
  • Resolved 100% of on-site escalations at Fortune 500 accounts, securing multi-million-dollar contract renewals. Recipient of Global High Impact Award.

Software Engineer  ·  ProCSys (Client: ArcanaNetworks / Cisco)  ·  Sep 1999 – Aug 2004
  • Built OpsXML: intent-driven, declarative network device management framework for thousands of Cisco CCNA lab environments — one of the earliest implementations of infrastructure-as-code for physical networking.
  • Delivered SBA Teleworking deployment system automating secure remote access provisioning at enterprise scale.
Thought Leadership

PUBLICATIONS

SSRN Working Paper · 2025
"Taxing the AI Agents"
Engages directly with Anthropic's model spec and AI agent governance
Amazon Published Author
DAY ONE AI: From Your First Job to Your First Product
ISBN 979-8-9957306-0-6
Medium Series
"The Escalation Trap" — 3-Part Series
@prasad.rocklin · Better Programming · dev.to communities
Open Source
AI Agent for Anki / Ankiweb
Targeting 10M+ user base · github.com/prasad-m-k
Let's Connect

GET IN TOUCH

Open to discussing the SRE 5 — Live SRE role at Netflix. Happy to walk through any of the live streaming reliability challenges in detail.

Rocklin, CA  ·  Available for remote roles