How I Cut p95 Latency from 1.8s to 0.9s in PostgreSQL
A practical playbook using EXPLAIN ANALYZE, index redesign, autovacuum tuning, and lock analysis. Includes a repeatable checklist you can apply in production.
Deep dives on production databases: reliability, performance tuning, migration strategy, and real-world incident lessons from large-scale systems.
Architecture patterns using AWS DMS + pglogical, with cutover strategy, rollback plans, and migration validation that protect SLAs during high-risk windows.
Step-by-step disaster recovery testing approach to achieve near-zero RPO and low RTO with quarterly failover drills and automation-backed runbooks.
Real-world query pipeline tuning examples, compound indexing patterns, and hotspot elimination methods that reduced average query latency from 820ms to 450ms.
Alert quality design for database incidents, anomaly thresholds, and actionable triage data that cut mean time to resolution by 30% in production.
A battle-tested incident command model, communication templates, severity gates, and post-mortem framework that lowers repeat incidents and response fatigue.