Writing & Insights

Deep dives on production databases: reliability, performance tuning, migration strategy, and real-world incident lessons from large-scale systems.

PostgreSQL Jan 2026 · 12 min

How I Cut p95 Latency from 1.8s to 0.9s in PostgreSQL

A practical playbook using EXPLAIN ANALYZE, index redesign, autovacuum tuning, and lock analysis. Includes a repeatable checklist you can apply in production.

PostgreSQL Query Tuning p95 Latency
Migration Dec 2025 · 10 min

Zero-Downtime DB Migrations: 4TB On-Prem to AWS

Architecture patterns using AWS DMS + pglogical, plus cutover strategy, rollback plans, and validation checks that protect SLAs during high-risk windows.

AWS DMS pglogical Zero Downtime
Reliability Nov 2025 · 9 min

DR Readiness for Aurora Global Database

A step-by-step disaster recovery testing approach for achieving near-zero RPO and low RTO, built on quarterly failover drills and automation-backed runbooks.

Aurora RPO/RTO Runbooks
Performance Oct 2025 · 8 min

MongoDB Aggregation Tuning in Production

Real aggregation pipeline tuning examples, compound indexing patterns, and hotspot elimination methods that reduced average query latency from 820ms to 450ms.

MongoDB Aggregation Indexing
Observability Sep 2025 · 11 min

Datadog + Grafana Alerting That Actually Reduces MTTR

Alert quality design for database incidents, anomaly thresholds, and actionable triage data that cut mean time to resolution by 30% in production.

Datadog Grafana MTTR
SRE Aug 2025 · 9 min

PostgreSQL On-Call Playbook for P0/P1 Incidents

A battle-tested incident command model, communication templates, severity gates, and post-mortem framework that reduce repeat incidents and responder fatigue.

On-call PostgreSQL Incident Response