EP 022

Why Your Infrastructure is Probably Broken

January 8, 2025 · 52 minutes

This week we’re talking about infrastructure monitoring and why most of it is fundamentally broken.

The Big Three Outages

We analyze three major outages from the past month and draw out what they teach us about monitoring, alerting, and incident response.

Monitoring Anti-Patterns

  • Alert Spam: When everything is urgent, nothing is
  • Vanity Metrics: Measuring what looks good instead of what matters
  • Dashboard Theater: Pretty charts that don’t help during incidents

What Actually Works

Real talk about monitoring approaches that have proven effective in production environments.

Actionable Advice

Practical steps you can take tomorrow to improve your monitoring and alerting setup.