Observability is just paranoia with graphs

Hasil T · 2026-04-14 · 9 min read · programming

Every line of code I have ever written has, at some point, surprised me.

So I instrument everything.

Not because I think it will fail, but because the parts that “obviously cannot fail” are usually the parts that fail in ways that make you question whether computers are real.

This is not a treatise on OpenTelemetry. I am not going to explain distributed tracing theory using diagrams that look like subway maps designed by a sleep-deprived architect.

This is a field guide for the small project. The side project. The thing you built in a weekend and deployed for “just a few users,” which now somehow processes actual money and sends push notifications to people at 2AM.

The kind of system where your monitoring strategy is usually:

  • check logs
  • stare at logs
  • add more logs
  • become log

The first time I understood observability

Years ago, I had a tiny service that sent transactional emails.

Very simple architecture.

API receives request -> queue job -> worker sends email.

Three moving parts. Basically impossible to break.

Naturally, it broke in the dumbest way imaginable.

Emails would randomly stop sending for exactly 11 minutes.

Not 10. Not 12. Eleven.

Then everything resumed like nothing happened.

CPU fine. Memory fine. Database fine. Queue fine. Logs completely useless.

At one point I genuinely considered cosmic radiation.

After two days of debugging, the problem turned out to be a token refresh issue caused by clock drift inside a container.

Which sounds obvious after you know it.

Without instrumentation, debugging becomes archaeology. You are reconstructing a civilization from fragments of stdout.

“I’ll add monitoring later”

This is the biggest lie developers tell themselves after:

  • “I’ll clean this up later”
  • “temporary workaround”
  • “this should scale fine”
  • “Safari probably supports it”

The problem is that observability feels unnecessary right up until the exact moment you desperately need it.

And that moment usually happens:

  • during deployment
  • during sleep
  • during a demo
  • immediately after you told someone “it’s stable now”

Small systems especially create false confidence because everything feels local and understandable.

You think:

I know every part of this app.

Then suddenly:

  • one API route takes 14 seconds only in production
  • one customer uploads a file that destroys your thumbnail service
  • Redis reconnects itself into another dimension
  • a cron job runs twice because time itself is fake

And now you are SSH’ing into a server muttering things like:

why is nginx sweating

Logs are not observability

Logs are evidence.

Observability is context.

A log tells you:

payment failed

Observability tells you:

  • which customer
  • after which deployment
  • on which server
  • after what sequence of requests
  • with what latency spike
  • while what database query was dying internally
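To make the difference concrete, here is a minimal sketch of the same failure logged with context attached, assuming Python and nothing beyond the standard library. The helper name log_event and every field value (customer_id, deploy, host, request_id, latency_ms) are made up for illustration; use whatever your system actually knows at the moment of failure.

```python
import json
import logging
import sys


def log_event(logger, event, **context):
    """Emit one JSON line: the event name plus whatever context we have."""
    line = json.dumps({"event": event, **context}, sort_keys=True)
    logger.error(line)
    return line


logger = logging.getLogger("payments")
logger.addHandler(logging.StreamHandler(sys.stderr))

# All values below are hypothetical placeholders.
line = log_event(
    logger,
    "payment_failed",
    customer_id="cus_123",
    deploy="2026-04-14.2",
    host="web-1",
    request_id="req_abc",
    latency_ms=4210,
)
```

One JSON object per line is dull on purpose: it stays greppable at 3AM and can still be parsed later if you ever ship the logs somewhere smarter.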

A surprising amount of backend engineering is just building systems that can explain themselves before you have to interrogate them manually.

Because the worst bugs are not crashes.

Crashes are merciful.

The worst bugs are:

  • partial failures
  • slowdowns
  • retries
  • race conditions
  • “works locally”
  • “only happens sometimes”
  • “cannot reproduce”

Which is software engineer language for:

may God help us all

Instrument the weird stuff

Everyone instruments API requests.

Few people instrument:

  • queue depth
  • retry counts
  • websocket reconnects
  • cache hit ratios
  • third-party latency
  • background jobs
  • external provider timeouts
  • how long PDF generation takes on the one VPS held together by optimism

The weird stuff is where the interesting failures live.
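Two items from that list, third-party latency and retry counts, can be covered with very little code. What follows is a rough sketch, assuming Python; the in-memory dictionaries stand in for a real metrics backend, and the names timed, call_with_retries, and flaky_provider are invented for this example.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# In-memory stand-ins for a real metrics backend (an assumption for this sketch).
timings_ms = defaultdict(list)
counters = defaultdict(int)


@contextmanager
def timed(name):
    """Record how long the wrapped block took, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings_ms[name].append((time.perf_counter() - start) * 1000)


def call_with_retries(name, fn, attempts=3):
    """Call fn, counting every retry so a flaky provider shows up in metrics."""
    for attempt in range(1, attempts + 1):
        try:
            with timed(name):
                return fn()
        except Exception:
            if attempt == attempts:
                counters[f"{name}.failed"] += 1
                raise
            counters[f"{name}.retries"] += 1


# Hypothetical provider that times out twice, then succeeds.
calls = {"n": 0}


def flaky_provider():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("provider timed out")
    return "sent"


result = call_with_retries("email_provider", flaky_provider)
```

The call succeeds, so without the counter nobody would ever learn it took three tries. That is the kind of quiet weirdness worth recording.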

I once added tracing to a video processing pipeline mostly because I was bored.

Turns out one ffmpeg command occasionally took 40x longer depending on the input file metadata.

Not file size.

Metadata.

Computers are incredible machines built entirely out of betrayal.

My rule now

If something would be annoying to debug at 3AM, instrument it now.

That is the entire philosophy.

Not enterprise-grade observability. Not twelve dashboards. Not “AI-powered telemetry.”

Just:

future me does not deserve suffering

That means:

  • request IDs everywhere
  • timing external calls
  • structured logs
  • basic tracing
  • metrics for anything asynchronous
  • alerts for things that silently stop working
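The last item is the one I would have wanted for the email worker. A minimal sketch of a heartbeat check, assuming Python: the worker records each success, and something else asks whether it has been quiet for too long. The Heartbeat class, the 300-second threshold, and the fake clock are all illustrative choices, not a recommendation for specific values.

```python
import time


class Heartbeat:
    """Tracks when a job last succeeded, so silence can be detected."""

    def __init__(self, max_silence_s, clock=time.monotonic):
        self.max_silence_s = max_silence_s
        self.clock = clock
        self.last_beat = clock()

    def beat(self):
        """Call this after every successful unit of work."""
        self.last_beat = self.clock()

    def is_stale(self):
        """True when no success has been recorded within the allowed window."""
        return self.clock() - self.last_beat > self.max_silence_s


# A fake clock keeps the example deterministic; real code would use the default.
now = [0.0]
hb = Heartbeat(max_silence_s=300, clock=lambda: now[0])

now[0] = 120.0                    # two minutes of silence: still fine
ok_at_2_min = hb.is_stale()

now[0] = 660.0                    # eleven minutes of silence: worth an alert
stale_at_11_min = hb.is_stale()

hb.beat()                         # the worker sends an email again
recovered = hb.is_stale()
```

Wiring is_stale() to an actual alert (an email from a different provider, a push notification, anything) is the part that depends on your setup. The point is only that "nothing happened" becomes something you can measure.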

You do not need a NASA control room.

You just need enough information so that when production catches fire, you can at least identify which fire.

Small systems deserve observability too

People treat observability like it only matters at scale.

But honestly, tiny systems benefit the most.

Because in a small project:

  • there is no SRE team
  • there is no ops engineer
  • there is only you

You are developer, DevOps, QA, support, incident response, and occasionally therapist.

And when something breaks, you do not open a war room.

You open fifteen browser tabs and start guessing.

Good observability reduces guessing.

That is the real value.

Not pretty graphs. Not dashboards nobody checks. Not screenshots for architecture Twitter.

Just reducing the amount of time you spend staring into logs trying to determine whether the problem is:

  • your code
  • the database
  • Cloudflare
  • DNS
  • Docker
  • timezone handling
  • or the moon entering retrograde

Final thought

I used to think observability was something you added once a system became serious.

Now I think the opposite.

A system becomes serious the moment you cannot comfortably explain what it is doing.

And every system reaches that point much faster than you expect.

#observability #tooling