
Signal is Live

Alerting that doesn't cry wolf

My phone used to vibrate constantly.

Error rate above 1%? Buzz. Response time above 500ms? Buzz. CPU above 80%? Buzz. Disk usage above 70%? Buzz.

Eventually, I started ignoring them all.

That's alert fatigue. And it's dangerous. Because when everything is urgent, nothing is urgent.

The problem with traditional alerting

Every monitoring tool sends alerts. They're all dumb about it.

Static thresholds. Simple comparisons. No context.

"Error rate is 2%."

Cool. Is that bad? Was it 0.1% before? Did something just deploy? Are these real errors or test traffic? Who knows. The alert doesn't tell you.

So you investigate. Usually it's nothing. Eventually you stop investigating.

And then one day it's something real, and you miss it.

Signal thinks first

Signal doesn't just check a threshold and fire a notification. It actually analyzes.

Correlation. Error spike and slow response times at the same moment? That's one incident, not two alerts. Signal correlates events across Recall, Reflex, and Pulse. Multiple symptoms, one notification.
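
As a sketch of the idea (the correlations block and its keys are my illustration, not Signal's documented schema), a correlation rule could read something like this:

correlations:
  - name: "Errors with Slow Responses"
    # Two symptoms inside the same window roll up into one incident.
    sources: [reflex, pulse]
    signals: ["error_rate", "p95_response_time"]
    window: 5m
    notify_as: single_incident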

Context. Instead of "Error rate is 2%", Signal tells you:

"Error rate jumped from 0.1% to 2% in the last 10 minutes. The errors are NoMethodError in OrdersController. They started after deploy #847. 234 users affected so far."

That's actionable. It tells you where to start.

Smart baselines. A 500ms response time might be fine for your app. Or it might be terrible. Static thresholds can't tell the difference.

Signal learns your baselines and alerts when things deviate from your normal, not from some arbitrary number.
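
To make that concrete, here's what a baseline-aware rule might look like, borrowing the "3x baseline" condition style from the config further down (the rest of the rule is my sketch, not a documented schema):

alerts:
  - name: "Response Time Anomaly"
    source: pulse
    # Fire on deviation from the learned baseline,
    # not on a fixed number that's wrong for half your endpoints.
    condition: "p95_response_time > 2x baseline for 10 minutes"
    severity: warning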

The AI angle

Ask Claude: "What should I be worried about right now?"

Claude queries Signal. Reviews recent alerts. Checks current metrics. Gives you a situation report:

"One active incident: checkout conversion dropped 15% in the last hour. Likely cause: Stripe API timeouts (P95 response time 4.2 seconds, normally 200ms). Recommended action: check Stripe status page, consider failing over to secondary processor."

Not just alerts. Situation awareness.

Escalation that makes sense

Signal doesn't just blast your phone.

  • First alert: Slack notification
  • No response in 15 minutes: Email
  • No response in 30 minutes: PagerDuty
  • No response in 1 hour: Call the backup

Configurable paths. The right urgency for the right situation.

Night-time alerts for critical issues only. Business hours for everything else. You define what matters when.
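
In config form, that escalation path and schedule could look roughly like this (the keys here are my sketch, not Signal's actual syntax):

escalation:
  - notify: slack        # first alert: team channel
  - after: 15m
    notify: email        # still unacknowledged
  - after: 30m
    notify: pagerduty    # page the on-call
  - after: 1h
    notify: backup       # call the backup

schedules:
  critical: always            # allowed to wake someone up
  default: business_hours     # everything else waits for morning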

How it integrates

Signal pulls from all Brainz Lab products:

alerts:
  - name: "High Error Rate"
    source: reflex
    condition: "error_rate > 1% for 5 minutes"
    severity: critical

  - name: "Slow Responses"
    source: pulse
    condition: "p95_response_time > 2s for 10 minutes"
    severity: warning

  - name: "Unusual Log Volume"
    source: recall
    condition: "log_volume > 3x baseline"
    severity: info

One place to manage all rules. One place to see all incidents.

The on-call experience

Signal isn't just about sending alerts. It's about making on-call bearable.

Incident timeline. Everything that happened, in order.

Runbooks. Linked documentation for common issues (sketched after this list).

Quick actions. One-click acknowledge, resolve, snooze.

Handoff notes. Context for the next person on-call.
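
One way a runbook could attach to a rule, sketched in the same alert format as above (the runbook key and URL are hypothetical):

alerts:
  - name: "High Error Rate"
    source: reflex
    condition: "error_rate > 1% for 5 minutes"
    severity: critical
    # Shown alongside the alert, so the on-call person starts
    # from the playbook instead of a blank terminal.
    runbook: "https://wiki.example.com/runbooks/high-error-rate"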

On-call doesn't have to mean anxiety. It can just be... manageable.

Try it

docker-compose up -d signal

# Open http://signal.localhost

Configure your first alert. Watch it work.

The stack is complete

Recall + Reflex + Pulse + Signal = Complete observability.

Four products that work together. One AI interface to query them all.

Ask Claude anything about your running application. Get answers. Fix problems. Sleep better at night.

This is what I've been building toward.

— Andres
