Alerting

When a check goes CRITICAL, someone needs to know. It has to be the right person, in time, without 50 repeats of the same alert, and without a separate alert for every dependent symptom. That is the job of the alerting subsystem.

flowchart LR
    R[Check result CRIT] --> RULE[Alert rule matches?]
    RULE -->|no| NOP[do nothing]
    RULE -->|yes| INHIB[Inhibition: parent CRIT?]
    INHIB -->|yes| MUTE[Suppress alert]
    INHIB -->|no| GROUP[Group]
    GROUP --> CHAN[Notification channels]
    CHAN --> EMAIL[Email]
    CHAN --> PUSH[Mobile push]
    CHAN --> WH[Webhook]
    CHAN --> SLACK[Slack/Teams]
    GROUP --> ESC[Escalation stage 1, 2, 3 ...]
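
The decision path in the flowchart can be sketched in a few lines of Python. This is an illustrative sketch only, not the subsystem's actual code: `CheckResult`, `AlertRule`, `route`, the `parent` field, and the use of the rule name as the group key are all hypothetical names and simplifications chosen for this example. Escalation stages are left out.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    check: str                      # name of the check that produced the result
    state: str                      # e.g. "OK", "WARN", "CRIT"
    parent: Optional[str] = None    # hypothetical: check this one depends on


@dataclass
class AlertRule:
    name: str
    matches: Callable[[CheckResult], bool]  # predicate deciding if the rule applies
    channels: list                          # e.g. ["email", "slack"]


def route(result, rules, states):
    """Return (decision, group, channels) for a single check result.

    `states` maps check name -> current state and is consulted only
    for the inhibition step (is the parent check already CRIT?).
    """
    # Step 1: does any alert rule match? If not, do nothing.
    rule = next((r for r in rules if r.matches(result)), None)
    if rule is None:
        return ("nop", None, [])

    # Step 2: inhibition. A CRIT parent suppresses the dependent alert.
    if result.parent is not None and states.get(result.parent) == "CRIT":
        return ("suppressed", None, [])

    # Step 3: group (assumption: one group per rule) and fan out to channels.
    return ("notify", rule.name, list(rule.channels))


rules = [AlertRule("crit-any", lambda r: r.state == "CRIT", ["email", "slack"])]
states = {"router-ping": "CRIT", "db-ping": "OK"}

# Parent is CRIT, so the dependent symptom is suppressed.
print(route(CheckResult("web-http", "CRIT", parent="router-ping"), rules, states))
# Parent is OK, so the alert is grouped and sent to the rule's channels.
print(route(CheckResult("db-query", "CRIT", parent="db-ping"), rules, states))
# No rule matches an OK result.
print(route(CheckResult("web-http", "OK"), rules, states))
```

Running inhibition before grouping, as the flowchart does, means suppressed alerts never enter a group and therefore cannot trigger notifications or escalation.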