SLA reports

How available was a service in a given period? How much downtime did it accumulate, and how much of that was planned?

Definition

Availability (in %) = (total seconds − unavailable seconds) / total seconds × 100.
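As a quick check of the formula, the sketch below computes the percentage for a 30-day month with 03:58 h of outage, which reproduces the 99.45 % row in the example output further down. The function name `availability_percent` is illustrative and not part of the product.

```python
def availability_percent(total_seconds: float, unavailable_seconds: float) -> float:
    """Availability in percent, following the definition above."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return (total_seconds - unavailable_seconds) / total_seconds * 100


# April has 30 days; 03:58 h of outage is 3 * 3600 + 58 * 60 seconds.
total = 30 * 24 * 3600
unavailable = 3 * 3600 + 58 * 60
print(f"{availability_percent(total, unavailable):.2f} %")  # 99.45 %
```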

What counts as "unavailable":

  • Hard CRITICAL phases
  • Hard NO_DATA phases (counted by default)
  • Hard WARNING phases (not counted by default; configurable)

What does not count:

  • Planned downtimes (excluded by default; configurable to count with "strict")

Special cases:

  • ACK phases still count as an outage. An ACK only pauses notifications; it does not mean the service was OK.
  • Inhibition phases count for the inhibited service, because from the customer's point of view it was down. The outage is not double-counted for the parent.
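The counting rules above can be sketched as a small predicate. This is an illustration only: the `Phase` class, its fields, and the parameter names `warn_as_outage` and `strict` are assumptions modelled on the report options, not the product's actual data model. Inhibition is left out because it concerns which service an outage is attributed to, not whether it counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    """Hypothetical record for one contiguous state phase of a service."""
    state: str                        # "OK", "WARNING", "CRITICAL" or "NO_DATA"
    hard: bool                        # True once the state is a hard state
    in_planned_downtime: bool = False
    acknowledged: bool = False        # deliberately ignored: ACK does not end an outage


def counts_as_unavailable(phase: Phase, warn_as_outage: bool = False,
                          strict: bool = False) -> bool:
    """Return True if the phase adds to the unavailable seconds."""
    if not phase.hard:
        return False                  # only hard states are listed as outages
    if phase.in_planned_downtime and not strict:
        return False                  # planned downtime is excluded unless "strict"
    if phase.state in ("CRITICAL", "NO_DATA"):
        return True
    if phase.state == "WARNING":
        return warn_as_outage         # off by default
    return False
```

For example, an acknowledged hard CRITICAL phase still returns `True`, while the same phase inside a planned downtime returns `False` unless `strict=True` is passed.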

Generate report

/reports/sla → New report:

Field                Meaning
Scope                tenant / tag / hosts
Period               last month / quarter / custom
Granularity          per service / per host / per tenant
Downtime weighting   "don't count" (default) / "strict"
WARN as outage?      default: no

Output: table with availability per row, plus expandable details with all incidents.

Example output

Tenant Acme — April 2026
─────────────────────────────────────────────────────────────────
Service                       Available   Planned   Unplanned
api.acme.com / HTTP             99.92%    02:14h    00:32h
api.acme.com / TLS cert        100.00%    -         -
db01.acme.local / Postgres     100.00%    -         -
sw-core / IF Gi1/0/1            99.45%    -         03:58h
sw-core / IF Gi1/0/2           100.00%    -         -
─────────────────────────────────────────────────────────────────
Tenant total                    99.83%    02:14h    04:30h

Clicking a row expands a detail list with start, end, duration, and cause (CRIT reason, downtime comment).

Calculation

flowchart LR
    H[check_results hypertable] --> AGG[State buckets per service]
    AGG --> CALC[Seconds per status]
    DT[Downtimes] --> EXCL[Exclude planned seconds]
    CALC --> EXCL
    EXCL --> AVAIL[Availability %]

Implemented as a Python job (api/app/services/sla.py) using TimescaleDB window functions. A report covering 90 days × 5 000 services takes about 3 s.
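The flow above can be sketched in plain Python on in-memory intervals. This is not the job in api/app/services/sla.py and does not use TimescaleDB; the function names are hypothetical. It assumes that outage intervals are already reduced to phases that count as unavailable, that planned downtimes do not overlap each other, and that excluded seconds are removed from the unavailable seconds only, while the total stays the full period.

```python
from datetime import datetime
from typing import Optional

Interval = tuple[datetime, datetime]  # (start, end), end exclusive


def clip(interval: Interval, bounds: Interval) -> Optional[Interval]:
    """Intersection of two intervals, or None if they do not overlap."""
    start, end = max(interval[0], bounds[0]), min(interval[1], bounds[1])
    return (start, end) if start < end else None


def seconds(interval: Interval) -> float:
    return (interval[1] - interval[0]).total_seconds()


def sla_row(period: Interval, outages: list[Interval],
            downtimes: list[Interval]) -> dict[str, float]:
    """Availability for one service, with planned seconds excluded."""
    total = seconds(period)
    excluded = unplanned = 0.0
    for outage in outages:
        clipped = clip(outage, period)
        if clipped is None:
            continue  # outage lies outside the reporting period
        # Outage seconds that fall inside a planned downtime.
        covered = 0.0
        for downtime in downtimes:
            part = clip(clipped, downtime)
            if part is not None:
                covered += seconds(part)
        excluded += covered
        unplanned += seconds(clipped) - covered
    return {
        "availability": (total - unplanned) / total * 100,
        "unplanned_seconds": unplanned,
        "excluded_seconds": excluded,
    }


period = (datetime(2026, 4, 1), datetime(2026, 5, 1))
outages = [(datetime(2026, 4, 10, 10), datetime(2026, 4, 10, 12))]
downtimes = [(datetime(2026, 4, 10, 11), datetime(2026, 4, 10, 13))]
row = sla_row(period, outages, downtimes)
```

In this example the two-hour outage overlaps the planned downtime by one hour, so only 3 600 of its 7 200 seconds count against availability.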

Export

Format            For
HTML in browser   Interactive view
PDF               Send to customer (see PDF reports)
CSV               Process in Excel

Scheduled reports

/reports/scheduled → New:

  • Monthly SLA report per tenant
  • Auto-generated on the 1st at 06:00
  • Result as PDF to configured email addresses

Recommendation: schedule one monthly report per tenant so that each customer automatically receives a compliance document.
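The schedule described above (the 1st of each month at 06:00) can be illustrated with a short sketch. The function names are hypothetical, the datetimes are naive (no time zone handling), and the assumption that a run covers the previous calendar month is not stated in this page.

```python
from datetime import datetime


def next_run(now: datetime) -> datetime:
    """Next 1st-of-month 06:00 strictly after `now`."""
    candidate = now.replace(day=1, hour=6, minute=0, second=0, microsecond=0)
    if candidate > now:
        return candidate
    # Already past this month's run: roll over to the following month.
    year, month = (now.year + 1, 1) if now.month == 12 else (now.year, now.month + 1)
    return candidate.replace(year=year, month=month)


def reporting_period(run: datetime) -> tuple[datetime, datetime]:
    """Previous calendar month as (start, end), end exclusive (assumed)."""
    end = run.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    year, month = (end.year - 1, 12) if end.month == 1 else (end.year, end.month - 1)
    return end.replace(year=year, month=month), end


run = next_run(datetime(2026, 4, 15, 9, 30))
start, end = reporting_period(run)
print(run)                       # 2026-05-01 06:00:00
print(start.date(), end.date())  # 2026-04-01 2026-05-01
```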

Multiple SLA classes

When different services have different SLA expectations ("API: 99.9 %, wiki: 99.0 %"), use tags:

  • Tag sla-tier-99-9 for API services
  • Tag sla-tier-99-0 for internal services

Filter reports by tag to get a separate availability view per SLA class.
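To compare a reported availability against its class, the target can be read back from the tag name. The sketch below assumes the `sla-tier-<whole>-<fraction>` naming shown above; nothing on this page says the product interprets these tags itself, and both helper functions are hypothetical.

```python
def sla_target(tag: str) -> float:
    """Turn a tag such as 'sla-tier-99-9' into a target of 99.9 (percent)."""
    prefix = "sla-tier-"
    if not tag.startswith(prefix):
        raise ValueError(f"not an SLA tier tag: {tag!r}")
    whole, _, fraction = tag[len(prefix):].partition("-")
    return float(f"{whole}.{fraction or '0'}")


def allowed_downtime_seconds(target_percent: float, period_seconds: float) -> float:
    """Downtime budget that still meets the target in the given period."""
    return period_seconds * (100 - target_percent) / 100


month = 30 * 24 * 3600  # a 30-day month in seconds
for tag in ("sla-tier-99-9", "sla-tier-99-0"):
    budget_min = allowed_downtime_seconds(sla_target(tag), month) / 60
    print(f"{tag}: {budget_min:.1f} min")
# sla-tier-99-9: 43.2 min
# sla-tier-99-0: 432.0 min
```

With these budgets, the 99.45 % interface row from the example output would miss a 99.9 % target but meet a 99.0 % one.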

SLA + anomaly

Anomaly events do not automatically count as an outage; they are only hints. If an anomaly leads to a hard CRIT (after escalation or manual action), the CRIT counts, not the anomaly.

Permission

Permission     Effect
sla.view       View reports
sla.export     Generate PDF / CSV
sla.schedule   Create scheduled reports
