Downtimes
A downtime is a scheduled window during which a host or service is expected to fail, for example during maintenance, a reboot, or a migration. No alerts are sent while a downtime is active.
Concept
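- Live status: the service is CRITICAL.
- During an active downtime: the status stays visible, but no alert is sent.
- After the downtime: if the service is still CRITICAL, an alert is sent (or a recovery, if it has recovered in the meantime).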
Create a downtime
Go to /downtimes → New, or create the downtime directly from the host context menu.
| Field | Meaning |
|---|---|
| Scope | Host, service, or multiple via tag filter |
| Start | Date + time (local time, not UTC!) |
| End | Date + time, or a duration |
| Comment | Required: what is happening, and who is doing it |
| Recurring? | Optional; if enabled, defined as an RRULE |
datetime-local is local time
The frontend uses HTML datetime-local inputs, which are interpreted in the browser's local time. When working via the API, send ISO 8601 timestamps in UTC with a Z suffix; the backend handles the conversion.
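A minimal Python sketch of the conversion an API client needs: take the offset-less value a datetime-local input produces, interpret it in the user's time zone, and emit UTC with a Z suffix. The function name local_input_to_utc_iso is hypothetical, and the example uses a fixed UTC+2 offset for simplicity.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def local_input_to_utc_iso(value: str, tz: tzinfo) -> str:
    """Turn a datetime-local value (no offset) into a UTC ISO 8601 string with 'Z'."""
    # datetime-local values look like "2026-04-26T00:00" and carry no offset.
    naive = datetime.fromisoformat(value)
    # Interpret the wall-clock time in the user's time zone.
    aware = naive.replace(tzinfo=tz)
    # Convert to UTC and format with the 'Z' suffix the API expects.
    return aware.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


cest = timezone(timedelta(hours=2))  # fixed UTC+2, e.g. Central European Summer Time
print(local_input_to_utc_iso("2026-04-26T00:00", cest))  # 2026-04-25T22:00:00Z
```

With Python 3.9+ you can pass zoneinfo.ZoneInfo("Europe/Berlin") instead of a fixed offset, so daylight saving time is applied per date.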
Recurring (RRULE)
For regular maintenance windows:
| Field | Values / example |
|---|---|
| Frequency | DAILY, WEEKLY, MONTHLY |
| Weekdays | MO, TU, …, SU |
| Day in month | 1, 15, -1 (last day of the month) |
| Until | An end date, or N repetitions |
| Duration per occurrence | e.g. 4h |
The UI generates the RRULE string; the backend evaluates it as a standard iCalendar RRULE (RFC 5545).
The modal shows a preview of the next 5 calculated occurrences.
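To illustrate what the UI produces, here is a simplified Python sketch that assembles an RRULE value from the form fields and previews the next five windows of a weekly rule. The helpers build_rrule and preview_weekly are hypothetical; the preview only handles FREQ=WEEKLY with INTERVAL=1 and ignores time zones, so it is not a full RFC 5545 implementation.

```python
from datetime import datetime, timedelta

# RRULE weekday codes, indexed like datetime.weekday() (Monday == 0).
WEEKDAYS = ["MO", "TU", "WE", "TH", "FR", "SA", "SU"]


def build_rrule(freq, byday=None, bymonthday=None, until=None, count=None):
    """Assemble an RRULE value such as 'FREQ=WEEKLY;BYDAY=SU;COUNT=10'."""
    if until is not None and count is not None:
        raise ValueError("RFC 5545 allows UNTIL or COUNT, not both")
    parts = [f"FREQ={freq}"]
    if byday:
        parts.append("BYDAY=" + ",".join(byday))
    if bymonthday:
        parts.append("BYMONTHDAY=" + ",".join(str(d) for d in bymonthday))
    if until is not None:
        # UNTIL is written in UTC basic format; `until` is assumed to be UTC already.
        parts.append("UNTIL=" + until.strftime("%Y%m%dT%H%M%SZ"))
    if count is not None:
        parts.append(f"COUNT={count}")
    return ";".join(parts)


def preview_weekly(dtstart, byday, duration, n=5):
    """Next n (start, end) windows of a simple FREQ=WEEKLY;BYDAY=... rule."""
    wanted = {WEEKDAYS.index(code) for code in byday}
    if not wanted:
        raise ValueError("at least one weekday is required")
    windows = []
    day = dtstart
    # Walk forward day by day, keeping the start time, until n matches are found.
    while len(windows) < n:
        if day.weekday() in wanted:
            windows.append((day, day + duration))
        day += timedelta(days=1)
    return windows


print(build_rrule("WEEKLY", byday=["SU"], count=10))  # FREQ=WEEKLY;BYDAY=SU;COUNT=10
for start, end in preview_weekly(datetime(2026, 4, 26, 2, 0), ["SU"], timedelta(hours=4)):
    print(start, "->", end)
```

For real rules, a complete implementation such as python-dateutil's dateutil.rrule.rrulestr is a safer choice than hand-rolled expansion.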
Mobile quick-add
In the mobile app, open host detail → Quick downtime and pick 30 min, 1 h, 4 h, or 24 h; the comment is entered in the same step.
This is useful when you are on call and need a short break ("coffee run, backup may be silent for 30 min").
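One-step quick-add amounts to "start now, end after the preset". The following Python sketch shows a request body that such a button might send, assuming the same API shape as the bulk example further down. The {"host": ...} scope key, the preset keys, and the function name are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Quick-downtime presets offered by the mobile app.
QUICK_PRESETS = {
    "30m": timedelta(minutes=30),
    "1h": timedelta(hours=1),
    "4h": timedelta(hours=4),
    "24h": timedelta(hours=24),
}
UTC_FMT = "%Y-%m-%dT%H:%M:%SZ"


def quick_downtime_payload(host, preset, comment, now=None):
    """Build a one-step downtime request body: starts now, ends after the preset."""
    if not comment.strip():
        raise ValueError("a comment is required")
    if now is None:
        now = datetime.now(timezone.utc)
    now = now.astimezone(timezone.utc)  # the API expects UTC with a 'Z' suffix
    end = now + QUICK_PRESETS[preset]
    return {
        "scope": {"host": host},  # hypothetical single-host scope
        "starts_at": now.strftime(UTC_FMT),
        "ends_at": end.strftime(UTC_FMT),
        "comment": comment,
    }
```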
Downtime lifecycle
```mermaid
stateDiagram-v2
    [*] --> Scheduled: created, start in future
    Scheduled --> Active: start time reached
    Active --> Expired: end time reached
    Active --> Cancelled: ended early
    Scheduled --> Cancelled: deleted before start
    Expired --> [*]
    Cancelled --> [*]
```
The background job downtime_expiry_watcher cleans up downtimes whose end time has passed by moving them from Active to Expired.
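The lifecycle can be modelled as a small state machine. This Python sketch encodes the allowed transitions from the diagram and shows one pass of an expiry watcher plus the End action. Everything here (State, Downtime, activate_due, expiry_pass, end_early) is illustrative and is not the implementation of downtime_expiry_watcher. The activation step is an assumption, because the watcher is only documented as expiring downtimes.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class State(Enum):
    SCHEDULED = "scheduled"
    ACTIVE = "active"
    EXPIRED = "expired"
    CANCELLED = "cancelled"


# Allowed transitions, mirroring the state diagram.
TRANSITIONS = {
    State.SCHEDULED: {State.ACTIVE, State.CANCELLED},
    State.ACTIVE: {State.EXPIRED, State.CANCELLED},
    State.EXPIRED: set(),
    State.CANCELLED: set(),
}


@dataclass
class Downtime:
    starts_at: datetime
    ends_at: datetime
    state: State = State.SCHEDULED

    def move_to(self, new: State) -> None:
        if new not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {new.name}")
        self.state = new


def activate_due(downtimes, now):
    """Scheduled -> Active once the start time is reached (hypothetical helper)."""
    for d in downtimes:
        if d.state is State.SCHEDULED and now >= d.starts_at:
            d.move_to(State.ACTIVE)


def expiry_pass(downtimes, now):
    """One pass of an expiry watcher: Active -> Expired once the end time is reached."""
    for d in downtimes:
        if d.state is State.ACTIVE and now >= d.ends_at:
            d.move_to(State.EXPIRED)


def end_early(d, now):
    """'End' on an active downtime: the end time becomes now."""
    if d.state is not State.ACTIVE:
        raise ValueError("only an active downtime can be ended early")
    d.ends_at = now
    d.move_to(State.CANCELLED)
```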
Effects
During active downtime:
- Checks keep running as normal
- Status stays visible in the UI, marked with a maintenance badge
- Alert rules ignore the affected host or service
- Mobile push: not sent
- Email notifications: not sent
- Webhooks: not sent
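As a rough model of these effects, the Python sketch below always records the check result and sets the maintenance badge, but only produces notifications outside an active downtime. The data classes, the channel names, and the single "CRITICAL means alert" rule are simplifying assumptions, and recovery handling is left out.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Notification channels that a downtime silences (push, email, webhooks).
CHANNELS = ("push", "email", "webhook")


@dataclass
class Window:
    starts_at: datetime
    ends_at: datetime


@dataclass
class Outcome:
    stored_status: str          # always recorded, downtime or not
    maintenance_badge: bool     # shown in the UI while a downtime is active
    notifications: list = field(default_factory=list)


def process_result(service, status, at, windows):
    """Record a check result and decide which notifications to send."""
    in_downtime = any(w.starts_at <= at < w.ends_at for w in windows)
    outcome = Outcome(stored_status=status, maintenance_badge=in_downtime)
    # Simplified alert rule: CRITICAL alerts, but never inside an active downtime.
    if status == "CRITICAL" and not in_downtime:
        outcome.notifications = [(channel, service) for channel in CHANNELS]
    return outcome
```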
The audit log records when a downtime starts and ends, along with the author and comment.
SLA accounting
By default, SLA reports exclude downtime windows: availability treats maintenance as "not relevant" rather than as outage time.
This is configurable per report: enable "count downtimes as outage" for a strict compliance view.
Details: SLA reports.
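The difference between the two views can be sketched in Python. This sketch makes three assumptions. Intervals are [start, end) offsets in minutes inside the report period. "Not relevant" means downtime windows are removed from the denominator, and outages inside them are ignored. In the strict view, downtime simply counts as outage. The function names are hypothetical.

```python
def merge(intervals):
    """Merge overlapping [start, end) intervals (e.g. minutes since period start)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def length(intervals):
    """Total time covered by the intervals, counting overlaps once."""
    return sum(end - start for start, end in merge(intervals))


def overlap(a, b):
    """Total time covered by both interval lists."""
    total = 0
    for s1, e1 in merge(a):
        for s2, e2 in merge(b):
            total += max(0, min(e1, e2) - max(s1, s2))
    return total


def availability(period, outages, downtimes, count_downtimes_as_outage=False):
    """Availability in [0, 1]; all intervals are assumed to lie inside the period."""
    if count_downtimes_as_outage:
        # Strict view: maintenance is treated like any other outage.
        return (period - length(outages + downtimes)) / period
    # Default view: downtime windows are "not relevant" and leave the denominator.
    relevant = period - length(downtimes)
    counted_outage = length(outages) - overlap(outages, downtimes)
    return (relevant - counted_outage) / relevant


# 1000-minute period, 100 min of maintenance, 100 min of outage (50 of it inside maintenance).
print(availability(1000, [(150, 250)], [(100, 200)]))        # 850/900, about 0.944
print(availability(1000, [(150, 250)], [(100, 200)], True))  # 0.85
```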
Bulk downtime
Bulk downtimes use a tag filter. Example: all hosts tagged production, for 30 minutes:
```bash
curl -X POST \
  -H "Authorization: Bearer <JWT>" \
  -H "Content-Type: application/json" \
  -d '{
    "scope": { "tag": "production" },
    "starts_at": "2026-04-25T22:00:00Z",
    "ends_at": "2026-04-25T22:30:00Z",
    "comment": "Reboot after kernel update"
  }' \
  https://your-domain.tld/api/v1/downtimes/
```
The backend resolves the tag filter to concrete hosts and creates one downtime per host.
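A Python sketch of that fan-out, using the request body from the curl example. The inventory dictionary (host name to set of tags), the validation step, and the per-host record shape are hypothetical stand-ins for whatever the backend actually stores.

```python
import json
from datetime import datetime

UTC_FMT = "%Y-%m-%dT%H:%M:%SZ"


def resolve_tag(inventory, tag):
    """Hosts carrying `tag`; `inventory` maps host name -> set of tags (hypothetical)."""
    return sorted(host for host, tags in inventory.items() if tag in tags)


def fan_out(body, inventory):
    """Expand one tag-scoped request into one downtime record per host."""
    request = json.loads(body)
    start = datetime.strptime(request["starts_at"], UTC_FMT)
    end = datetime.strptime(request["ends_at"], UTC_FMT)
    if end <= start:
        raise ValueError("ends_at must be after starts_at")
    hosts = resolve_tag(inventory, request["scope"]["tag"])
    return [
        {"host": host, "starts_at": request["starts_at"],
         "ends_at": request["ends_at"], "comment": request["comment"]}
        for host in hosts
    ]
```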
End early
On an active downtime, choose End. The end time is set to now, and alerts fire again.
This is useful when maintenance finishes early and you want to know whether anything is still broken.
Audit
Every downtime action (create, change, end) is written to the audit log with action = downtime.create, downtime.update, or downtime.cancel, together with field diffs.
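A Python sketch of how such an entry with field diffs could be assembled. The record layout and the audit_entry name are assumptions; only the downtime.create / downtime.update / downtime.cancel action names come from this page.

```python
ACTIONS = {"create", "update", "cancel"}


def audit_entry(action, author, before, after):
    """Build an audit record: action name plus a per-field diff (hypothetical shape)."""
    if action not in ACTIONS:
        raise ValueError(f"unknown downtime action: {action}")
    fields = sorted(set(before) | set(after))
    # Keep only fields whose value actually changed.
    diff = {
        name: {"old": before.get(name), "new": after.get(name)}
        for name in fields
        if before.get(name) != after.get(name)
    }
    return {"action": f"downtime.{action}", "author": author, "diff": diff}
```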
Next
- Acknowledgements: the distinction between "seen" and "expected"
- Dependencies & inhibition: an alternative way to mute structural failures
- SLA reports: how downtimes feed into availability