Downtimes

A downtime is a planned window during which a host or service is expected to fail, for example during maintenance, a reboot, or a migration. No alerts are sent while a downtime is active.

Concept

[Live status]                    Service: CRITICAL
[during active downtime]         Status visible, no alert
[after downtime]                 If still CRIT → alert (or recovery)

Create a downtime

Go to /downtimes → New, or start directly from the host context menu.

Field	Meaning
Scope	Host, service, or multiple objects via tag filter
Start	Date and time (local time, not UTC!)
End	Date and time, or a duration
Comment	Required: what is happening, and who is doing it
Recurring?	Optional; if enabled, an RRULE
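A minimal sketch of how the form fields above could be validated, assuming a Python backend. The class `DowntimeForm` and its attribute names are hypothetical, and the rule that End and duration are mutually exclusive is an assumption, not documented behavior.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class DowntimeForm:
    """Hypothetical model of the 'New downtime' form fields."""
    scope: str
    start: datetime
    comment: str
    end: Optional[datetime] = None
    duration: Optional[timedelta] = None
    rrule: Optional[str] = None  # set only when "Recurring?" is enabled

    def resolved_end(self) -> datetime:
        """Return the end time, deriving it from the duration if needed."""
        # Assumption: exactly one of end / duration must be given.
        if (self.end is None) == (self.duration is None):
            raise ValueError("give either an end time or a duration")
        end = self.end if self.end is not None else self.start + self.duration
        if end <= self.start:
            raise ValueError("end must be after start")
        return end

    def validate(self) -> None:
        """Raise ValueError if the form cannot be saved."""
        if not self.comment.strip():
            raise ValueError("comment is required")
        self.resolved_end()
```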

datetime-local is local time

The frontend uses HTML datetime-local inputs, which are in the browser's local time. When working via the API, send ISO 8601 timestamps in UTC with a Z suffix (e.g. 2026-04-25T22:00:00Z); the backend handles the conversion.
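A small sketch of the conversion an API client might do before sending times, assuming the user's time zone is known. The function name is hypothetical. In practice you would pass `ZoneInfo("Europe/Berlin")` (or similar) so daylight saving time is handled; the usage below uses a fixed UTC+2 offset to stay free of time-zone data.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def local_input_to_utc_iso(value: str, local_tz: tzinfo) -> str:
    """Convert a datetime-local value such as '2026-04-26T00:00' to UTC with a Z suffix."""
    naive = datetime.fromisoformat(value)
    if naive.tzinfo is not None:
        raise ValueError("datetime-local values carry no UTC offset")
    # Interpret the wall-clock time in the user's zone, then convert to UTC.
    aware = naive.replace(tzinfo=local_tz)
    return aware.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    # Fixed UTC+2 stands in for a summer-time zone such as Europe/Berlin.
    print(local_input_to_utc_iso("2026-04-26T00:00", timezone(timedelta(hours=2))))
    # prints 2026-04-25T22:00:00Z
```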

Recurring (RRULE)

For regular maintenance windows:

FREQ=WEEKLY;BYDAY=SU;BYHOUR=2;BYMINUTE=0
→ every Sunday 02:00

Field	Example	RRULE part
Frequency	DAILY, WEEKLY, MONTHLY	FREQ
Weekdays	MO, TU, …, SU	BYDAY
Day in month	1, 15, -1 (= last day)	BYMONTHDAY
Until	until date X or N repetitions	UNTIL or COUNT
Duration per occurrence	e.g. 4h	n/a (RRULE has no duration part)

The UI generates the RRULE string; the backend evaluates it as a standard iCalendar (RFC 5545) RRULE.

The modal shows a preview of the next 5 calculated occurrences.
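To illustrate what the preview computes, here is a deliberately simplified expansion that handles only FREQ=WEEKLY with a single BYDAY, BYHOUR and BYMINUTE, in naive local time. It is not the backend's implementation; a full RFC 5545 parser such as python-dateutil's `rrulestr` covers the remaining parts (UNTIL, COUNT, BYMONTHDAY, multiple weekdays).

```python
from datetime import datetime, timedelta

WEEKDAYS = {"MO": 0, "TU": 1, "WE": 2, "TH": 3, "FR": 4, "SA": 5, "SU": 6}


def parse_rrule(rule: str) -> dict:
    """Split 'FREQ=WEEKLY;BYDAY=SU' into {'FREQ': 'WEEKLY', 'BYDAY': 'SU'}."""
    return dict(part.split("=", 1) for part in rule.split(";") if part)


def next_weekly_occurrences(rule: str, after: datetime, count: int = 5) -> list:
    """Next `count` starts strictly after `after` (simplified weekly rules only)."""
    parts = parse_rrule(rule)
    if parts.get("FREQ") != "WEEKLY" or "," in parts.get("BYDAY", ""):
        raise ValueError("sketch handles FREQ=WEEKLY with a single BYDAY only")
    weekday = WEEKDAYS[parts["BYDAY"]]
    hour = int(parts.get("BYHOUR", 0))
    minute = int(parts.get("BYMINUTE", 0))
    # Jump to the target weekday in the current week, at the target time.
    days_ahead = (weekday - after.weekday()) % 7
    first = (after + timedelta(days=days_ahead)).replace(
        hour=hour, minute=minute, second=0, microsecond=0
    )
    if first <= after:  # this week's slot has already passed
        first += timedelta(weeks=1)
    return [first + timedelta(weeks=i) for i in range(count)]
```

For the rule above, starting from Wednesday 2026-04-22 12:00, the first occurrence is Sunday 2026-04-26 02:00, followed by the four Sundays after it.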

Mobile quick-add

In the mobile app, go to host detail → Quick downtime, pick 30 min / 1 h / 4 h / 24 h, and enter a comment, all in one step.

Useful when you are on call and need a quick break ("coffee run, backup may be silent for 30 min").
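Roughly, a quick-add amounts to "start now, end after the preset". The sketch below builds such a request body using the field names from the bulk API example further down; the host-scoped form `{"host": ...}` of `scope` and the function name are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Presets offered by the mobile quick-add.
QUICK_PRESETS = {
    "30 min": timedelta(minutes=30),
    "1 h": timedelta(hours=1),
    "4 h": timedelta(hours=4),
    "24 h": timedelta(hours=24),
}


def quick_downtime_body(host: str, preset: str, comment: str,
                        now: Optional[datetime] = None) -> dict:
    """Request body for a downtime that starts now and lasts `preset`."""
    if not comment.strip():
        raise ValueError("comment is required")
    start = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    end = start + QUICK_PRESETS[preset]
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return {
        "scope": {"host": host},  # hypothetical host-scoped variant of "scope"
        "starts_at": start.strftime(fmt),
        "ends_at": end.strftime(fmt),
        "comment": comment,
    }
```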

Downtime lifecycle

stateDiagram-v2
    [*] --> Scheduled: created, start in future
    Scheduled --> Active: start time reached
    Active --> Expired: end time reached
    Active --> Cancelled: ended early
    Scheduled --> Cancelled: deleted before start
    Expired --> [*]
    Cancelled --> [*]
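The diagram translates into a small transition table. This sketch assumes the start time is inclusive and the end time exclusive; the enum and function names are illustrative, not the backend's.

```python
from datetime import datetime
from enum import Enum


class DowntimeState(Enum):
    SCHEDULED = "scheduled"
    ACTIVE = "active"
    EXPIRED = "expired"
    CANCELLED = "cancelled"


# Allowed transitions, read off the state diagram above.
TRANSITIONS = {
    DowntimeState.SCHEDULED: {DowntimeState.ACTIVE, DowntimeState.CANCELLED},
    DowntimeState.ACTIVE: {DowntimeState.EXPIRED, DowntimeState.CANCELLED},
    DowntimeState.EXPIRED: set(),    # terminal
    DowntimeState.CANCELLED: set(),  # terminal
}


def transition(current: DowntimeState, target: DowntimeState) -> DowntimeState:
    """Return `target` if the move is allowed, otherwise raise."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target


def state_at(starts_at: datetime, ends_at: datetime, now: datetime,
             cancelled: bool = False) -> DowntimeState:
    """Derive the state from the time window; cancellation is an explicit flag."""
    if cancelled:
        return DowntimeState.CANCELLED
    if now < starts_at:
        return DowntimeState.SCHEDULED
    if now < ends_at:
        return DowntimeState.ACTIVE
    return DowntimeState.EXPIRED
```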

The downtime_expiry_watcher background job cleans up downtimes whose end time has passed, moving them from Active to Expired.
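One pass of such a sweep could look like this, assuming each downtime carries a state string and a timezone-aware `ends_at`. The `Downtime` record and `expire_due` are hypothetical; a real job would run on a timer and persist the changes, and its interval is not documented here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Downtime:
    id: int
    ends_at: datetime  # timezone-aware
    state: str = "active"


def expire_due(downtimes: List[Downtime],
               now: Optional[datetime] = None) -> List[int]:
    """One sweep: move Active downtimes whose end has passed to Expired."""
    now = now or datetime.now(timezone.utc)
    expired_ids = []
    for d in downtimes:
        if d.state == "active" and d.ends_at <= now:
            d.state = "expired"
            expired_ids.append(d.id)
    return expired_ids
```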

Effects

During an active downtime:

Checks run as normal
Status is visible in the UI with a maintenance badge
Alert rules ignore the service
Mobile push: not sent
Email notifications: not sent
Webhooks: not sent
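A minimal sketch of the suppression logic, assuming a check result is always stored and only the notification fan-out is skipped. The `Window` and `CheckRecord` types, the channel names, and the rule "notify on anything but OK" are simplifications for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class Window:
    """Hypothetical active-downtime window for one service."""
    service: str
    starts_at: datetime
    ends_at: datetime


@dataclass
class CheckRecord:
    service: str
    status: str
    maintenance_badge: bool = False
    notified: List[str] = field(default_factory=list)


def in_downtime(service: str, windows: List[Window], now: datetime) -> bool:
    return any(w.service == service and w.starts_at <= now < w.ends_at
               for w in windows)


def process_check_result(service: str, status: str,
                         windows: List[Window], now: datetime) -> CheckRecord:
    """The result is always recorded; notifications are skipped during a downtime."""
    record = CheckRecord(service, status)
    if in_downtime(service, windows, now):
        record.maintenance_badge = True  # status still shown in the UI
        return record                    # no push, email, or webhook
    if status != "OK":                   # simplified stand-in for alert rules
        record.notified = ["push", "email", "webhook"]
    return record
```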

The start and end of each downtime are recorded in the audit log with author and comment.

SLA accounting

By default, SLA reports exclude downtime windows: availability treats maintenance as "not relevant" rather than as an outage.

This is configurable per report: enable "count downtimes as outage" for a strict compliance view.
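A worked example of the two modes, using minute resolution and one plausible formula (not necessarily the one the reports use): in a 30-day month (43,200 min), a 4 h downtime (240 min) overlaps a 300 min CRITICAL period, so 60 min of the outage fall outside the downtime. The function names are hypothetical.

```python
from typing import Iterable, Set, Tuple

Interval = Tuple[int, int]  # [start, end) in minutes from period start


def covered_minutes(intervals: Iterable[Interval]) -> Set[int]:
    """Expand half-open minute intervals into the set of minutes they cover."""
    covered: Set[int] = set()
    for start, end in intervals:
        covered.update(range(start, end))
    return covered


def availability(period_min: int, outages: Iterable[Interval],
                 downtimes: Iterable[Interval],
                 count_downtime_as_outage: bool = False) -> float:
    out = covered_minutes(outages)
    down = covered_minutes(downtimes)
    if count_downtime_as_outage:
        # Strict view: downtime minutes count as unavailable.
        return (period_min - len(out | down)) / period_min
    # Default: downtime minutes leave the denominator entirely.
    relevant = period_min - len(down)
    return (relevant - len(out - down)) / relevant


if __name__ == "__main__":
    month = 30 * 24 * 60                        # 43,200 minutes
    outage, maint = [(1000, 1300)], [(1000, 1240)]
    print(availability(month, outage, maint))        # default mode
    print(availability(month, outage, maint, True))  # strict mode
```

With the default this gives 42,900 / 42,960 ≈ 99.86 %; counting downtimes as outage gives 42,900 / 43,200 ≈ 99.31 %.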

Details: SLA reports.

Bulk downtime

Downtimes can also be set by tag, for example on all hosts tagged production for 30 min:

curl -X POST -H "Authorization: Bearer <JWT>" \
  -H "Content-Type: application/json" \
  -d '{
    "scope": { "tag": "production" },
    "starts_at": "2026-04-25T22:00:00Z",
    "ends_at":   "2026-04-25T22:30:00Z",
    "comment":   "Reboot after kernel update"
  }' \
  https://your-domain.tld/api/v1/downtimes/

The backend resolves the tag filter to concrete hosts and creates one downtime per host.
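A sketch of that expansion, assuming a hypothetical in-memory inventory `hosts_by_tag` that maps a tag to host names. A tag with no matching hosts simply yields no downtimes here; the real backend's behavior in that case is not documented.

```python
from typing import Dict, List


def expand_bulk_request(body: dict, hosts_by_tag: Dict[str, List[str]]) -> List[dict]:
    """Turn a tag-scoped request into one downtime record per matching host."""
    tag = body["scope"]["tag"]
    return [
        {
            "host": host,
            "starts_at": body["starts_at"],
            "ends_at": body["ends_at"],
            "comment": body["comment"],
        }
        for host in sorted(hosts_by_tag.get(tag, []))
    ]


if __name__ == "__main__":
    request = {
        "scope": {"tag": "production"},
        "starts_at": "2026-04-25T22:00:00Z",
        "ends_at": "2026-04-25T22:30:00Z",
        "comment": "Reboot after kernel update",
    }
    inventory = {"production": ["web02", "web01"], "staging": ["stg01"]}  # hypothetical
    for record in expand_bulk_request(request, inventory):
        print(record["host"], record["starts_at"], record["ends_at"])
```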

End early

On an active downtime, choose End. The end time is set to now, and alerts fire again.

Useful when maintenance finishes early and you want to know whether anything is still broken.
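A minimal sketch of ending early, following the lifecycle transition Active → Cancelled with a hypothetical `Downtime` record. Once `ends_at` is moved to now, the window no longer covers the current time, so alerts are evaluated normally again.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Downtime:
    starts_at: datetime
    ends_at: datetime
    state: str = "active"


def suppresses_alerts(d: Downtime, now: datetime) -> bool:
    return d.state == "active" and d.starts_at <= now < d.ends_at


def end_early(d: Downtime, now: Optional[datetime] = None) -> Downtime:
    """Close an active downtime immediately (Active -> Cancelled)."""
    now = now or datetime.now(timezone.utc)
    if not suppresses_alerts(d, now):
        raise ValueError("only a currently active downtime can be ended early")
    d.ends_at = now        # the window closes at the current time
    d.state = "cancelled"  # matches "Active --> Cancelled: ended early"
    return d
```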

Audit

Every downtime action (create, change, end) is recorded in the audit log with action = downtime.create, downtime.update, or downtime.cancel, along with field diffs.
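The field diff can be as simple as comparing stored values before and after the change. The entry shape below (action, author, diff) and the function names are assumptions for illustration; only the action names come from this page.

```python
from typing import Any, Dict, Tuple

ACTIONS = {"downtime.create", "downtime.update", "downtime.cancel"}


def field_diff(before: Dict[str, Any], after: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Map each changed field to (old, new); missing fields count as None."""
    keys = sorted(before.keys() | after.keys())
    return {k: (before.get(k), after.get(k))
            for k in keys if before.get(k) != after.get(k)}


def audit_entry(action: str, author: str,
                before: Dict[str, Any], after: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical audit record shape: action, author, field diff."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action {action!r}")
    return {"action": action, "author": author, "diff": field_diff(before, after)}
```

For a create, `before` is empty, so every field shows up as `(None, value)`.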

Next

Acknowledgements: the distinction between "seen" and "expected"
Dependencies & inhibition: an alternative mute path for structural failures
SLA reports: how downtimes flow into availability