
Downtimes

A downtime is a time window during which a host or service is expected to fail, for example during maintenance, a reboot, or a migration. No alerts are sent while a downtime is active.

Concept

[Live status]                    Service: CRITICAL
[during active downtime]         Status visible, no alert
[after downtime]                 If still CRIT → alert (or recovery)
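As a rough illustration of this concept, the sketch below keeps the status visible but decides separately whether to alert. The `Downtime` class, the `should_alert` function, and the choice of an inclusive start and exclusive end are assumptions made for illustration, not the product's actual code.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Downtime:
    # Hypothetical model: start inclusive, end exclusive (assumption).
    starts_at: datetime
    ends_at: datetime

    def is_active(self, now: datetime) -> bool:
        return self.starts_at <= now < self.ends_at


def should_alert(state: str, now: datetime, downtimes: list[Downtime]) -> bool:
    """Decide whether a check result at `now` should raise an alert."""
    if state == "OK":
        return False  # nothing to alert on
    # Inside an active downtime the status is still shown, but no alert is sent.
    return not any(d.is_active(now) for d in downtimes)


window = Downtime(
    starts_at=datetime(2026, 4, 25, 22, 0, tzinfo=timezone.utc),
    ends_at=datetime(2026, 4, 25, 22, 30, tzinfo=timezone.utc),
)
during = datetime(2026, 4, 25, 22, 10, tzinfo=timezone.utc)
after = datetime(2026, 4, 25, 22, 31, tzinfo=timezone.utc)
print(should_alert("CRITICAL", during, [window]))  # False: muted by the downtime
print(should_alert("CRITICAL", after, [window]))   # True: still CRIT after the window
```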

Create a downtime

/downtimes → New, or directly from the host context menu.

| Field | Meaning |
|---|---|
| Scope | Host, service, or multiple via tag filter |
| Start | Date + time (local time, not UTC!) |
| End | Date + time, or a duration |
| Comment | Required: what is happening, and who is doing it |
| Recurring? | Optional; if set, defined as an RRULE |

datetime-local is local time

The frontend uses datetime-local HTML inputs, which are in local time. When working via the API, send ISO 8601 timestamps in UTC with a Z suffix; the backend converts them.
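A minimal sketch of that conversion for scripts that build API requests themselves: take a value shaped like a datetime-local input, attach the user's time zone, and emit UTC with a `Z` suffix. The function name `local_input_to_utc_iso` is hypothetical.

```python
from datetime import datetime, timezone, tzinfo


def local_input_to_utc_iso(value: str, tz: tzinfo) -> str:
    """Turn a datetime-local value like "2026-04-26T00:00" into a UTC "...Z" string."""
    local = datetime.fromisoformat(value).replace(tzinfo=tz)  # naive -> aware
    return local.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

For example, with `zoneinfo.ZoneInfo("Europe/Berlin")` (Python 3.9+, UTC+2 in late April), `"2026-04-26T00:00"` becomes `"2026-04-25T22:00:00Z"`. Around DST transitions a local time can be ambiguous or nonexistent; Python resolves these via the `fold` attribute, so check such edge cases explicitly.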

Recurring (RRULE)

For regular maintenance windows:

FREQ=WEEKLY;BYDAY=SU;BYHOUR=2;BYMINUTE=0
→ every Sunday 02:00

| Field | Example |
|---|---|
| Frequency | DAILY, WEEKLY, MONTHLY |
| Weekdays | MO, TU, …, SU |
| Day in month | 1, 15, -1 (= last day) |
| Until | until date X, or N repetitions |
| Duration per occurrence | e.g. 4h |

The UI generates the RRULE string; the backend evaluates it as a standard iCalendar RRULE (RFC 5545).

The modal's preview shows the next 5 calculated occurrences.
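A preview like this can be approximated as follows. This is a deliberately reduced sketch, not the backend's implementation: it handles only `FREQ=WEEKLY` with `BYDAY`, `BYHOUR`, and `BYMINUTE` (single values for hour and minute), ignores `INTERVAL`, `UNTIL`, and `COUNT`, and works on naive local datetimes. The function names are hypothetical; a full RFC 5545 implementation such as `dateutil.rrule.rrulestr` covers the rest.

```python
from datetime import datetime, timedelta

WEEKDAYS = {"MO": 0, "TU": 1, "WE": 2, "TH": 3, "FR": 4, "SA": 5, "SU": 6}


def parse_rrule(rule: str) -> dict[str, str]:
    # "FREQ=WEEKLY;BYDAY=SU" -> {"FREQ": "WEEKLY", "BYDAY": "SU"}
    return dict(part.split("=", 1) for part in rule.split(";"))


def next_weekly_occurrences(rule: str, after: datetime, count: int = 5) -> list[datetime]:
    """Return the next `count` start times strictly after `after` (weekly rules only)."""
    parts = parse_rrule(rule)
    if parts.get("FREQ") != "WEEKLY":
        raise ValueError("this sketch only handles FREQ=WEEKLY")
    byday = parts.get("BYDAY")
    days = {WEEKDAYS[d] for d in byday.split(",")} if byday else {after.weekday()}
    hour = int(parts.get("BYHOUR", after.hour))
    minute = int(parts.get("BYMINUTE", after.minute))

    results: list[datetime] = []
    day = after.replace(hour=0, minute=0, second=0, microsecond=0)
    while len(results) < count:
        if day.weekday() in days:
            candidate = day.replace(hour=hour, minute=minute)
            if candidate > after:
                results.append(candidate)
        day += timedelta(days=1)
    return results


# From Monday 2026-04-20 12:00: the next five Sundays at 02:00 (2026-04-26 ... 2026-05-24)
for start in next_weekly_occurrences("FREQ=WEEKLY;BYDAY=SU;BYHOUR=2;BYMINUTE=0", datetime(2026, 4, 20, 12, 0)):
    print(start.isoformat())
```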

Mobile quick-add

In the mobile app: host detail → Quick downtime → pick 30 min / 1 h / 4 h / 24 h and add a comment, all in one step.

Useful when you are on call and need a quick break (e.g. "coffee run, backup may be silent for 30 min").

Downtime lifecycle

stateDiagram-v2
    [*] --> Scheduled: created, start in future
    Scheduled --> Active: start time reached
    Active --> Expired: end time reached
    Active --> Cancelled: ended early
    Scheduled --> Cancelled: deleted before start
    Expired --> [*]
    Cancelled --> [*]
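The lifecycle is a small state machine. One way to encode its allowed transitions is a lookup table; the event names (`start_reached`, `deleted`, `end_reached`, `ended_early`) and the `transition` function are invented for this sketch and are not taken from the product.

```python
from enum import Enum


class State(Enum):
    SCHEDULED = "scheduled"
    ACTIVE = "active"
    EXPIRED = "expired"
    CANCELLED = "cancelled"


# (current state, event) -> next state; Expired and Cancelled are terminal.
TRANSITIONS = {
    (State.SCHEDULED, "start_reached"): State.ACTIVE,
    (State.SCHEDULED, "deleted"): State.CANCELLED,
    (State.ACTIVE, "end_reached"): State.EXPIRED,
    (State.ACTIVE, "ended_early"): State.CANCELLED,
}


def transition(state: State, event: str) -> State:
    """Return the next state, or raise if the diagram does not allow the event."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not allowed in state {state.name}") from None
```

Keeping the transitions in one table makes it easy to check them against the diagram and to reject anything it does not show, such as reviving an expired downtime.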

downtime_expiry_watcher (background job) cleans up expired downtimes: it moves downtimes whose end time has passed from Active to Expired.
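A single sweep of such a job could look like the sketch below. The in-memory `Downtime` model, its `state` field, and the `expire_downtimes` function are assumptions; the real job presumably runs periodically against the database, at an interval this page does not specify.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Downtime:
    id: int
    ends_at: datetime
    state: str = "active"  # hypothetical state field


def expire_downtimes(downtimes: list[Downtime], now: datetime) -> list[int]:
    """One sweep: mark active downtimes whose end time has passed as expired."""
    expired_ids = []
    for d in downtimes:
        if d.state == "active" and d.ends_at <= now:
            d.state = "expired"
            expired_ids.append(d.id)
    return expired_ids


now = datetime(2026, 4, 25, 22, 45, tzinfo=timezone.utc)
items = [
    Downtime(1, datetime(2026, 4, 25, 22, 30, tzinfo=timezone.utc)),
    Downtime(2, datetime(2026, 4, 26, 2, 0, tzinfo=timezone.utc)),
]
print(expire_downtimes(items, now))  # [1]
```

The sweep is idempotent: running it again at the same time changes nothing, which is convenient for a job that may overlap or restart.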

Effects

During active downtime:

  • Check runs as normal
  • Status visible in UI with maintenance badge
  • Alert rules ignore the service
  • Mobile push: not sent
  • Email notifications: not sent
  • Webhooks: not sent
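In short, the result is always recorded and displayed, but no notification channel fires while a downtime is active. A hypothetical sketch of that split; the channel identifiers, `ServiceView`, and `handle_check_result` are illustrative only, and outside a downtime the real alert rules would decide which channels fire.

```python
from dataclasses import dataclass

CHANNELS = ("mobile_push", "email", "webhook")  # hypothetical channel identifiers


@dataclass
class ServiceView:
    state: str               # shown in the UI regardless of downtime
    maintenance_badge: bool  # badge displayed while a downtime is active


def handle_check_result(state: str, in_downtime: bool) -> tuple[ServiceView, list[str]]:
    """Record a check result and return the channels that should be notified."""
    view = ServiceView(state=state, maintenance_badge=in_downtime)
    if in_downtime:
        return view, []  # alert rules skip the service: no push, email, or webhook
    # Simplification: outside a downtime, notify every channel for a non-OK state.
    channels = [] if state == "OK" else list(CHANNELS)
    return view, channels
```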

The audit log records the start and end of each downtime with author and comment.

SLA accounting

By default, SLA reports exclude downtime windows: availability treats maintenance as "not relevant", not as an outage.

Configurable per report: enable "count downtimes as outage" for a strict compliance view.
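A worked example of the two modes, assuming availability is computed over minutes and that the strict mode counts each whole downtime window as outage. The `availability` function and its parameters are illustrative; the exact formula the reports use is not specified here.

```python
def availability(period_min: float, outage_min: float, downtime_min: float,
                 count_downtime_as_outage: bool = False) -> float:
    """Availability as a fraction.

    outage_min:   unplanned outage outside any downtime window
    downtime_min: total length of downtime windows in the period
    """
    if count_downtime_as_outage:
        # Strict compliance view: maintenance windows count as outage time.
        return (period_min - outage_min - downtime_min) / period_min
    # Default: downtime windows are "not relevant" and drop out of the base.
    relevant_min = period_min - downtime_min
    return (relevant_min - outage_min) / relevant_min


# 30 days, 60 min unplanned outage, one 4 h maintenance window
print(f"{availability(43_200, 60, 240):.4%}")        # 99.8603%
print(f"{availability(43_200, 60, 240, True):.4%}")  # 99.3056%
```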

Details: SLA reports.

Bulk downtime

Tag-based, e.g. all hosts tagged production, for 30 min:

curl -X POST -H "Authorization: Bearer <JWT>" \
  -H "Content-Type: application/json" \
  -d '{
    "scope": { "tag": "production" },
    "starts_at": "2026-04-25T22:00:00Z",
    "ends_at":   "2026-04-25T22:30:00Z",
    "comment":   "Reboot after kernel update"
  }' \
  https://your-domain.tld/api/v1/downtimes/

Backend resolves the tag filter to concrete hosts and creates one downtime per host.
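The fan-out step can be sketched like this, reusing the request body from the curl example. The `hosts_by_tag` inventory, the host names, the `HostDowntime` record, and `expand_bulk_request` are all made up for illustration; how the backend actually looks up tags is not documented here.

```python
from dataclasses import dataclass


@dataclass
class HostDowntime:
    host: str
    starts_at: str
    ends_at: str
    comment: str


def expand_bulk_request(request: dict, hosts_by_tag: dict[str, list[str]]) -> list[HostDowntime]:
    """Resolve scope.tag to concrete hosts and create one downtime per host."""
    hosts = hosts_by_tag.get(request["scope"]["tag"], [])
    return [
        HostDowntime(host, request["starts_at"], request["ends_at"], request["comment"])
        for host in sorted(hosts)
    ]


request = {
    "scope": {"tag": "production"},
    "starts_at": "2026-04-25T22:00:00Z",
    "ends_at": "2026-04-25T22:30:00Z",
    "comment": "Reboot after kernel update",
}
inventory = {"production": ["web-2", "web-1"], "staging": ["web-stage"]}  # made-up hosts
for d in expand_bulk_request(request, inventory):
    print(d.host, d.starts_at, d.ends_at)
```

One record per host means each host's downtime can later be ended early or audited on its own.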

End early

Active downtime → End: the end time is set to now, and alerts can fire again.

Useful when maintenance finishes early and you want to know whether anything is still broken.
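A minimal sketch of ending early, following the lifecycle diagram (Active → Cancelled when ended early). The `Downtime` model and `end_early` function are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Downtime:
    starts_at: datetime
    ends_at: datetime
    state: str = "active"


def end_early(d: Downtime, now: datetime) -> None:
    """Cut an active downtime short so alert rules apply again from `now`."""
    if d.state != "active":
        raise ValueError("only an active downtime can be ended early")
    d.ends_at = now
    d.state = "cancelled"  # lifecycle diagram: Active --> Cancelled (ended early)


d = Downtime(
    starts_at=datetime(2026, 4, 25, 22, 0, tzinfo=timezone.utc),
    ends_at=datetime(2026, 4, 26, 2, 0, tzinfo=timezone.utc),
)
end_early(d, datetime(2026, 4, 25, 23, 0, tzinfo=timezone.utc))
print(d.state, d.ends_at.isoformat())
```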

Audit

Every downtime action (create, change, end) is recorded in the audit log with action = downtime.create / downtime.update / downtime.cancel, plus field diffs.
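A field diff can be computed by comparing the record before and after the change. The entry shape below (`action`, `author`, `diff`), the helper names, and the author `alice` are assumptions for illustration, not the audit log's actual schema.

```python
from typing import Any


def field_diff(before: dict[str, Any], after: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {field: (old, new)} for every field whose value changed."""
    fields = sorted(before.keys() | after.keys())
    return {f: (before.get(f), after.get(f)) for f in fields if before.get(f) != after.get(f)}


def audit_entry(action: str, author: str, before: dict, after: dict) -> dict:
    # action is one of "create", "update", "cancel"
    return {"action": f"downtime.{action}", "author": author, "diff": field_diff(before, after)}


old = {"ends_at": "2026-04-25T22:30:00Z", "comment": "Reboot after kernel update"}
new = {"ends_at": "2026-04-25T23:00:00Z", "comment": "Reboot after kernel update"}
print(audit_entry("update", "alice", old, new))
```

For a create, `before` is empty, so every field shows up with `None` as its old value.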
