Dependencies & inhibition¶
When the server room switch fails, the 50 servers behind it become unreachable. Without inhibition you'd get 50 emails, even though there is only one actual problem.
Dependencies model "A depends on B"; inhibition suppresses alerts on A while B itself is critical.
Model¶
graph TD
SW[sw-core: CRITICAL] -->|parent of| WEB1[web01: CRIT inhibited]
SW -->|parent of| WEB2[web02: CRIT inhibited]
SW -->|parent of| DB[db01: CRIT inhibited]
SW -.continues to.-> ROUTER[router: OK]
DB -->|parent of| APP[app-server: CRIT inhibited]
When sw-core is CRIT, all children are marked "inhibited". Their statuses appear with a gray badge in the UI, and their alert rules don't fire.
Dependency model¶
In dependencies (migration 036):
| Field | Meaning |
|---|---|
| parent_host_id | parent host |
| child_host_id | dependent host |
| parent_service_id | optional: relationship at service level |
| child_service_id | optional |
| inhibit_when_parent | status from which inhibition kicks in (default CRITICAL) |
A relationship can inhibit only when the parent is CRIT, or already when the parent is WARN.
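To make the threshold semantics concrete, here is a minimal sketch of one dependency row and its threshold check. The field names come from the table above; the integer ID types, the `Status` enum and its ordering, and the `inhibits` helper are assumptions for illustration, not the actual schema or backend code.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Status(IntEnum):
    # Assumed ordering: a higher value means a worse state.
    OK = 0
    WARNING = 1
    CRITICAL = 2


@dataclass(frozen=True)
class Dependency:
    parent_host_id: int
    child_host_id: int
    parent_service_id: Optional[int] = None  # None = host-level relationship
    child_service_id: Optional[int] = None
    inhibit_when_parent: Status = Status.CRITICAL  # documented default

    def inhibits(self, parent_status: Status) -> bool:
        """True when the parent is at or above the configured threshold."""
        return parent_status >= self.inhibit_when_parent


strict = Dependency(parent_host_id=1, child_host_id=2)
eager = Dependency(parent_host_id=1, child_host_id=3,
                   inhibit_when_parent=Status.WARNING)

print(strict.inhibits(Status.WARNING))  # False: only CRITICAL inhibits
print(eager.inhibits(Status.WARNING))   # True: WARNING already inhibits
```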
Create¶
The /dependencies page shows a tree view. For each parent, use + Add dependency:
- Pick child host
- Optional: specific service relationship instead of host level
- Set inhibit_when_parent
Bulk: attach multiple children to a parent at once.
Inhibition logic in worker¶
flowchart LR
R[Check result CRIT] --> Q{Parents exist?}
Q -->|no| FORWARD[normal evaluation]
Q -->|yes| P{Parent status >= inhibit threshold?}
P -->|no| FORWARD
P -->|yes| INHIB[Suppress alert<br/>Status visible as inhibited]
Inhibition only applies to notifications — the status itself isn't masked. In the UI you see the service is critical, but inhibited because of sw-core.
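The flowchart above can be sketched as a small routing function. This is an illustration under assumptions, not the worker's actual code: `ParentLink`, `route_crit_result`, and the `SEVERITY` ordering are hypothetical names, and treating any single parent at or above its threshold as sufficient is an assumption for hosts with several parents.

```python
from dataclasses import dataclass
from typing import Sequence

# Assumed severity ordering used for the ">= threshold" comparison.
SEVERITY = {"OK": 0, "WARN": 1, "CRIT": 2}


@dataclass
class ParentLink:
    parent_status: str                 # current status of the parent
    inhibit_when_parent: str = "CRIT"  # threshold stored on the dependency


def route_crit_result(parents: Sequence[ParentLink]) -> str:
    """Decide what happens to a CRIT check result.

    Returns "forward" for normal evaluation or "inhibit" to suppress the
    alert. The status itself stays visible in both cases.
    """
    if not parents:
        return "forward"
    for link in parents:
        if SEVERITY[link.parent_status] >= SEVERITY[link.inhibit_when_parent]:
            return "inhibit"
    return "forward"


print(route_crit_result([]))                            # forward
print(route_crit_result([ParentLink("CRIT")]))          # inhibit
print(route_crit_result([ParentLink("WARN")]))          # forward
print(route_crit_result([ParentLink("WARN", "WARN")]))  # inhibit
```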
Inhibition + recovery¶
When the parent recovers:
- Inhibition lifts
- If the child is still CRIT, a notification is generated now, so genuine follow-up problems become visible
- If the child also recovered when the parent was restored, everything stays quiet (no alert spam)
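A minimal sketch of the recovery behavior described above, assuming each child has only the one parent that just recovered. The function name and the plain status strings are illustrative, not the product's API.

```python
from typing import Dict, List


def notifications_after_parent_recovery(children: Dict[str, str]) -> List[str]:
    """Return the children that need a notification once inhibition lifts.

    `children` maps child name to its current status. Children that
    recovered together with the parent produce no notification.
    """
    return sorted(name for name, status in children.items() if status == "CRIT")


after_recovery = {"web01": "OK", "web02": "OK", "db01": "CRIT"}
print(notifications_after_parent_recovery(after_recovery))  # ['db01']
```

Here web01 and web02 came back with the switch and stay silent, while db01 has a genuine follow-up problem and is reported.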
Tree view¶
sw-core (CRITICAL)
├── web01 (CRIT, inhibited)
├── web02 (CRIT, inhibited)
└── db01 (CRIT, inhibited)
└── app-server (CRIT, inhibited transitively)
router-edge (OK)
└── (no dependencies)
Transitive inhibition: if db01 is inhibited (because sw-core is CRIT), and app-server depends on db01, then app-server is also inhibited.
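Transitive inhibition can be sketched as a walk up the parent chain. The sketch below assumes the default CRIT threshold on every relationship and treats a host as inhibited when any ancestor is CRIT; the function and variable names are hypothetical, and the real worker may evaluate this differently.

```python
from typing import Dict, List, Set


def inhibited_hosts(parents_of: Dict[str, List[str]],
                    status: Dict[str, str]) -> Set[str]:
    """Return all hosts that have at least one CRIT ancestor."""
    memo: Dict[str, bool] = {}

    def has_crit_ancestor(host: str) -> bool:
        if host in memo:
            return memo[host]
        memo[host] = False  # placeholder, guards against unexpected cycles
        result = any(
            status.get(parent) == "CRIT" or has_crit_ancestor(parent)
            for parent in parents_of.get(host, ())
        )
        memo[host] = result
        return result

    return {host for host in status if has_crit_ancestor(host)}


# The tree from this section, expressed as child -> parents.
parents_of = {
    "web01": ["sw-core"],
    "web02": ["sw-core"],
    "db01": ["sw-core"],
    "app-server": ["db01"],
}
status = {
    "sw-core": "CRIT", "web01": "CRIT", "web02": "CRIT",
    "db01": "CRIT", "app-server": "CRIT", "router-edge": "OK",
}
print(sorted(inhibited_hosts(parents_of, status)))
# ['app-server', 'db01', 'web01', 'web02']
```

sw-core itself is not in the result: it has no parent, so its alert still fires and points at the root cause.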
Detection of parent hosts¶
Parents are assigned manually; there is no automatic detection. Recommendations:
- Discovery walks note the default gateway per host (when available) — that's the standard parent suggestion
- For storage volumes: storage host is the parent
- For VMs: VM host is the parent
Auto-population is on the roadmap. Until then, manually maintaining the most important parents is enough in practice.
Edge cases¶
Soft-deleted parent¶
Since v0.17.x
Inhibition ignores orphan / soft-deleted parent hosts — otherwise deleted switches would suppress follow-up alerts forever.
When the parent is deleted, inhibition lifts automatically — follow-up alerts fire again.
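One way to picture this rule: filter out missing and soft-deleted parents before checking for inhibition. The `Host` class, the `deleted_at` marker, and both helper functions are hypothetical names chosen for this sketch; the actual schema may represent soft deletion differently.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List, Optional


@dataclass
class Host:
    name: str
    status: str
    deleted_at: Optional[datetime] = None  # assumed soft-delete marker


def active_parents(parents: Iterable[Optional[Host]]) -> List[Host]:
    """Drop orphan references (None) and soft-deleted parents."""
    return [p for p in parents if p is not None and p.deleted_at is None]


def is_inhibited(parents: Iterable[Optional[Host]]) -> bool:
    """Only live parents in CRIT can inhibit (default threshold assumed)."""
    return any(p.status == "CRIT" for p in active_parents(parents))


old_switch = Host("sw-old", "CRIT", deleted_at=datetime.now(timezone.utc))
live_switch = Host("sw-core", "CRIT")

print(is_inhibited([old_switch]))               # False: deleted parent ignored
print(is_inhibited([old_switch, live_switch]))  # True: live parent is CRIT
print(is_inhibited([None]))                     # False: orphan reference ignored
```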
Parent in downtime¶
If the parent is in a downtime, it's not counted as CRIT, so all children see it as "not inhibiting". This is sensible: during planned maintenance on the switch, the question is not how many hosts depend on it; the children are simply monitored as normal.
Circular dependency¶
The backend rejects cycles (A → B → A) with HTTP 400 on creation.
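A cycle check of this kind can be sketched as a reachability test: adding parent → child closes a cycle when the parent is already reachable from the child. This is an illustrative depth-first search with hypothetical names, not the backend's implementation; a caller would answer with 400 when it returns True.

```python
from typing import Dict, Set


def would_create_cycle(children_of: Dict[str, Set[str]],
                       parent: str, child: str) -> bool:
    """True if adding the edge parent -> child would close a cycle."""
    if parent == child:
        return True
    stack = [child]
    seen: Set[str] = set()
    while stack:
        node = stack.pop()
        if node == parent:
            return True  # parent is already a descendant of child
        if node in seen:
            continue
        seen.add(node)
        stack.extend(children_of.get(node, ()))
    return False


existing = {"A": {"B"}, "B": {"C"}}
print(would_create_cycle(existing, "B", "A"))  # True: A -> B -> A
print(would_create_cycle(existing, "C", "A"))  # True: A -> B -> C -> A
print(would_create_cycle(existing, "A", "D"))  # False: D has no descendants
```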
Next¶
- Alert rules — how inhibition plays with rules
- Downtimes — alternative approach for "silent during maintenance"