Monitoring scripts¶
When the built-in check types aren't enough (a custom backup tool, a vendor-specific CLI, a REST API that needs polling), monitoring scripts fill the gap.
Concept¶
Scripts are managed centrally in Vesana (table monitoring_scripts), stored in the database, and delivered to agents via API. Nothing lives on the machine itself — the agent fetches the script content at runtime.
sequenceDiagram
participant U as User
participant API
participant DB
participant A as Agent
U->>API: Create / edit script
API->>DB: store
A->>API: GET /agent/config (every 5 min)
API->>DB: SELECT host_services + scripts
API-->>A: { script_content, interpreter, expected_output }
A->>A: start interpreter, pipe script via stdin
A->>API: result (status + value + message + perfdata)
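The last agent steps in the diagram (start the interpreter, pipe the script via stdin, collect the result) could look roughly like the following. This is a minimal sketch, not the agent's actual code: `run_script`, its argv-list parameter, and the choice to report a timeout as exit code 3 (UNKNOWN) are assumptions made for illustration.

```python
import subprocess
import sys


def run_script(interpreter_argv, script_content, timeout_s=30):
    """Start the interpreter and feed the script body on stdin.

    interpreter_argv is a hypothetical argv list such as ["bash", "-s"].
    Returns (exit_code, stdout). A timeout is reported as exit code 3,
    which is an assumption, not documented agent behavior.
    """
    try:
        proc = subprocess.run(
            interpreter_argv,
            input=script_content,   # script body goes to the interpreter's stdin
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return 3, f"UNKNOWN - script timed out after {timeout_s}s"
    return proc.returncode, proc.stdout.strip()


# Usage: run a tiny Python "check" through the current interpreter.
# The "-" argument tells python to read the program from stdin.
code, out = run_script(
    [sys.executable, "-"],
    'import sys\nprint("OK - demo")\nsys.exit(0)\n',
)
print(code, out)
```

Because the script body only ever travels through stdin, nothing has to be written to disk on the monitored machine, which matches the "no local storage" design described above.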
Builtin vs. custom¶
| Type | is_builtin | tenant_id | Editable |
|---|---|---|---|
| Builtin | true | NULL | ❌ (clone only) |
| Custom | false | tenant UUID | ✅ |
39 builtin scripts ship today (system badge in frontend). Examples:
- check_backup_log: Bash, parses typical backup logs
- check_disk_smart: Bash, calls smartctl
- check_iis_pool: PowerShell, application pool status
- check_systemd_failed: Bash, lists failed units
Builtin scripts can be cloned — the copy lands as a custom script and is freely editable.
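The edit and clone rules from the table can be modelled in a few lines. The sketch below is illustrative only: the `Script` dataclass, `can_edit`, `clone`, and the `_copy` name suffix are hypothetical and not taken from the Vesana code base; only the `is_builtin` and `tenant_id` semantics come from the table above.

```python
import uuid
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Script:
    id: str
    name: str
    body: str
    is_builtin: bool
    tenant_id: Optional[str]  # None (NULL) for builtin scripts


def can_edit(script: Script, tenant_id: str) -> bool:
    """Builtin scripts are read-only; custom scripts belong to one tenant."""
    return not script.is_builtin and script.tenant_id == tenant_id


def clone(script: Script, tenant_id: str) -> Script:
    """Copy a script into the tenant as an editable custom script."""
    return replace(
        script,
        id=str(uuid.uuid4()),
        name=f"{script.name}_copy",  # assumed suffix to keep names unique per tenant
        is_builtin=False,
        tenant_id=tenant_id,
    )


builtin = Script("b-1", "check_disk_smart", "smartctl -H /dev/sda", True, None)
copy = clone(builtin, "tenant-a")
print(can_edit(builtin, "tenant-a"), can_edit(copy, "tenant-a"))
```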
Create a script¶
/scripts → New:
| Field | Required | Meaning |
|---|---|---|
| Name | ✅ | Unique per tenant |
| Description | – | What the script does |
| Interpreter | ✅ | powershell, bash, python |
| Script body | ✅ | Code, multi-line |
| Expected output | ✅ | nagios, text, json |
| Timeout (s) | ✅ | Default 30 |
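The field rules in the table can be expressed as a small validation routine. This is a sketch under assumptions: the dictionary keys (`name`, `interpreter`, `script_body`, `expected_output`, `timeout_s`) and the function `validate_script` are hypothetical; only the allowed values and the default timeout of 30 seconds come from the table.

```python
ALLOWED_INTERPRETERS = {"powershell", "bash", "python"}
ALLOWED_OUTPUTS = {"nagios", "text", "json"}
DEFAULT_TIMEOUT_S = 30


def validate_script(fields):
    """Check a script definition and return it with the timeout default applied.

    Raises ValueError listing every problem found.
    """
    errors = []
    for key in ("name", "interpreter", "script_body", "expected_output"):
        if not fields.get(key):
            errors.append(f"{key} is required")

    interpreter = fields.get("interpreter")
    if interpreter and interpreter not in ALLOWED_INTERPRETERS:
        errors.append(f"unknown interpreter: {interpreter}")

    output = fields.get("expected_output")
    if output and output not in ALLOWED_OUTPUTS:
        errors.append(f"unknown expected_output: {output}")

    timeout = fields.get("timeout_s", DEFAULT_TIMEOUT_S)
    if not isinstance(timeout, int) or isinstance(timeout, bool) or timeout <= 0:
        errors.append("timeout_s must be a positive integer")

    if errors:
        raise ValueError("; ".join(errors))
    return {**fields, "timeout_s": timeout}


valid = validate_script({
    "name": "check_load",
    "interpreter": "bash",
    "script_body": "exit 0",
    "expected_output": "nagios",
})
print(valid["timeout_s"])
```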
Expected-output modes¶
nagios¶
Classic Nagios plugin format:
- Exit code 0 → OK
- Exit code 1 → WARNING
- Exit code 2 → CRITICAL
- Exit code 3 → UNKNOWN
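Output format: a single line of the form STATUS - message | label=value;warn;crit, where everything after the | is optional perfdata.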
Example:
#!/bin/bash
# Read the 1-minute load average
LOAD=$(awk '{print $1}' /proc/loadavg)
# bc prints 1 when the comparison holds, which (( )) treats as true
if (( $(echo "$LOAD > 5" | bc -l) )); then
  echo "CRITICAL - Load $LOAD | load1=$LOAD;1;5"
  exit 2
elif (( $(echo "$LOAD > 1" | bc -l) )); then
  echo "WARNING - Load $LOAD | load1=$LOAD;1;5"
  exit 1
else
  echo "OK - Load $LOAD | load1=$LOAD;1;5"
  exit 0
fi
perfdata (after |) is automatically written as JSONB to check_results.perfdata.
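To make the nagios mode concrete, here is a rough sketch of how an exit code and an output line might be turned into a status, a message, and a perfdata object. It is not the Vesana parser: `parse_nagios`, the shape of the resulting perfdata dictionary, and treating unexpected exit codes as UNKNOWN are all assumptions. Units of measure and threshold ranges are kept as plain strings rather than interpreted.

```python
STATUS_BY_EXIT_CODE = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def _to_number(raw):
    """Return a float when possible, otherwise the raw string (e.g. '12GB')."""
    try:
        return float(raw)
    except ValueError:
        return raw


def parse_nagios(exit_code, stdout):
    # Exit codes outside 0..3 are treated as UNKNOWN (assumption).
    status = STATUS_BY_EXIT_CODE.get(exit_code, "UNKNOWN")
    lines = stdout.strip().splitlines()
    first_line = lines[0] if lines else ""

    # Everything before the first "|" is the message, the rest is perfdata.
    message, _, perf_text = first_line.partition("|")

    perfdata = {}
    for item in perf_text.split():
        label, sep, rest = item.partition("=")
        if not sep:
            continue  # skip tokens that are not label=value pairs
        parts = rest.split(";")
        entry = {"value": _to_number(parts[0])}
        for name, raw in zip(("warn", "crit"), parts[1:3]):
            if raw != "":
                entry[name] = _to_number(raw)
        perfdata[label] = entry

    return {"status": status, "message": message.strip(), "perfdata": perfdata}


result = parse_nagios(2, "CRITICAL - Load 6.2 | load1=6.2;1;5\n")
print(result)
```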
text¶
Free-form text output; the status comes from the exit code alone (0/1/2/3). The output goes verbatim into message, and no perfdata is parsed.
#!/bin/bash
if [ -f /tmp/maintenance ]; then
  echo "Maintenance mode active"
  exit 1 # WARNING
fi
echo "Normal operation"
exit 0
json¶
Structured JSON output; the status is reported in the status field instead of through the exit code:
import json

result = {
    "status": "OK",  # OK / WARNING / CRITICAL / UNKNOWN
    "message": "Backup yesterday at 02:14",
    "value": 42,  # optional, numeric
    "perfdata": {  # optional
        "duration_s": 1834,
        "size_gb": 12.4,
    },
}
print(json.dumps(result))
Advantage: no exit code hack, more readable, structured perfdata.
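On the receiving side, a json-mode result could be validated along these lines. This is a hedged sketch: `parse_json_result` is a hypothetical helper, and falling back to UNKNOWN on malformed output is an assumption about sensible behavior, not documented Vesana behavior.

```python
import json

VALID_STATUSES = ("OK", "WARNING", "CRITICAL", "UNKNOWN")


def _unknown(message):
    return {"status": "UNKNOWN", "message": message, "value": None, "perfdata": {}}


def parse_json_result(stdout):
    """Turn a script's JSON output into a normalized result dictionary."""
    try:
        data = json.loads(stdout)
    except json.JSONDecodeError as exc:
        return _unknown(f"invalid JSON output: {exc.msg}")

    if not isinstance(data, dict) or data.get("status") not in VALID_STATUSES:
        return _unknown("missing or invalid status field")

    return {
        "status": data["status"],
        "message": str(data.get("message", "")),
        "value": data.get("value"),              # optional, numeric
        "perfdata": data.get("perfdata") or {},  # optional
    }


parsed = parse_json_result(
    '{"status": "OK", "message": "Backup yesterday at 02:14", '
    '"value": 42, "perfdata": {"duration_s": 1834, "size_gb": 12.4}}'
)
print(parsed["status"], parsed["perfdata"])
```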
Bind script to a service¶
In the profile-check or directly on the host-service:
| Field | Meaning |
|---|---|
| check_type | agent_script |
| script_id | Script UUID |
On config refresh the agent fetches script content along with service config — no local storage.
Agent security¶
Scripts run as the user of the agent service. Linux: usually root (needed for many checks). Windows: LocalSystem (analogous).
Scripts run as root
Anyone who can create a custom script can execute root code on agent machines. Permission script.create should be restricted to tenant admins and above. Details: Roles & permissions.
Example: TLS expiry via script¶
#!/bin/bash
# check_cert_expiry.sh, argument: domain
DOMAIN="$1"
if [ -z "$DOMAIN" ]; then
  echo "UNKNOWN - no domain given"
  exit 3
fi
# Fetch the server certificate and extract its notAfter date
END=$(echo | openssl s_client -servername "$DOMAIN" -connect "$DOMAIN:443" 2>/dev/null | \
  openssl x509 -noout -enddate 2>/dev/null | cut -d= -f2)
if [ -z "$END" ]; then
  echo "UNKNOWN - could not read certificate for $DOMAIN"
  exit 3
fi
END_EPOCH=$(date -d "$END" +%s)  # date -d requires GNU date
NOW_EPOCH=$(date +%s)
DAYS=$(( (END_EPOCH - NOW_EPOCH) / 86400 ))
if [ "$DAYS" -lt 7 ]; then
  echo "CRITICAL - $DOMAIN expires in $DAYS days | days=$DAYS"
  exit 2
elif [ "$DAYS" -lt 21 ]; then
  echo "WARNING - $DOMAIN expires in $DAYS days | days=$DAYS"
  exit 1
fi
echo "OK - $DOMAIN valid for $DAYS more days | days=$DAYS"
exit 0
Save the script and configure it as an agent_script service with args: ["example.com"] in check_config.
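How the args from check_config reach the script as `$1` is not spelled out here. One plausible mechanism, given that the script body arrives on stdin, is to append the arguments to the interpreter command line. The sketch below is purely illustrative: `INTERPRETER_ARGV`, `build_argv`, the placeholder UUID, and the placement of `script_id` next to `check_config` are assumptions. PowerShell is left out because its stdin invocation handles arguments differently.

```python
# Hypothetical argv prefixes; the real agent's invocation is not documented here.
INTERPRETER_ARGV = {
    "bash": ["bash", "-s", "--"],  # -s: read the script from stdin, args follow --
    "python": ["python3", "-"],    # -: read the program from stdin, args follow
}


def build_argv(interpreter, check_config):
    """Combine the interpreter command with the service's script arguments."""
    args = check_config.get("args", [])
    if not all(isinstance(a, str) for a in args):
        raise ValueError("args must be a list of strings")
    return INTERPRETER_ARGV[interpreter] + list(args)


# Assumed shape of a service definition bound to the TLS expiry script.
service = {
    "check_type": "agent_script",
    "script_id": "00000000-0000-0000-0000-000000000000",  # placeholder UUID
    "check_config": {"args": ["example.com"]},
}

argv = build_argv("bash", service["check_config"])
print(argv)
```

With bash, arguments after `-s --` become the positional parameters, so `example.com` is available as `$1` inside the script.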
Clone a script¶
Cloning a builtin creates a custom copy:
/scripts → builtin entry → Clone → edit → save.
Delete a script¶
If services reference the script, the delete endpoint blocks with a list of references. Reassign first, then delete.
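The blocking behavior can be pictured as a reference check that runs before the delete. The following is a minimal in-memory sketch, assuming hypothetical names (`ScriptInUseError`, `delete_script`) and plain dictionaries in place of the real database tables.

```python
class ScriptInUseError(Exception):
    """Raised when services still reference the script."""

    def __init__(self, references):
        super().__init__(f"script is referenced by {len(references)} service(s)")
        self.references = references


def delete_script(script_id, scripts, host_services):
    """Delete a script unless any service still points at it."""
    references = [s["id"] for s in host_services if s.get("script_id") == script_id]
    if references:
        raise ScriptInUseError(references)
    scripts.pop(script_id, None)


scripts = {"s1": {"name": "check_cert_expiry"}, "s2": {"name": "check_load"}}
host_services = [{"id": "svc-1", "script_id": "s1"}]

try:
    delete_script("s1", scripts, host_services)
except ScriptInUseError as err:
    blocked_by = err.references  # the list a caller would show to the user

# Reassign the service first, then the delete goes through.
host_services[0]["script_id"] = "s2"
delete_script("s1", scripts, host_services)
print(blocked_by, sorted(scripts))
```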
Audit¶
Script changes are logged with track_change — old and new values as JSONB diff. Visible in Admin → Audit log with filter target_kind = monitoring_script.
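A field-level diff of old and new values might be computed as follows. The exact JSONB layout that track_change writes is not described here, so the `{"old": ..., "new": ...}` shape and the helper name `diff_fields` are assumptions for illustration.

```python
def diff_fields(old, new):
    """Return only the fields whose values differ, with old and new values."""
    changed = {}
    for key in sorted(set(old) | set(new)):
        if old.get(key) != new.get(key):
            changed[key] = {"old": old.get(key), "new": new.get(key)}
    return changed


before = {"name": "check_load", "timeout_s": 30, "interpreter": "bash"}
after = {"name": "check_load", "timeout_s": 60, "interpreter": "bash"}

change = diff_fields(before, after)
print(change)
```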
Next¶
- Check type reference: how scripts attach to services
- Roles & permissions: the script.create permission