Monitoring scripts

When the built-in check types aren't enough (your own backup tool, a vendor-specific CLI, a REST API to query), monitoring scripts are the way to go.

Concept

Scripts are managed centrally in Vesana: they are stored in the database (table `monitoring_scripts`) and delivered to agents via the API. Nothing is stored on the monitored machine; the agent fetches the script content at runtime.

```mermaid
sequenceDiagram
    participant U as User
    participant API
    participant DB
    participant A as Agent

    U->>API: Create / edit script
    API->>DB: store
    A->>API: GET /agent/config (every 5 min)
    API->>DB: SELECT host_services + scripts
    API-->>A: { script_content, interpreter, expected_output }
    A->>A: start interpreter, pipe script via stdin
    A->>API: result (status + value + message + perfdata)
```
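The agent step "start interpreter, pipe script via stdin" from the sequence diagram above can be sketched in Python as follows. This is a minimal sketch, not the agent's actual code: the `run_script` helper, the interpreter command lines, and reporting a timeout as UNKNOWN are all assumptions.

```python
import subprocess
import sys

# Hypothetical mapping from the script's interpreter field to a command line
# that reads the program from stdin; the real agent's invocation may differ.
INTERPRETER_CMD = {
    "bash": ["bash", "-s", "--"],        # -s: read commands from stdin; args follow --
    "python": [sys.executable, "-"],     # "-": read the program from stdin
    "powershell": ["powershell", "-NoProfile", "-Command", "-"],
}


def run_script(script_content, interpreter, args=(), timeout_s=30):
    """Pipe script_content into the interpreter and return (exit_code, stdout).

    Treating a timeout as UNKNOWN (exit code 3) is an assumption of this sketch.
    """
    cmd = INTERPRETER_CMD[interpreter] + list(args)
    try:
        proc = subprocess.run(
            cmd,
            input=script_content,   # the script body never touches the disk
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return 3, f"UNKNOWN - script timed out after {timeout_s}s"
    return proc.returncode, proc.stdout.strip()


if __name__ == "__main__":
    code, out = run_script('echo "OK - hello from $1"\nexit 0\n', "bash", ["agent"])
    print(code, out)
```

With `bash -s -- example.com`, the value `example.com` arrives as `$1`; this is one plausible way script arguments could reach a script that is piped in rather than stored as a file.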

Builtin vs. custom

| Type | `is_builtin` | `tenant_id` | Editable |
|---|---|---|---|
| Builtin | true | NULL | ❌ (clone only) |
| Custom | false | tenant UUID | ✅ |

39 builtin scripts currently ship with Vesana; the frontend marks them with a system badge. Examples:

- `check_backup_log`: Bash, parses typical backup logs
- `check_disk_smart`: Bash, calls smartctl
- `check_iis_pool`: PowerShell, application pool status
- `check_systemd_failed`: Bash, lists failed units

Builtin scripts can be cloned; the copy becomes a custom script and is freely editable.
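The builtin/custom rules from the table above can be expressed as a small Python sketch. The `MonitoringScript` class, `is_editable`, and `clone_script` are hypothetical illustrations, not Vesana's data model; only `is_builtin` and `tenant_id` come from the table.

```python
import copy
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MonitoringScript:
    # Hypothetical model; is_builtin and tenant_id mirror the table above.
    name: str
    script_body: str
    is_builtin: bool
    tenant_id: Optional[str]  # None (NULL) for builtin scripts
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


def is_editable(script: MonitoringScript) -> bool:
    """Builtin scripts are read-only; custom scripts are editable."""
    return not script.is_builtin


def clone_script(source: MonitoringScript, tenant_id: str) -> MonitoringScript:
    """Copy a script into a tenant as a new, editable custom script."""
    clone = copy.deepcopy(source)
    clone.id = str(uuid.uuid4())   # the copy is a new row with its own UUID
    clone.is_builtin = False
    clone.tenant_id = tenant_id
    return clone
```

The original builtin stays untouched, so updates shipped with Vesana never overwrite a tenant's customized copy.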

Create a script

Go to /scripts → New and fill in the following fields:

| Field | Required | Meaning |
|---|---|---|
| Name | ✅ | Unique per tenant |
| Description | – | What the script does |
| Interpreter | ✅ | `powershell`, `bash`, `python` |
| Script body | ✅ | Code, multi-line |
| Expected output | ✅ | `nagios`, `text`, `json` |
| Timeout (s) | ✅ | Default 30 |
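The constraints in the table above can be checked before saving, for example like this. A minimal sketch: the `ScriptForm` class and `validate` function are hypothetical, and rejecting a non-positive timeout is an assumption the table does not state.

```python
from dataclasses import dataclass

# Allowed values taken from the "Create a script" table.
INTERPRETERS = {"powershell", "bash", "python"}
OUTPUT_MODES = {"nagios", "text", "json"}


@dataclass
class ScriptForm:
    name: str
    script_body: str
    interpreter: str
    expected_output: str
    description: str = ""   # optional
    timeout_s: int = 30     # documented default


def validate(form: ScriptForm, existing_names: set) -> list:
    """Return a list of validation errors (empty if the form is valid).

    existing_names holds the script names already used in this tenant.
    """
    errors = []
    if not form.name.strip():
        errors.append("name is required")
    elif form.name in existing_names:
        errors.append("name must be unique per tenant")
    if not form.script_body.strip():
        errors.append("script body is required")
    if form.interpreter not in INTERPRETERS:
        errors.append(f"unknown interpreter: {form.interpreter}")
    if form.expected_output not in OUTPUT_MODES:
        errors.append(f"unknown expected output: {form.expected_output}")
    if form.timeout_s <= 0:  # assumption: the table only gives the default
        errors.append("timeout must be positive")
    return errors
```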

Expected-output modes

`nagios`

Classic Nagios plugin format:

- Exit code 0 → OK
- Exit code 1 → WARNING
- Exit code 2 → CRITICAL
- Exit code 3 → UNKNOWN

Output format:

```
STATUS - description | label1=value;warn;crit label2=value
```

Example:

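```bash
#!/bin/bash
# 1-minute load average: WARNING above 1, CRITICAL above 5
LOAD=$(awk '{print $1}' /proc/loadavg)

# awk does the floating-point comparison, so bc is not required
if awk -v l="$LOAD" 'BEGIN { exit !(l + 0 > 5) }'; then
  echo "CRITICAL - Load $LOAD | load1=$LOAD;1;5"
  exit 2
elif awk -v l="$LOAD" 'BEGIN { exit !(l + 0 > 1) }'; then
  echo "WARNING - Load $LOAD | load1=$LOAD;1;5"
  exit 1
else
  echo "OK - Load $LOAD | load1=$LOAD;1;5"
  exit 0
fi
```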
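How such output maps to a result can be sketched in a few lines of Python, assuming the status is taken from the exit code and the text before `|` becomes the message. The `parse_nagios` function is hypothetical, and so are two details: only the first output line is read, and unknown exit codes fall back to UNKNOWN.

```python
# Exit-code mapping from the list above.
EXIT_STATUS = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def parse_nagios(exit_code: int, output: str) -> dict:
    """Split 'STATUS - description | perfdata' into its parts."""
    first_line = output.strip().splitlines()[0] if output.strip() else ""
    message, _, perf = first_line.partition("|")
    return {
        "status": EXIT_STATUS.get(exit_code, "UNKNOWN"),  # the exit code decides
        "message": message.strip(),
        "perfdata_raw": perf.strip(),  # parsed separately into JSONB
    }
```

Note that the `WARNING` text at the start of the line is only for humans here; the exit code is what carries the status.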

The perfdata part (everything after `|`) is parsed automatically and stored as JSONB in `check_results.perfdata`.
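A minimal sketch of turning the perfdata string into a JSON-ready dict. The flat `label → number` shape mirrors the `json` mode example further down, but the real JSONB layout may keep more fields; units and the warn/crit/min/max thresholds are dropped here, and `parse_perfdata` is a hypothetical name.

```python
import json
import re

# One perfdata item: label=value[UOM];warn;crit;min;max
# Labels containing spaces are wrapped in single quotes.
PERF_ITEM = re.compile(r"('[^']+'|[^\s=]+)=(\S+)")
NUMBER = re.compile(r"^-?\d+(?:\.\d+)?")


def parse_perfdata(raw: str) -> dict:
    """Map "load1=0.52;1;5 size=12.4GB" to {"load1": 0.52, "size": 12.4}."""
    result = {}
    for label, data in PERF_ITEM.findall(raw):
        value_field = data.split(";", 1)[0]   # value plus optional unit
        m = NUMBER.match(value_field)
        if m:                                 # skip "U" (unknown) and garbage
            result[label.strip("'")] = float(m.group())
    return result


if __name__ == "__main__":
    print(json.dumps(parse_perfdata("load1=0.52;1;5 'disk free'=12.4GB;;;0;100")))
```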

`text`

Free-form text output. The status comes only from the exit code (0/1/2/3); the output goes unchanged into message, and no perfdata is extracted.

```bash
#!/bin/bash
if [ -f /tmp/maintenance ]; then
  echo "Maintenance mode active"
  exit 1   # WARNING
fi
echo "Normal operation"
exit 0
```
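In `text` mode the result mapping is even simpler. A hedged sketch with a hypothetical `parse_text` function; mapping unknown exit codes to UNKNOWN is an assumption.

```python
EXIT_STATUS = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def parse_text(exit_code: int, output: str) -> dict:
    """text mode: status from the exit code, output unchanged as message."""
    return {
        "status": EXIT_STATUS.get(exit_code, "UNKNOWN"),
        "message": output,   # 1:1, a "|" is not treated as a perfdata separator
        "perfdata": None,    # text mode never produces perfdata
    }
```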

`json`

Structured JSON output; the status is set in the JSON itself instead of through the exit code:

```python
import json

result = {
  "status": "OK",      # OK / WARNING / CRITICAL / UNKNOWN
  "message": "Backup yesterday at 02:14",
  "value": 42,         # optional, numeric
  "perfdata": {        # optional
    "duration_s": 1834,
    "size_gb": 12.4
  }
}
print(json.dumps(result))
```

Advantages: no exit-code workaround, more readable output, and structured perfdata.
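On the receiving side, `json` output needs validating before it is stored. A minimal sketch, assuming that malformed output should become UNKNOWN; `parse_json_result` is a hypothetical name, and the field names come from the example above.

```python
import json

VALID_STATUS = {"OK", "WARNING", "CRITICAL", "UNKNOWN"}


def _unknown(message):
    return {"status": "UNKNOWN", "message": message, "value": None, "perfdata": None}


def parse_json_result(output: str) -> dict:
    """Validate json-mode output; anything malformed becomes UNKNOWN."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError as exc:
        return _unknown(f"invalid JSON: {exc.msg}")
    if not isinstance(data, dict) or data.get("status") not in VALID_STATUS:
        return _unknown("missing or invalid status")
    value = data.get("value")
    # "value" is optional and numeric; bool is excluded because it subclasses int
    if value is not None and (isinstance(value, bool) or not isinstance(value, (int, float))):
        value = None
    perfdata = data.get("perfdata")
    return {
        "status": data["status"],
        "message": str(data.get("message", "")),
        "value": value,
        "perfdata": perfdata if isinstance(perfdata, dict) else None,
    }
```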

Bind script to a service

Set these fields on the profile check or directly on the host service:

| Field | Meaning |
|---|---|
| `check_type` | `agent_script` |
| `script_id` | Script UUID |

On each config refresh the agent fetches the script content together with the service config; nothing is stored locally.
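Server-side, this amounts to attaching each `agent_script` service's script to the config response (the `SELECT host_services + scripts` step in the sequence diagram). A hedged sketch: the dict shapes and the `build_agent_config` name are hypothetical; only the three response fields `script_content`, `interpreter`, and `expected_output` come from the diagram.

```python
def build_agent_config(host_services: list, scripts: dict) -> list:
    """Attach script content to every agent_script service.

    host_services: dicts with check_type, script_id, check_config (hypothetical shape)
    scripts: script_id -> dict with script_body, interpreter, expected_output
    """
    config = []
    for svc in host_services:
        entry = dict(svc)  # copy so the input rows are not modified
        if svc.get("check_type") == "agent_script":
            script = scripts[svc["script_id"]]
            entry.update(
                script_content=script["script_body"],
                interpreter=script["interpreter"],
                expected_output=script["expected_output"],
            )
        config.append(entry)
    return config
```

Because the script body travels with every config response, an edit in /scripts reaches all agents within one refresh interval without any redeployment.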

Agent security

Scripts run as the user of the agent service: usually root on Linux (many checks need it) and, analogously, LocalSystem on Windows.

Warning: scripts run as root

Anyone who can create a custom script can run arbitrary code as root (LocalSystem on Windows) on every agent machine that executes it. Restrict the script.create permission to tenant admins and above. Details: Roles & permissions.

Example: TLS expiry via script

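```bash
#!/bin/bash
# check_cert_expiry.sh (argument: domain)
DOMAIN="$1"
if [ -z "$DOMAIN" ]; then
  echo "UNKNOWN - no domain given"
  exit 3
fi

# Read the notAfter date of the certificate served on port 443
END=$(echo | openssl s_client -servername "$DOMAIN" -connect "$DOMAIN:443" 2>/dev/null | \
      openssl x509 -noout -enddate 2>/dev/null | cut -d= -f2)
if [ -z "$END" ]; then
  # Without this guard an empty date would be misread as "expires today"
  echo "UNKNOWN - could not read certificate for $DOMAIN"
  exit 3
fi

END_EPOCH=$(date -d "$END" +%s)   # GNU date syntax
NOW_EPOCH=$(date +%s)
DAYS=$(( (END_EPOCH - NOW_EPOCH) / 86400 ))

if [ "$DAYS" -lt 7 ]; then
  echo "CRITICAL - $DOMAIN expires in $DAYS days | days=$DAYS"
  exit 2
elif [ "$DAYS" -lt 21 ]; then
  echo "WARNING - $DOMAIN expires in $DAYS days | days=$DAYS"
  exit 1
fi
echo "OK - $DOMAIN valid for $DAYS more days | days=$DAYS"
exit 0
```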

Save the script, then configure it as an `agent_script` service with `args: ["example.com"]` in check_config.
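For comparison, the same check can be written for the `json` output mode in Python. This is a sketch, not a shipped builtin: it uses the standard library's `ssl` module instead of the `openssl` CLI, and treating a certificate that fails verification (an expired one, for example) as CRITICAL is a design choice of this example. Bind it the same way, with Interpreter set to `python` and Expected output set to `json`.

```python
import json
import socket
import ssl
import sys
import time


def fetch_not_after(domain: str, port: int = 443, timeout: float = 10.0) -> str:
    """Return the certificate's notAfter string, e.g. 'Jun  1 12:00:00 2030 GMT'."""
    ctx = ssl.create_default_context()
    with socket.create_connection((domain, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=domain) as tls:
            return tls.getpeercert()["notAfter"]


def days_left(not_after: str, now: float) -> int:
    # Truncate toward zero like the bash version's integer division
    return int((ssl.cert_time_to_seconds(not_after) - now) / 86400)


def evaluate(domain: str, days: int) -> dict:
    # Same thresholds as the bash example: < 7 days CRITICAL, < 21 WARNING
    if days < 7:
        status = "CRITICAL"
    elif days < 21:
        status = "WARNING"
    else:
        status = "OK"
    return {"status": status, "message": f"{domain} expires in {days} days",
            "value": days, "perfdata": {"days": days}}


def main(argv: list) -> dict:
    if len(argv) < 2:
        return {"status": "UNKNOWN", "message": "usage: check_cert_expiry <domain>"}
    domain = argv[1]
    try:
        not_after = fetch_not_after(domain)
    except ssl.SSLCertVerificationError as exc:
        # An expired certificate fails verification before its date can be read
        return {"status": "CRITICAL", "message": f"{domain}: {exc.verify_message}"}
    except OSError as exc:  # DNS failure, refused connection, timeout
        return {"status": "UNKNOWN", "message": f"{domain}: {exc}"}
    return evaluate(domain, days_left(not_after, time.time()))


if __name__ == "__main__":
    print(json.dumps(main(sys.argv)))
```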

Clone a script

Cloning a builtin creates a custom copy:

/scripts → builtin entry → Clone → edit → save.

Delete a script

If services still reference the script, the delete endpoint refuses the request and returns the list of references. Reassign those services first, then delete the script.
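The reference check can be sketched as follows. A minimal illustration under assumptions: `ScriptInUseError`, `delete_script`, and the in-memory dicts standing in for database tables are all hypothetical.

```python
class ScriptInUseError(Exception):
    """Raised when a script is still referenced by services (hypothetical name)."""

    def __init__(self, script_id, references):
        self.references = references
        super().__init__(f"script {script_id} is used by: {', '.join(references)}")


def delete_script(script_id: str, scripts: dict, services: list) -> None:
    # Collect every service that still points at this script
    refs = [svc["name"] for svc in services if svc.get("script_id") == script_id]
    if refs:
        raise ScriptInUseError(script_id, refs)  # block and report the references
    del scripts[script_id]
```

Returning the full list in one response lets the user reassign every affected service in a single pass instead of discovering them one by one.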

Audit

Script changes are logged via track_change, with old and new values stored as a JSONB diff. They are visible under Admin → Audit log with the filter target_kind = monitoring_script.
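A field-level old/new diff of the kind described above might look like this. The exact format that track_change writes is not documented here; the `{field: {"old": ..., "new": ...}}` shape and the `jsonb_diff` name are assumptions.

```python
def jsonb_diff(old: dict, new: dict) -> dict:
    """Return {field: {"old": ..., "new": ...}} for every field that changed."""
    keys = old.keys() | new.keys()   # fields added or removed count as changes
    return {
        k: {"old": old.get(k), "new": new.get(k)}
        for k in sorted(keys)
        if old.get(k) != new.get(k)
    }
```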

Next

- Check type reference: how scripts are attached to services
- Roles & permissions: the script.create permission