
Monitoring scripts

When the built-in check types aren't enough (your own backup tool, a vendor-specific CLI, a REST API you need to query), monitoring scripts are the way to go.

Concept

Scripts are managed centrally in Vesana (table monitoring_scripts), stored in the database, and delivered to agents via API. Nothing lives on the machine itself — the agent fetches the script content at runtime.

sequenceDiagram
    participant U as User
    participant API
    participant DB
    participant A as Agent

    U->>API: Create / edit script
    API->>DB: store
    A->>API: GET /agent/config (every 5 min)
    API->>DB: SELECT host_services + scripts
    API-->>A: { script_content, interpreter, expected_output }
    A->>A: start interpreter, pipe script via stdin
    A->>API: result (status + value + message + perfdata)
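
The two agent-side steps at the end of the diagram (start the interpreter, pipe the script via stdin) can be sketched in a few lines of Python. This is a minimal illustration, not the Vesana agent code: `run_script`, its return shape, and the choice to report a timeout as exit code 3 (UNKNOWN) are assumptions.

```python
import subprocess
import sys


def run_script(command, script_content, timeout_s=30):
    """Start the interpreter and feed it the script body on stdin.

    Returns (exit_code, stdout). The script is never written to disk.
    """
    try:
        proc = subprocess.run(
            command,
            input=script_content,   # script body goes in through stdin
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # Assumption: a script that exceeds its timeout counts as UNKNOWN (3).
        return 3, f"timed out after {timeout_s}s"
    return proc.returncode, proc.stdout.strip()
```

For a quick local try, `run_script([sys.executable, "-"], "print('OK - demo')")` runs a Python body through the current interpreter; `-` tells Python to read the program from stdin.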

Builtin vs. custom

| Type | is_builtin | tenant_id | Editable |
|---|---|---|---|
| Builtin | true | NULL | ❌ (clone only) |
| Custom | false | tenant UUID | ✅ |

39 builtin scripts ship today; the frontend marks them with a system badge. Examples:

  • check_backup_log — Bash, parses typical backup logs
  • check_disk_smart — Bash, calls smartctl
  • check_iis_pool — PowerShell, application pool status
  • check_systemd_failed — Bash, lists failed units

Builtin scripts can be cloned — the copy lands as a custom script and is freely editable.
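
Cloning can be pictured as a copy with the two ownership columns rewritten. The sketch below is hypothetical: the `Script` dataclass, the `clone_script` helper, and the `-copy` name suffix are assumptions for illustration; only `is_builtin` and `tenant_id` come from the table above.

```python
import uuid
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Script:
    name: str
    body: str
    is_builtin: bool
    tenant_id: Optional[uuid.UUID]  # None stands for NULL on builtin scripts


def clone_script(source: Script, tenant_id: uuid.UUID) -> Script:
    """Return a custom copy owned by the given tenant; the source is untouched."""
    return replace(
        source,
        name=f"{source.name}-copy",  # assumed naming scheme
        is_builtin=False,
        tenant_id=tenant_id,
    )
```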

Create a script

/scripts → New:

| Field | Required | Meaning |
|---|---|---|
| Name | | Unique per tenant |
| Description | | What the script does |
| Interpreter | | powershell, bash, python |
| Script body | | Code, multi-line |
| Expected output | | nagios, text, json |
| Timeout (s) | | Default 30 |
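
A client-side check of these fields might look like the following. It is a sketch under assumptions: the `NewScript` class and its attribute names are hypothetical, and which fields are mandatory is guessed here. Only the allowed interpreter and output values and the 30 second default come from the table.

```python
from dataclasses import dataclass

INTERPRETERS = {"powershell", "bash", "python"}
OUTPUT_MODES = {"nagios", "text", "json"}


@dataclass
class NewScript:
    name: str
    interpreter: str
    body: str
    expected_output: str
    description: str = ""
    timeout_s: int = 30  # documented default

    def __post_init__(self):
        # Reject values the form would not accept.
        if not self.name.strip():
            raise ValueError("name must not be empty")
        if self.interpreter not in INTERPRETERS:
            raise ValueError(f"unknown interpreter: {self.interpreter}")
        if self.expected_output not in OUTPUT_MODES:
            raise ValueError(f"unknown expected output: {self.expected_output}")
        if self.timeout_s <= 0:
            raise ValueError("timeout must be positive")
```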

Expected-output modes

nagios

Classic Nagios plugin format:

  • Exit code 0 → OK
  • Exit code 1 → WARNING
  • Exit code 2 → CRITICAL
  • Exit code 3 → UNKNOWN

Output format:

STATUS - description | label1=value;warn;crit label2=value

Example:

#!/bin/bash
LOAD=$(awk '{print $1}' /proc/loadavg)
if (( $(echo "$LOAD > 5" | bc -l) )); then
  echo "CRITICAL - Load $LOAD | load1=$LOAD;1;5"
  exit 2
elif (( $(echo "$LOAD > 1" | bc -l) )); then
  echo "WARNING - Load $LOAD | load1=$LOAD;1;5"
  exit 1
else
  echo "OK - Load $LOAD | load1=$LOAD;1;5"
  exit 0
fi

perfdata (after |) is automatically written as JSONB to check_results.perfdata.
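
To make the format concrete, here is a small parser for the simple case shown above. The layout of the resulting dictionary is an assumption (the exact JSONB shape Vesana stores is not documented here), and the parser ignores quoted labels and min/max fields that the full Nagios plugin format allows.

```python
import re

EXIT_STATUS = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}
_NUMBER = re.compile(r"^-?\d+(?:\.\d+)?")  # leading number, any unit suffix is dropped


def parse_nagios(output: str, exit_code: int) -> dict:
    """Split plugin output into status, message, and perfdata."""
    message, _, perf_part = output.strip().partition("|")
    perfdata = {}
    for token in perf_part.split():
        label, sep, rest = token.partition("=")
        if not sep:
            continue  # not a label=value pair
        fields = rest.split(";")
        match = _NUMBER.match(fields[0])
        if match is None:
            continue  # value is not numeric
        entry = {"value": float(match.group())}
        # Thresholds stay strings because Nagios allows range syntax here.
        for name, raw in zip(("warn", "crit"), fields[1:3]):
            if raw:
                entry[name] = raw
        perfdata[label] = entry
    return {
        "status": EXIT_STATUS.get(exit_code, "UNKNOWN"),
        "message": message.strip(),
        "perfdata": perfdata,
    }
```

Note that the status is taken from the exit code, not from the leading word of the message, so the two should always agree in a well-behaved script.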

text

Free-form text output; the status comes only from the exit code (0/1/2/3). The output goes verbatim into message, and no perfdata is recorded.

#!/bin/bash
if [ -f /tmp/maintenance ]; then
  echo "Maintenance mode active"
  exit 1   # WARNING
fi
echo "Normal operation"
exit 0

json

Structured JSON output; the status is set in the payload rather than through the exit code:

import json, sys

result = {
  "status": "OK",      # OK / WARNING / CRITICAL / UNKNOWN
  "message": "Backup yesterday at 02:14",
  "value": 42,         # optional, numeric
  "perfdata": {        # optional
    "duration_s": 1834,
    "size_gb": 12.4
  }
}
print(json.dumps(result))

Advantage: no exit code hack, more readable, structured perfdata.
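
On the receiving side, such a payload needs to be validated before it is trusted. The following is a sketch of one way to do that; `parse_json_result` is a hypothetical helper, and falling back to UNKNOWN for invalid JSON or an unrecognised status is an assumption, not documented behaviour.

```python
import json

VALID_STATUSES = {"OK", "WARNING", "CRITICAL", "UNKNOWN"}


def _unknown(reason: str) -> dict:
    return {"status": "UNKNOWN", "message": reason, "value": None, "perfdata": {}}


def parse_json_result(stdout: str) -> dict:
    """Turn script stdout into a normalised result dictionary."""
    try:
        data = json.loads(stdout)
    except json.JSONDecodeError as exc:
        return _unknown(f"invalid JSON: {exc.msg}")
    if not isinstance(data, dict):
        return _unknown("result is not a JSON object")
    status = data.get("status")
    if not isinstance(status, str) or status not in VALID_STATUSES:
        return _unknown("missing or invalid status")
    return {
        "status": status,
        "message": str(data.get("message", "")),
        "value": data.get("value"),            # optional, numeric
        "perfdata": data.get("perfdata") or {},  # optional
    }
```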

Bind script to a service

Set these fields in the profile check or directly on the host service:

| Field | Meaning |
|---|---|
| check_type | agent_script |
| script_id | Script UUID |

On each config refresh the agent fetches the script content together with the service config; nothing is stored locally.

Agent security

Scripts run as the user of the agent service. Linux: usually root (needed for many checks). Windows: LocalSystem (analogous).

Scripts run as root

Anyone who can create a custom script can execute code as root (or LocalSystem on Windows) on agent machines. Restrict the script.create permission to tenant admins and above. Details: Roles & permissions.

Example: TLS expiry via script

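#!/bin/bash
# check_cert_expiry.sh (argument: domain)
DOMAIN="$1"
if [ -z "$DOMAIN" ]; then
  echo "UNKNOWN - no domain given"
  exit 3
fi

# Fetch the certificate and extract its notAfter date
END=$(echo | openssl s_client -servername "$DOMAIN" -connect "$DOMAIN:443" 2>/dev/null | \
      openssl x509 -noout -enddate 2>/dev/null | cut -d= -f2)
if [ -z "$END" ]; then
  # Without this guard an unreachable host would be reported as "expires in 0 days"
  echo "UNKNOWN - could not read certificate for $DOMAIN"
  exit 3
fi

# Whole days until expiry (date -d is GNU date syntax)
END_EPOCH=$(date -d "$END" +%s)
NOW_EPOCH=$(date +%s)
DAYS=$(( (END_EPOCH - NOW_EPOCH) / 86400 ))

if [ "$DAYS" -lt 7 ]; then
  echo "CRITICAL - $DOMAIN expires in $DAYS days | days=$DAYS"
  exit 2
elif [ "$DAYS" -lt 21 ]; then
  echo "WARNING - $DOMAIN expires in $DAYS days | days=$DAYS"
  exit 1
fi
echo "OK - $DOMAIN valid for $DAYS more days | days=$DAYS"
exit 0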

Save the script and configure it as an agent_script service with args: ["example.com"] in check_config.
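
Because the agent pipes the script through stdin, the arguments have to reach the interpreter some other way. One plausible mechanism, shown purely as an assumption, is to append them to the interpreter command line: `bash -s -- ARGS` and `python3 - ARGS` both read the program from stdin and expose the remaining words as positional arguments. `build_command` and the `service` dictionary are hypothetical, and the script UUID is a placeholder. PowerShell is left out because its stdin handling differs.

```python
from typing import List


def build_command(interpreter: str, args: List[str]) -> List[str]:
    """Build an argv that reads the script from stdin and passes args to it."""
    if interpreter == "bash":
        # -s: read commands from stdin; -- ends bash's own options
        return ["bash", "-s", "--", *args]
    if interpreter == "python":
        # "-" makes Python read the program from stdin; args land in sys.argv[1:]
        return ["python3", "-", *args]
    raise ValueError(f"no stdin recipe for interpreter: {interpreter}")


service = {
    "check_type": "agent_script",
    "script_id": "00000000-0000-0000-0000-000000000000",  # placeholder UUID
    "check_config": {"args": ["example.com"]},
}
command = build_command("bash", service["check_config"]["args"])
```

With this command line, `$1` inside the TLS script above resolves to `example.com`.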

Clone a script

Cloning a builtin creates a custom copy:

/scripts → builtin entry → Clone → edit → save.

Delete a script

If services reference the script, the delete endpoint blocks with a list of references. Reassign first, then delete.
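
The blocking behaviour can be illustrated with an in-memory model. Everything here is hypothetical: the `ScriptInUse` exception, the `delete_script` helper, and the plain dictionaries standing in for database tables are assumptions made for the sketch.

```python
class ScriptInUse(Exception):
    """Raised when a script cannot be deleted because services still use it."""

    def __init__(self, references):
        super().__init__(f"script is referenced by {len(references)} service(s)")
        self.references = references


def delete_script(script_id, scripts, services):
    """Delete a script unless a service references it.

    scripts:  dict mapping script id to script record
    services: list of dicts with at least "name" and "script_id"
    """
    references = [s["name"] for s in services if s.get("script_id") == script_id]
    if references:
        # Block the delete and hand back the list of references.
        raise ScriptInUse(references)
    scripts.pop(script_id, None)
```

A caller would catch `ScriptInUse`, show `references` to the user, and retry after the services have been reassigned.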

Audit

Script changes are logged with track_change, which records the old and new values as a JSONB diff. The entries are visible in Admin → Audit log with the filter target_kind = monitoring_script.
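
As an illustration of what an old/new diff of a script record can look like, consider the sketch below. It does not reproduce track_change itself: `diff_fields` and the `{"old": ..., "new": ...}` layout are assumptions chosen for the example.

```python
def diff_fields(old: dict, new: dict) -> dict:
    """Return only the fields whose value changed, with both versions."""
    changed = {}
    for key in sorted(old.keys() | new.keys()):
        before, after = old.get(key), new.get(key)
        if before != after:
            changed[key] = {"old": before, "new": after}
    return changed


before = {"name": "check_load", "timeout": 30, "interpreter": "bash"}
after = {"name": "check_load", "timeout": 60, "interpreter": "bash"}
diff = diff_fields(before, after)
```

Here `diff` contains a single entry for `timeout`; unchanged fields are omitted, which keeps audit rows small.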
