
AI chat & analysis

Vesana has two AI features:

  1. AI chat: open-ended IT questions, with RAG from the wiki and web-search fallback
  2. AI service analysis: automatic diagnosis of a service problem with rich context

Both use the same pipeline and same provider.

Provider

Three options:

| Provider | Privacy | Quality (subjective) | Cost |
|---|---|---|---|
| Ollama local | Data stays in stack | OK with llama3.1:8b | Hardware (GPU recommended) |
| Anthropic | Data goes to Anthropic API | very high (Claude) | API tokens |
| OpenAI-compatible external | depends on provider | depends on model | depends on provider |

Setup: AI provider setup.

Embeddings need a separate model

Anthropic has no embedding API. If you use Anthropic, you also need Ollama for embeddings (nomic-embed-text). The embedding path is configured separately in the provider setup.

Search pipeline

Both chat and service analysis draw on the same four source types (RAG, FTS, ILIKE, web), queried in five steps because FTS runs in German and then in English:

flowchart LR
    Q[Request] --> RAG[1. RAG: pgvector]
    RAG --> FTS_DE[2. FTS German]
    FTS_DE --> FTS_EN[3. FTS English]
    FTS_EN --> ILIKE[4. ILIKE fallback]
    ILIKE --> WEB[5. Web search DuckDuckGo]
    WEB --> CTX[Context for LLM]
    CTX --> LLM[Provider]
    LLM --> RESP[Answer + sources]

| Step | What it does |
|---|---|
| RAG | Vector cosine similarity against wiki embeddings, top-N articles |
| FTS | PostgreSQL FTS German, then English |
| ILIKE | Partial substring search as last wiki anchor |
| Web | DuckDuckGo HTML scraper (no API key), top-3 snippets |
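
To illustrate the RAG step, here is a minimal sketch of cosine-similarity ranking in plain Python. In Vesana the comparison runs inside PostgreSQL via pgvector, so this is not the actual implementation; the function names and the three-dimensional toy embeddings are assumptions made up for the example.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_n(query_vec: list[float], articles: dict[str, list[float]], n: int = 3) -> list[str]:
    """Return the titles of the n articles most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), title) for title, vec in articles.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [title for _, title in scored[:n]]

# Hypothetical toy embeddings; real embeddings have hundreds of dimensions.
ARTICLES = {
    "Disk full runbook": [0.9, 0.1, 0.0],
    "VPN setup": [0.0, 0.2, 0.9],
    "Filesystem cleanup": [0.8, 0.3, 0.1],
}

ranked = top_n([1.0, 0.0, 0.0], ARTICLES, n=2)
print(ranked)
```

The two disk-related articles rank ahead of the VPN article because their vectors point in nearly the same direction as the query vector.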

Context from each source is appended in pipeline order until a context budget of roughly 2000 characters is filled.
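
The budget rule can be sketched as follows. This is a simplified illustration, not Vesana's code: the function name, the `[source]` prefix, and the choice to stop at the first snippet that no longer fits (rather than truncating it) are assumptions.

```python
CONTEXT_BUDGET_CHARS = 2000  # approximate figure from the text above

def build_context(sources: list[tuple[str, list[str]]],
                  budget: int = CONTEXT_BUDGET_CHARS) -> str:
    """Append snippets in pipeline order until the character budget is used up.

    `sources` is a list of (source_name, snippets) pairs, already ordered
    the way the pipeline queries them (wiki sources first, web last).
    """
    parts: list[str] = []
    used = 0
    for name, snippets in sources:
        for snippet in snippets:
            entry = f"[{name}] {snippet}"
            if used + len(entry) > budget:
                # Assumption: stop entirely instead of truncating the snippet.
                return "\n".join(parts)
            parts.append(entry)
            used += len(entry)
    return "\n".join(parts)

# Small budget so the cut-off is visible in the example.
context = build_context(
    [("wiki", ["a" * 10, "b" * 10]), ("web", ["c" * 10])],
    budget=40,
)
print(context)
```

With a budget of 40 characters only the two wiki snippets fit, so the web snippet is dropped. Because earlier sources are visited first, wiki content takes priority over web content.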

AI chat

Floating widget bottom right (AiChatWidget).

Use

  • Type, hit enter
  • Answer streams in (Server-Sent Events)
  • Conversation history persists (last 10 messages)
  • New chat button clears history
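
A bounded history like the one described above could be kept with a fixed-size deque. This is a sketch under assumptions: the class name, the message shape, and keeping the history in memory are illustrative and not taken from Vesana's code.

```python
from collections import deque

class ChatHistory:
    """Keeps only the most recent messages of a conversation."""

    def __init__(self, max_messages: int = 10) -> None:
        # A deque with maxlen silently drops the oldest entry when full.
        self._messages: deque[dict[str, str]] = deque(maxlen=max_messages)

    def add(self, role: str, content: str) -> None:
        self._messages.append({"role": role, "content": content})

    def clear(self) -> None:
        """Equivalent of the New chat button."""
        self._messages.clear()

    def as_list(self) -> list[dict[str, str]]:
        return list(self._messages)

history = ChatHistory()
for i in range(12):
    history.add("user", f"msg {i}")
```

After twelve messages the history holds only the last ten, starting with `msg 2`.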

Knowledge-source toggles

Per chat session:

  • 📖 Wiki: RAG from your wiki
  • 🌐 Internet: web search
  • 💡 Own knowledge: without external source, LLM only

Default: all three on. For sensitive discussions: only wiki + own.
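
One way to picture how the toggles map onto the search pipeline is the sketch below. The class, the step names, and the mapping itself are assumptions chosen for illustration; the actual request format is not documented here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class KnowledgeSources:
    """Per-session toggles; all three default to on, as described above."""
    wiki: bool = True
    internet: bool = True
    own_knowledge: bool = True

    def pipeline_steps(self) -> list[str]:
        """Search steps that would run for this combination of toggles."""
        steps: list[str] = []
        if self.wiki:
            steps += ["rag", "fts_de", "fts_en", "ilike"]
        if self.internet:
            steps.append("web")
        return steps

default = KnowledgeSources()
sensitive = KnowledgeSources(internet=False)  # only wiki + own knowledge
print(default.pipeline_steps())
print(sensitive.pipeline_steps())
```

With the internet toggle off, no web step runs, so the question is never handed to the web search.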

Source badges

The answer contains markers:

  • 📖 cited wiki articles with click link
  • 🌐 web sources with URL
  • 💡 when LLM answers from own knowledge

So you see where the info came from.

Tenant scope

Search respects tenant scope:

  • Super admin: across all tenants
  • Tenant user: own tenant only

The server reads tenant_scope from the JWT, so the scope cannot be manipulated on the client side.
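
A server-side scope filter might look roughly like this. It is a sketch under several assumptions: `claims` stands for an already verified JWT payload, the `"*"` marker for super-admin scope is invented for the example, and the article shape is hypothetical. Only the claim name `tenant_scope` comes from the text above.

```python
from typing import Optional

ALL_TENANTS = "*"  # hypothetical marker for super-admin scope

def visible_articles(articles: list[dict], claims: dict) -> list[dict]:
    """Filter wiki articles by the tenant scope found in verified JWT claims."""
    scope: Optional[str] = claims.get("tenant_scope")
    if scope is None:
        # Fail closed: without a scope claim, return nothing.
        return []
    if scope == ALL_TENANTS:
        return list(articles)
    return [a for a in articles if a["tenant_id"] == scope]

ARTICLES = [
    {"title": "Backup runbook", "tenant_id": "acme"},
    {"title": "Switch firmware", "tenant_id": "globex"},
]

print(visible_articles(ARTICLES, {"tenant_scope": "acme"}))
```

The scope is taken only from the token, never from a request parameter, which is what keeps it out of the client's control.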

AI service analysis

On the error overview and on the host detail page, each service has an Analyze button.

sequenceDiagram
    participant U as User
    participant API
    participant CTX as ai_context.py
    participant DB
    participant RAG
    participant LLM
    U->>API: POST /api/v1/ai/analyze/{service_id}
    API->>CTX: get_service_context()
    CTX->>DB: JOIN host_services + profile_checks + hosts + current_status + agent_tokens + collectors
    CTX-->>API: rich context
    API->>RAG: wiki search on service display name + check type
    API->>LLM: system prompt + context + RAG
    LLM-->>API: streaming answer with sources
    API-->>U: slide-in panel

Rich context

What the analysis gets as context:

  • Status (OK/WARN/CRIT/NO_DATA)
  • State type (SOFT/HARD), attempt X/Y
  • Status duration
  • Last value + perfdata
  • Agent / collector status (online/offline + version)
  • Acknowledged flag, downtime flag
  • Check interval, thresholds

So the LLM can answer substantively ("disk at 96 % for 4 h, trend +0.3 %/h, parent switch is OK").
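
As an illustration of how such context might be turned into prompt text, consider the sketch below. The dataclass, its field names, and the output format are assumptions; what get_service_context() in ai_context.py actually returns is not shown in this page.

```python
from dataclasses import dataclass

@dataclass
class ServiceContext:
    """Hypothetical subset of the rich context listed above."""
    status: str             # OK / WARN / CRIT / NO_DATA
    state_type: str         # SOFT / HARD
    attempt: int
    max_attempts: int
    status_duration_s: int
    last_value: str
    agent_online: bool
    acknowledged: bool
    in_downtime: bool

    def to_prompt(self) -> str:
        """Render the context as plain text lines for the LLM prompt."""
        hours, rest = divmod(self.status_duration_s, 3600)
        minutes = rest // 60
        lines = [
            f"Status: {self.status} ({self.state_type}, attempt {self.attempt}/{self.max_attempts})",
            f"In this status for: {hours} h {minutes} min",
            f"Last value: {self.last_value}",
            f"Agent: {'online' if self.agent_online else 'offline'}",
            f"Acknowledged: {self.acknowledged}, downtime: {self.in_downtime}",
        ]
        return "\n".join(lines)

ctx = ServiceContext("CRIT", "HARD", 3, 3, 4 * 3600, "disk 96 %", True, False, False)
prompt = ctx.to_prompt()
print(prompt)
```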

Cache

Analyses are stored in ai_analysis_cache keyed by status hash. Same status: same answer, no new LLM call. Status change: cache invalidated.

The force=true query parameter forces a re-analysis with a higher temperature, which is useful when the first answer didn't help.
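
The caching behaviour can be sketched with an in-memory dictionary. This is not the ai_analysis_cache table itself: the class, the fields that enter the hash, and the two temperature values are assumptions chosen for the example.

```python
import hashlib
import json

def status_hash(context: dict) -> str:
    """Stable hash over the status fields; key order does not matter."""
    canonical = json.dumps(context, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

class AnalysisCache:
    def __init__(self, analyze) -> None:
        self._analyze = analyze  # callable(context, temperature) -> str
        self._store: dict[tuple[int, str], str] = {}

    def get(self, service_id: int, context: dict, force: bool = False) -> str:
        key = (service_id, status_hash(context))
        if not force and key in self._store:
            return self._store[key]  # same status: no new LLM call
        # Hypothetical temperatures; force=True uses the higher one.
        answer = self._analyze(context, temperature=0.7 if force else 0.2)
        self._store[key] = answer
        return answer

calls: list[float] = []

def fake_llm(context: dict, temperature: float) -> str:
    """Stand-in for the provider call; records the temperature used."""
    calls.append(temperature)
    return f"analysis #{len(calls)}"

cache = AnalysisCache(fake_llm)
first = cache.get(7, {"status": "CRIT", "state_type": "HARD"})
second = cache.get(7, {"state_type": "HARD", "status": "CRIT"})  # cache hit
changed = cache.get(7, {"status": "OK", "state_type": "HARD"})   # new hash
forced = cache.get(7, {"status": "OK", "state_type": "HARD"}, force=True)
```

In this sketch a status change does not delete old entries; it simply produces a different key, so the stale answer is never served for the new status.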

Privacy

What goes to the LLM?

| Feature | What goes out |
|---|---|
| Chat (own knowledge) | only your question + conversation history |
| Chat (wiki on) | + top wiki article excerpts |
| Chat (internet on) | + web snippets (your question goes to the web search!) |
| Analysis | service context (hostname is not masked) + wiki + web |

Hostname and wiki content leave the server

With provider = Anthropic or OpenAI-compatible external, content goes to their API. The hostname is not masked. If privacy is a concern, set the provider to local Ollama so that nothing leaves the stack.

What does NOT go to the LLM?

  • No passwords, tokens, API keys
  • No SNMP communities
  • No audit log entries
  • No data from other tenants
  • No user list
  • No SLA reports

Prompt-injection protection: system prompts are versioned code, not user-configurable.

Visibility

flowchart LR
    A[AI visible?]
    A --> S{Server enabled?}
    S -->|no| H1[Hidden]
    S -->|yes| P{Permission ai.query?}
    P -->|no| H2[Hidden]
    P -->|yes| U{User pref hide_ai?}
    U -->|true| H3[Hidden]
    U -->|false| Y[Visible]

Each user can hide the chat widget under Settings → Preferences, even when server and role allow it.

Implementation: useAiVisible() in frontend/src/hooks.ts combines server, permission, user pref.
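
The decision in the flowchart reduces to a single boolean expression. The real hook is TypeScript; the sketch below restates the same logic in Python, with a hypothetical function name and parameter shapes.

```python
def ai_visible(server_enabled: bool, permissions: set[str], hide_ai: bool) -> bool:
    """Mirror of the flowchart: all three conditions must allow the widget."""
    return server_enabled and "ai.query" in permissions and not hide_ai

print(ai_visible(True, {"ai.query"}, hide_ai=False))
```

Any single "no" along the chain hides the widget, so the user preference can only restrict visibility, never grant it.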

Model choice

| Use case | Recommendation |
|---|---|
| Fast, no GPU | Ollama llama3.1:8b (CPU-only OK with ~5 s response) |
| High-quality, local | Ollama llama3.1:70b with GPU |
| Top quality, paid | Anthropic claude-sonnet-4-6 |
| Special tuning | OpenAI-compatible external with own fine-tuned model |

For embeddings, always use nomic-embed-text via Ollama: it is the default and offers a good balance of performance and quality.

Permissions

| Permission | Effect |
|---|---|
| ai.query | Open chat, send queries |
| ai.analyze | Start service analysis |
| ai.config | Configure provider and model (admin) |
