AI chat & analysis

Vesana has two AI features:

AI chat — open-ended IT questions, with RAG from the wiki and web-search fallback
AI service analysis — automatic diagnosis of a service problem with rich context

Both use the same pipeline and same provider.

Provider

Three options:

Provider	Privacy	Quality (subjective)	Cost
Ollama local	Data stays in stack	OK with `llama3.1:8b`	Hardware (GPU recommended)
Anthropic	Data goes to Anthropic API	very high (Claude)	API tokens
OpenAI-compatible external	depends on provider	depends on model	depends on provider

Setup: AI provider setup.

Embeddings need a separate model

Anthropic has no embedding API. If you use Anthropic as the provider, you also need Ollama for embeddings (nomic-embed-text). The embedding path is configured separately in the provider setup.
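As a rough illustration of the embedding path, the sketch below builds a request for Ollama's embeddings endpoint. The host, port, and the helper names `build_embedding_request` and `embed` are assumptions for illustration; how Vesana actually calls Ollama is not documented here.

```python
import json
import urllib.request

# Assumption: Ollama runs on the same host on its default port.
OLLAMA_URL = "http://localhost:11434/api/embeddings"


def build_embedding_request(text: str, model: str = "nomic-embed-text") -> urllib.request.Request:
    """Build (but do not send) an embedding request for Ollama."""
    body = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text: str) -> list[float]:
    """Send the request and return the embedding vector from the response."""
    with urllib.request.urlopen(build_embedding_request(text), timeout=30) as resp:
        return json.loads(resp.read())["embedding"]
```

Calling `embed("disk full on /var")` needs a running Ollama instance with the model pulled; `build_embedding_request` can be inspected without any network access.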

Search pipeline

Both chat and service analysis query the same sources in the same order: four wiki lookups, then web search as a fallback:

flowchart LR
    Q[Request] --> RAG[1. RAG: pgvector]
    RAG --> FTS_DE[2. FTS German]
    FTS_DE --> FTS_EN[3. FTS English]
    FTS_EN --> ILIKE[4. ILIKE fallback]
    ILIKE --> WEB[5. Web search DuckDuckGo]
    WEB --> CTX[Context for LLM]
    CTX --> LLM[Provider]
    LLM --> RESP[Answer + sources]

Step	What it does
RAG	Vector cosine similarity against wiki embeddings, top-N articles
FTS	PostgreSQL full-text search, German first, then English
ILIKE	Substring match as the last wiki fallback
Web	DuckDuckGo HTML scraper (no API key), top-3 snippets

Each source adds its results to the context until the context budget (about 2,000 characters) is filled.
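The fallback order and the budget can be sketched as follows. This is a minimal illustration, not Vesana's code: `build_context` and the `Source` type are hypothetical names, and the sketch assumes a single budget shared by all sources, which the text above does not state explicitly.

```python
from typing import Callable, Iterable

# Hypothetical type: a source takes the query and yields text snippets.
Source = Callable[[str], Iterable[str]]


def build_context(
    query: str,
    sources: list[tuple[str, Source]],
    budget_chars: int = 2000,
) -> list[tuple[str, str]]:
    """Walk the sources in order and collect snippets until the budget is used up."""
    collected: list[tuple[str, str]] = []
    remaining = budget_chars
    for name, search in sources:
        for snippet in search(query):
            if remaining <= 0:
                return collected
            piece = snippet[:remaining]  # truncate the last snippet so it fits
            collected.append((name, piece))
            remaining -= len(piece)
    return collected
```

Because the sources are consulted in order, later steps such as web search only contribute when the earlier wiki lookups leave room in the budget.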

AI chat

A floating widget in the bottom-right corner (AiChatWidget).

Use

Type, hit enter
Answer streams in (Server-Sent Events)
Conversation history persists (last 10 messages)
New chat button clears history
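The ten-message window and the New chat behavior can be modeled with a bounded queue. The class name `ChatHistory` and the message shape are assumptions for illustration; where Vesana actually stores the history is not described here.

```python
from collections import deque


class ChatHistory:
    """Keeps only the most recent messages of a conversation."""

    def __init__(self, max_messages: int = 10) -> None:
        # A deque with maxlen silently drops the oldest entry when full.
        self._messages: deque = deque(maxlen=max_messages)

    def add(self, role: str, content: str) -> None:
        self._messages.append({"role": role, "content": content})

    def clear(self) -> None:
        """Equivalent of the New chat button."""
        self._messages.clear()

    def as_list(self) -> list[dict]:
        return list(self._messages)
```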

Knowledge-source toggles

Per chat session:

📖 Wiki — RAG from your wiki
🌐 Internet — web search
💡 Own knowledge — without external source, LLM only

Default: all three on. For sensitive discussions, turn Internet off and keep only Wiki and Own knowledge.
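One way to picture the toggles is as a small settings object that decides which pipeline steps run. The class, field names, and step labels below are hypothetical, and the mapping of the Wiki toggle to all four wiki lookups is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeSources:
    wiki: bool = True
    internet: bool = True
    own_knowledge: bool = True

    def pipeline_steps(self) -> list[str]:
        """Return the search steps that the current toggles allow."""
        steps: list[str] = []
        if self.wiki:
            # Assumption: the Wiki toggle covers all four wiki lookups.
            steps += ["rag", "fts_de", "fts_en", "ilike"]
        if self.internet:
            steps.append("web")
        return steps


# The setting suggested above for sensitive discussions.
SENSITIVE = KnowledgeSources(internet=False)
```

With `SENSITIVE`, the question is never passed to the web search step.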

Source badges

The answer contains markers:

📖 cited wiki articles with click link
🌐 web sources with URL
💡 when LLM answers from own knowledge

This shows where each piece of information came from.

Tenant scope

Search respects tenant scope:

Super admin: across all tenants
Tenant user: own tenant only

The server reads tenant_scope from the JWT, so the scope cannot be manipulated from the client.
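A minimal sketch of server-side scoping, assuming the JWT has already been verified and decoded into a dict. The encoding of `tenant_scope` (here `"*"` for super admins, otherwise a tenant ID) and the `tenant_id` field on articles are assumptions, not Vesana's documented format.

```python
from typing import Optional


def tenant_filter(claims: dict) -> Optional[str]:
    """Return the tenant ID to filter by, or None for an unrestricted search."""
    scope = claims["tenant_scope"]
    if scope == "*":  # hypothetical marker for a super admin
        return None
    return scope


def visible_articles(articles: list[dict], claims: dict) -> list[dict]:
    """Keep only the articles the caller's tenant scope allows."""
    tenant = tenant_filter(claims)
    return [a for a in articles if tenant is None or a["tenant_id"] == tenant]
```

The request body never supplies the tenant; the filter is derived only from the token claims.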

AI service analysis

On the error overview and on the host detail page, each service has an Analyze button.

sequenceDiagram
    participant U as User
    participant API
    participant CTX as ai_context.py
    participant DB
    participant RAG
    participant LLM
    U->>API: POST /api/v1/ai/analyze/{service_id}
    API->>CTX: get_service_context()
    CTX->>DB: JOIN host_services + profile_checks + hosts + current_status + agent_tokens + collectors
    CTX-->>API: rich context
    API->>RAG: wiki search on service display name + check type
    API->>LLM: system prompt + context + RAG
    LLM-->>API: streaming answer with sources
    API-->>U: slide-in panel

Rich context

What the analysis gets as context:

Status (OK/WARN/CRIT/NO_DATA)
State type (SOFT/HARD), attempt X/Y
Status duration
Last value + perfdata
Agent / collector status (online/offline + version)
Acknowledged flag, downtime flag
Check interval, thresholds

This lets the LLM give a substantive answer, for example "disk at 96 % for 4 h, trend +0.3 %/h, parent switch is OK".
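The context items listed above could be carried in a structure like the one below and rendered as plain text for the prompt. All field names and the output layout are hypothetical; the real structure returned by `get_service_context()` is not shown in this document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServiceContext:
    status: str              # OK / WARN / CRIT / NO_DATA
    state_type: str          # SOFT / HARD
    attempt: int
    max_attempts: int
    status_duration_s: int
    last_value: str
    perfdata: str
    agent_online: bool
    agent_version: Optional[str]
    acknowledged: bool
    in_downtime: bool
    check_interval_s: int
    thresholds: str


def render_context(ctx: ServiceContext) -> str:
    """Turn the context into a compact text block for the LLM prompt."""
    hours, rest = divmod(ctx.status_duration_s, 3600)
    minutes = rest // 60
    agent = "online" if ctx.agent_online else "offline"
    if ctx.agent_version:
        agent += f" (version {ctx.agent_version})"
    lines = [
        f"Status: {ctx.status} ({ctx.state_type}, attempt {ctx.attempt}/{ctx.max_attempts})",
        f"In this status for: {hours} h {minutes} min",
        f"Last value: {ctx.last_value} | perfdata: {ctx.perfdata}",
        f"Agent: {agent}",
        f"Acknowledged: {ctx.acknowledged}, downtime: {ctx.in_downtime}",
        f"Check interval: {ctx.check_interval_s} s, thresholds: {ctx.thresholds}",
    ]
    return "\n".join(lines)
```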

Cache

Analyses are stored in ai_analysis_cache keyed by status hash. Same status: same answer, no new LLM call. Status change: cache invalidated.

force=true query param forces re-analysis with higher temperature — useful when the first answer didn't help.
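The caching behavior can be sketched with an in-memory dictionary standing in for ai_analysis_cache. The hash inputs, the temperature values, and the names `status_hash` and `AnalysisCache` are assumptions chosen for illustration.

```python
import hashlib
import json
from typing import Callable


def status_hash(service_id: int, status: dict) -> str:
    """Hash the service ID and its current status into a stable cache key."""
    canonical = json.dumps({"service_id": service_id, "status": status}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class AnalysisCache:
    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    def analyze(
        self,
        service_id: int,
        status: dict,
        run_llm: Callable[[float], str],
        force: bool = False,
    ) -> str:
        key = status_hash(service_id, status)
        if not force and key in self._store:
            return self._store[key]  # same status: reuse the stored answer
        # Illustrative values only: a forced re-analysis uses a higher temperature.
        temperature = 0.7 if force else 0.2
        answer = run_llm(temperature)
        self._store[key] = answer
        return answer
```

Because the key is derived from the status, a status change produces a new key and the old entry is simply no longer hit.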

Privacy

What goes to the LLM?

Feature	What goes out
Chat (own knowledge)	only your question + conversation history
Chat (wiki on)	+ top wiki article excerpts
Chat (internet on)	+ web snippets (your question goes to the web search!)
Analysis	service context (hostname is not masked) + wiki excerpts + web snippets

Hostname and wiki content leave the server

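With provider = Anthropic or an external OpenAI-compatible provider, this content goes to their API, and the hostname is not masked. If that is a privacy concern, set the provider to local Ollama so that no prompt content leaves the stack. Web search, if enabled, still sends your question to the search engine.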

What does NOT go to the LLM?

No passwords, tokens, API keys
No SNMP communities
No audit log entries
No data from other tenants
No user list
No SLA reports

Prompt-injection protection: system prompts are versioned code, not user-configurable.

Visibility

flowchart LR
    A[AI visible?]
    A --> S{Server enabled?}
    S -->|no| H1[Hidden]
    S -->|yes| P{Permission ai.query?}
    P -->|no| H2[Hidden]
    P -->|yes| U{User pref hide_ai?}
    U -->|true| H3[Hidden]
    U -->|false| Y[Visible]

Each user can hide the chat widget under Settings → Preferences, even when the server and role allow it.

Implementation: useAiVisible() in frontend/src/hooks.ts combines server, permission, user pref.
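The decision chain from the flowchart, written out as a plain function. The real hook is TypeScript in the frontend; this Python version only mirrors the logic, and the parameter names are assumptions.

```python
def ai_visible(server_enabled: bool, permissions: set[str], hide_ai: bool) -> bool:
    """Mirror the flowchart: server switch, then permission, then user preference."""
    if not server_enabled:
        return False
    if "ai.query" not in permissions:
        return False
    return not hide_ai
```

All three conditions must allow the widget; any single "no" hides it.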

Model choice

Use case	Recommendation
Fast, no GPU	Ollama `llama3.1:8b` (CPU-only OK with ~5 s response)
High-quality, local	Ollama `llama3.1:70b` with GPU
Top quality, paid	Anthropic `claude-sonnet-4-6`
Special tuning	OpenAI-compatible external with own fine-tuned model

For embeddings, use nomic-embed-text via Ollama. It is the default and offers a good balance of performance and quality.

Permissions

Permission	Effect
`ai.query`	Open chat, send queries
`ai.analyze`	Start service analysis
`ai.config`	Configure provider and model (admin)

Next

AI provider setup: configuring the provider and the embedding model
Wiki: the RAG source; the better the wiki content, the better the answers