AI chat & analysis
Vesana has two AI features:
- AI chat: open-ended IT questions, with RAG from the wiki and web-search fallback
- AI service analysis: automatic diagnosis of a service problem with rich context
Both use the same pipeline and the same provider.
Provider
Three options:
| Provider | Privacy | Quality (subjective) | Cost |
|---|---|---|---|
| Ollama local | Data stays in stack | OK with llama3.1:8b | Hardware (GPU recommended) |
| Anthropic | Data goes to Anthropic API | very high (Claude) | API tokens |
| OpenAI-compatible external | depends on provider | depends on model | depends on provider |
Setup: AI provider setup.
Embeddings need a separate model
Anthropic has no embedding API. If you use Anthropic, you also need Ollama for embeddings (nomic-embed-text). The embedding path is configured separately in the provider setup.
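The split between chat provider and embedding provider can be pictured as a small configuration check. This is only a sketch: the `AiConfig` class, the provider identifiers, and the `validate` helper are hypothetical names for illustration, not Vesana's actual configuration keys.

```python
from dataclasses import dataclass

# Hypothetical provider identifiers; the real configuration keys may differ.
CHAT_PROVIDERS = {"ollama", "anthropic", "openai_compatible"}


@dataclass(frozen=True)
class AiConfig:
    chat_provider: str
    chat_model: str
    # Embeddings are configured on their own path and default to Ollama.
    embedding_provider: str = "ollama"
    embedding_model: str = "nomic-embed-text"


def validate(cfg: AiConfig) -> AiConfig:
    """Reject provider combinations that cannot work."""
    if cfg.chat_provider not in CHAT_PROVIDERS:
        raise ValueError(f"unknown chat provider: {cfg.chat_provider}")
    if cfg.embedding_provider == "anthropic":
        # Anthropic has no embedding API, so it cannot serve this path.
        raise ValueError("Anthropic cannot be used for embeddings")
    return cfg
```

With these assumptions, `validate(AiConfig("anthropic", "claude-sonnet-4-6"))` passes because the embedding path silently stays on Ollama.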
Search pipeline
Both chat and service analysis go through the same four source types in five steps, since full-text search runs once for German and once for English:
flowchart LR
Q[Request] --> RAG[1. RAG: pgvector]
RAG --> FTS_DE[2. FTS German]
FTS_DE --> FTS_EN[3. FTS English]
FTS_EN --> ILIKE[4. ILIKE fallback]
ILIKE --> WEB[5. Web search DuckDuckGo]
WEB --> CTX[Context for LLM]
CTX --> LLM[Provider]
LLM --> RESP[Answer + sources]
| Step | What it does |
|---|---|
| RAG | Vector cosine similarity against wiki embeddings, top-N articles |
| FTS | PostgreSQL full-text search, German first, then English |
| ILIKE | Partial substring search as the last wiki fallback |
| Web | DuckDuckGo HTML scraper (no API key), top-3 snippets |
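The RAG step ranks wiki articles by cosine similarity. In Vesana this happens inside PostgreSQL via pgvector; the plain-Python sketch below only illustrates the ranking idea. The function names and the dictionary of article vectors are assumptions made for the example.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_n(query_vec: list[float], articles: dict[str, list[float]], n: int = 3) -> list[str]:
    """Return the titles of the n articles most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), title) for title, vec in articles.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [title for _, title in scored[:n]]
```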
Context is added per source until a context budget of about 2000 characters is filled.
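A minimal sketch of such budget filling follows. It assumes one budget shared across all sources, sources queried in pipeline order, and truncation of the last snippet; none of these details is confirmed by the text above, and `build_context` is a hypothetical name.

```python
from typing import Callable, Iterable

# A source is a (name, search function) pair; the function returns text snippets.
Source = tuple[str, Callable[[str], Iterable[str]]]


def build_context(query: str, sources: list[Source], budget_chars: int = 2000) -> list[tuple[str, str]]:
    """Collect (source name, snippet) pairs until the character budget is used up."""
    context: list[tuple[str, str]] = []
    remaining = budget_chars
    for name, search in sources:
        for snippet in search(query):
            if remaining <= 0:
                return context  # budget filled, later sources are skipped
            piece = snippet[:remaining]  # assumption: cut the last snippet to fit
            context.append((name, piece))
            remaining -= len(piece)
    return context
```

Because earlier sources consume the budget first, later steps such as web search only contribute when the wiki steps leave room.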
AI chat
A floating widget at the bottom right (AiChatWidget).
Use
- Type a question and press Enter
- The answer streams in (Server-Sent Events)
- Conversation history persists (last 10 messages)
- The New chat button clears the history
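The two mechanisms in this list, a bounded history and Server-Sent Events framing, can be sketched as follows. `ChatHistory`, `sse_event`, and the JSON payload shape are illustrative assumptions; only the 10-message limit and the use of SSE come from the list above.

```python
import json
from collections import deque

MAX_HISTORY = 10  # the chat keeps the last 10 messages


class ChatHistory:
    """Bounded conversation history; the oldest message drops out first."""

    def __init__(self) -> None:
        self._messages: deque = deque(maxlen=MAX_HISTORY)

    def add(self, role: str, content: str) -> None:
        self._messages.append({"role": role, "content": content})

    def clear(self) -> None:
        # What a "New chat" action would trigger.
        self._messages.clear()

    def as_list(self) -> list:
        return list(self._messages)


def sse_event(payload: dict) -> str:
    """Frame one chunk as a Server-Sent Events message (data line plus blank line)."""
    return f"data: {json.dumps(payload)}\n\n"
```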
Knowledge-source toggles
Per chat session:
- 📚 Wiki: RAG from your wiki
- 🌐 Internet: web search
- 💡 Own knowledge: LLM only, without an external source
Default: all three on. For sensitive discussions, enable only Wiki and Own knowledge.
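One way to picture the toggles is as a filter on the pipeline steps. The sketch assumes the Wiki toggle governs all four wiki steps and the Internet toggle governs the web step; the step names and the `KnowledgeSources` class are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeSources:
    wiki: bool = True
    internet: bool = True
    # Not modeled below: this flag would only affect the prompt, not the search steps.
    own_knowledge: bool = True


def pipeline_steps(sources: KnowledgeSources) -> list[str]:
    """Return the search steps to run for the given toggles (assumed mapping)."""
    steps: list[str] = []
    if sources.wiki:
        steps += ["rag", "fts_de", "fts_en", "ilike"]
    if sources.internet:
        steps.append("web")
    return steps
```

Under these assumptions, the sensitive-discussion setting `KnowledgeSources(internet=False)` runs the wiki steps only, so the question never reaches the web search.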
Source badges
The answer contains markers:
- 📚 cited wiki articles, each with a clickable link
- 🌐 web sources with their URL
- 💡 shown when the LLM answers from its own knowledge
This shows where each piece of information came from.
Tenant scope
Search respects tenant scope:
- Super admin: across all tenants
- Tenant user: own tenant only
The server reads tenant_scope from the JWT, so the scope cannot be manipulated client-side.
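The server-side filtering can be sketched as below. The claim name `tenant_scope` comes from the text; its value format is not documented here, so the sketch assumes `"*"` for super admins and a tenant id otherwise. The helpers work on already-verified JWT claims and are hypothetical.

```python
from typing import Optional


def allowed_tenant(claims: dict) -> Optional[str]:
    """Return the tenant id to filter on, or None for unrestricted access."""
    scope = claims.get("tenant_scope")
    if not scope:
        raise PermissionError("token carries no tenant scope")
    if scope == "*":  # assumed marker for a super admin
        return None
    return scope


def filter_articles(articles: list[dict], claims: dict) -> list[dict]:
    """Keep only the articles the caller's tenant scope allows."""
    tenant = allowed_tenant(claims)
    return [a for a in articles if tenant is None or a["tenant_id"] == tenant]
```

The scope is taken from the verified token only, never from a request parameter, which is what keeps it out of the client's reach.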
AI service analysis
On the error overview and on the host detail page, each service has an Analyze button.
sequenceDiagram
participant U as User
participant API
participant CTX as ai_context.py
participant DB
participant RAG
participant LLM
U->>API: POST /api/v1/ai/analyze/{service_id}
API->>CTX: get_service_context()
CTX->>DB: JOIN host_services + profile_checks + hosts + current_status + agent_tokens + collectors
CTX-->>API: rich context
API->>RAG: wiki search on service display name + check type
API->>LLM: system prompt + context + RAG
LLM-->>API: streaming answer with sources
API-->>U: slide-in panel
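For scripting, the first arrow of the diagram corresponds to a plain POST request. The sketch below only builds the request object and does not send it; the base URL and Bearer-token authentication are assumptions, while the path and the `force` query parameter are taken from this page.

```python
import urllib.parse
import urllib.request


def build_analyze_request(base_url: str, service_id: int, token: str,
                          force: bool = False) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts a service analysis."""
    url = f"{base_url.rstrip('/')}/api/v1/ai/analyze/{service_id}"
    if force:
        url += "?" + urllib.parse.urlencode({"force": "true"})
    return urllib.request.Request(
        url,
        method="POST",
        # Assumption: the API accepts the JWT as a Bearer token.
        headers={"Authorization": f"Bearer {token}"},
    )
```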
Rich context
What the analysis gets as context:
- Status (OK/WARN/CRIT/NO_DATA)
- State type (SOFT/HARD), attempt X/Y
- Status duration
- Last value + perfdata
- Agent / collector status (online/offline + version)
- Acknowledged flag, downtime flag
- Check interval, thresholds
This lets the LLM give a substantive answer (for example "disk at 96 % for 4 h, trend +0.3 %/h, parent switch is OK").
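A rough sketch of how such a context could be flattened into prompt text follows. The `ServiceContext` fields are hypothetical names for some of the items in the list above, and the output format is invented for illustration; the actual structure built by `get_service_context()` is not shown on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceContext:
    # Hypothetical field names; the list above only names the concepts.
    status: str          # OK / WARN / CRIT / NO_DATA
    state_type: str      # SOFT / HARD
    attempt: int
    max_attempts: int
    duration_hours: float
    last_value: str
    agent_online: bool
    acknowledged: bool
    in_downtime: bool


def render_context(ctx: ServiceContext) -> str:
    """Flatten the context into plain lines that can be placed in the prompt."""
    return "\n".join([
        f"Status: {ctx.status} ({ctx.state_type}, attempt {ctx.attempt}/{ctx.max_attempts})",
        f"Duration: {ctx.duration_hours:g} h",
        f"Last value: {ctx.last_value}",
        f"Agent: {'online' if ctx.agent_online else 'offline'}",
        f"Acknowledged: {'yes' if ctx.acknowledged else 'no'}",
        f"Downtime: {'yes' if ctx.in_downtime else 'no'}",
    ])
```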
Cache
Analyses are stored in ai_analysis_cache, keyed by a status hash. Same status: same answer, no new LLM call. A status change invalidates the cache.
The force=true query parameter forces a re-analysis with a higher temperature, which is useful when the first answer didn't help.
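The caching behaviour can be sketched with an in-memory dictionary standing in for the ai_analysis_cache table. Which fields feed the hash and the two temperature values are assumptions; `status_hash` and `AnalysisCache` are hypothetical names.

```python
import hashlib
import json
from typing import Callable


def status_hash(service_id: int, status: str, state_type: str, last_value: str) -> str:
    """Stable key for one service state (the chosen fields are an assumption)."""
    payload = json.dumps([service_id, status, state_type, last_value])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class AnalysisCache:
    """In-memory stand-in for the ai_analysis_cache table."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    def analyze(self, key: str, run_llm: Callable[[float], str], force: bool = False) -> str:
        if not force and key in self._store:
            return self._store[key]  # same status: same answer, no LLM call
        temperature = 0.7 if force else 0.2  # hypothetical values
        answer = run_llm(temperature)
        self._store[key] = answer
        return answer
```

A changed status produces a different hash, so the old entry is simply never hit again; an explicit delete would work equally well.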
Privacy
What goes to the LLM?
| Feature | What goes out |
|---|---|
| Chat (own knowledge) | only your question + conversation history |
| Chat (wiki on) | + top wiki article excerpts |
| Chat (internet on) | + web snippets (your question goes to the web search!) |
| Analysis | service context (hostname is not masked) + wiki + web |
Hostname and wiki content leave the server
With an external provider (Anthropic or OpenAI-compatible external), content goes to that provider's API. The hostname is not masked. If privacy is a concern, set the provider to local Ollama so that nothing leaves the stack.
What does NOT go to the LLM?
- No passwords, tokens, API keys
- No SNMP communities
- No audit log entries
- No data from other tenants
- No user list
- No SLA reports
Prompt-injection protection: system prompts are versioned code, not user-configurable.
Visibility
flowchart LR
A[AI visible?]
A --> S{Server enabled?}
S -->|no| H1[Hidden]
S -->|yes| P{Permission ai.query?}
P -->|no| H2[Hidden]
P -->|yes| U{User pref hide_ai?}
U -->|true| H3[Hidden]
U -->|false| Y[Visible]
Each user can hide the chat widget under Settings → Preferences, even when server and role allow it.
Implementation: useAiVisible() in frontend/src/hooks.ts combines server setting, permission, and user preference.
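The decision chain from the flowchart reduces to a single boolean expression. The real hook is TypeScript; the Python restatement below is only an illustration of the logic, with a hypothetical function name and parameters.

```python
def ai_visible(server_enabled: bool, permissions: set[str], hide_ai: bool) -> bool:
    """AI is visible only if the server enables it, the user holds ai.query,
    and the user has not set the hide_ai preference."""
    return server_enabled and "ai.query" in permissions and not hide_ai
```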
Model choice
| Use case | Recommendation |
|---|---|
| Fast, no GPU | Ollama llama3.1:8b (CPU-only OK with ~5 s response) |
| High-quality, local | Ollama llama3.1:70b with GPU |
| Top quality, paid | Anthropic claude-sonnet-4-6 |
| Special tuning | OpenAI-compatible external with own fine-tuned model |
For embeddings, always use nomic-embed-text via Ollama: it is the default and offers a good balance of performance and quality.
Permissions
| Permission | Effect |
|---|---|
| ai.query | Open chat, send queries |
| ai.analyze | Start service analysis |
| ai.config | Configure provider and model (admin) |
Next
- AI provider setup: setup instructions
- Wiki: the RAG source; the better the wiki, the better the answers