diff --git a/scheme.md b/scheme.md
index fb3fce9..c8a4fdb 100644
--- a/scheme.md
+++ b/scheme.md
@@ -48,84 +48,72 @@ The system is divided into **3 main flows**:
 Simplified diagram for executive and management presentations.
 
 ```mermaid
 graph TB
     %% Styling
     classDef infrastructure fill:#e1f5ff,stroke:#01579b,stroke-width:3px,color:#333
+    classDef kafka fill:#fff3e0,stroke:#e65100,stroke-width:3px,color:#333
     classDef cache fill:#f3e5f5,stroke:#4a148c,stroke-width:3px,color:#333
-    classDef change fill:#fff3e0,stroke:#e65100,stroke-width:3px,color:#333
     classDef llm fill:#e8f5e9,stroke:#1b5e20,stroke-width:3px,color:#333
     classDef git fill:#fce4ec,stroke:#880e4f,stroke-width:3px,color:#333
     classDef human fill:#fff9c4,stroke:#f57f17,stroke-width:3px,color:#333
 
     %% ========================================
     %% FLOW 1: DATA COLLECTION (Background)
     %% ========================================
 
     INFRA[("🏢 INFRASTRUCTURE<br/>SYSTEMS<br/><br/>VMware | K8s | Linux | Cisco")]:::infrastructure
 
     CONN["🔌 CONNECTORS<br/>Automatic Polling"]:::infrastructure
 
-    REDIS[("💾 REDIS CACHE<br/>Infrastructure<br/>Configuration")]:::cache
-
+    KAFKA[("📨 APACHE KAFKA<br/>Message Broker<br/>+ Persistence")]:::kafka
+
+    CONSUMER["⚙️ KAFKA CONSUMER<br/>Processor Service"]:::kafka
+
+    REDIS[("💾 REDIS CACHE<br/>(Optional)<br/>Performance Layer")]:::cache
 
     INFRA -->|"Continuous<br/>API Polling"| CONN
-    CONN -->|"Configuration<br/>Update"| REDIS
-
+    CONN -->|"Publish<br/>Events"| KAFKA
+    KAFKA -->|"Consume<br/>Stream"| CONSUMER
+    CONSUMER -.->|"Optional<br/>Update"| REDIS
 
     %% ========================================
-    %% CHANGE DETECTION
+    %% FLOW 2: DOCUMENTATION GENERATION
     %% ========================================
-
-    CHANGE["🔍 CHANGE DETECTOR<br/>Detects Configuration<br/>Changes"]:::change
-
-    REDIS -->|"Monitor<br/>Changes"| CHANGE
-
-    %% ========================================
-    %% FLOW 2: DOCUMENTATION GENERATION (Triggered)
-    %% ========================================
-
-    TRIGGER["⚡ TRIGGER<br/>Only on changes"]:::change
-
-    USER["👤 USER<br/>Manual Request"]:::human
-
-    LLM["🤖 LLM ENGINE<br/>Qwen (Local)"]:::llm
-
+
+    USER["👤 USER<br/>Doc Request"]:::human
+
+    LLM["🤖 LLM ENGINE<br/>Claude / GPT"]:::llm
+
     MCP["🔧 MCP SERVER<br/>API Control Platform"]:::llm
 
     DOC["📄 DOCUMENT<br/>Generated Markdown"]:::llm
 
-    CHANGE -->|"Changes<br/>Detected"| TRIGGER
-    USER -.->|"Optional"| TRIGGER
-
-    TRIGGER -->|"Start<br/>Generation"| LLM
-    LLM -->|"Tool Call"| MCP
-    MCP -->|"Query"| REDIS
-    REDIS -->|"Config Data"| MCP
-    MCP -->|"Context"| LLM
-    LLM -->|"Generate"| DOC
-
+
+    USER -->|"1. Prompt"| LLM
+    LLM -->|"2. Tool Call"| MCP
+    MCP -->|"3a. Query"| KAFKA
+    MCP -.->|"3b. Query<br/>Fast"| REDIS
+    KAFKA -->|"4a. Data"| MCP
+    REDIS -.->|"4b. Data"| MCP
+    MCP -->|"5. Context"| LLM
+    LLM -->|"6. Generate"| DOC
 
     %% ========================================
     %% FLOW 3: VALIDATION AND PUBLICATION
     %% ========================================
 
     GIT["📦 GITLAB<br/>Repository"]:::git
 
     PR["🔀 PULL REQUEST<br/>Automatic Review"]:::git
 
     TECH["👨‍💼 TECHNICAL TEAM<br/>Human Validation"]:::human
 
     PIPELINE["⚡ CI/CD PIPELINE<br/>GitLab Runner"]:::git
 
     MKDOCS["📚 MKDOCS<br/>Static Site Generator"]:::git
 
     WEB["🌐 DOCUMENTATION<br/>GitLab Pages<br/>(Published)"]:::git
 
     DOC -->|"Push +<br/>Branch"| GIT
     GIT -->|"Create"| PR
     PR -->|"Notify"| TECH
@@ -133,18 +121,18 @@ graph TB
     GIT -->|"Trigger"| PIPELINE
     PIPELINE -->|"Build"| MKDOCS
     MKDOCS -->|"Deploy"| WEB
 
     %% ========================================
-    %% ANNOTATIONS
+    %% SECURITY ANNOTATIONS
     %% ========================================
 
     SECURITY["🔒 SECURITY<br/>LLM isolated from live systems"]:::human
-    EFFICIENCY["⚡ EFFICIENCY<br/>Docs generated only<br/>on changes"]:::change
-
+    PERF["⚡ PERFORMANCE<br/>Optional Redis cache"]:::cache
+
     LLM -.->|"NO<br/>ACCESS"| INFRA
 
     SECURITY -.-> LLM
-    EFFICIENCY -.-> CHANGE
+    PERF -.-> REDIS
 ```
 
 ---
@@ -155,229 +143,580 @@ graph TB
 Detailed diagram for the technical team, with implementation specifics.
 
 ```mermaid
 graph TB
     %% Technical styling
     classDef infra fill:#e1f5ff,stroke:#01579b,stroke-width:2px,color:#333,font-size:11px
     classDef connector fill:#e3f2fd,stroke:#1565c0,stroke-width:2px,color:#333,font-size:11px
+    classDef kafka fill:#fff3e0,stroke:#e65100,stroke-width:2px,color:#333,font-size:11px
     classDef cache fill:#f3e5f5,stroke:#4a148c,stroke-width:2px,color:#333,font-size:11px
-    classDef change fill:#fff3e0,stroke:#e65100,stroke-width:2px,color:#333,font-size:11px
     classDef llm fill:#e8f5e9,stroke:#1b5e20,stroke-width:2px,color:#333,font-size:11px
     classDef git fill:#fce4ec,stroke:#880e4f,stroke-width:2px,color:#333,font-size:11px
     classDef monitor fill:#fff8e1,stroke:#f57f17,stroke-width:2px,color:#333,font-size:11px
 
     %% =====================================
     %% LAYER 1: SOURCE SYSTEMS
     %% =====================================
 
     subgraph SOURCES["🏢 INFRASTRUCTURE SOURCES"]
         VCENTER["VMware vCenter<br/>API: vSphere REST 7.0+<br/>Port: 443/HTTPS<br/>Auth: API Token"]:::infra
 
         K8S_API["Kubernetes API<br/>API: v1.28+<br/>Port: 6443/HTTPS<br/>Auth: ServiceAccount + RBAC"]:::infra
 
         LINUX["Linux Servers<br/>Protocol: SSH/Ansible<br/>Port: 22<br/>Auth: SSH Keys"]:::infra
 
         CISCO["Cisco Devices<br/>Protocol: NETCONF/RESTCONF<br/>Port: 830/443<br/>Auth: AAA"]:::infra
     end
 
     %% =====================================
     %% LAYER 2: CONNECTORS
     %% =====================================
 
     subgraph CONNECTORS["🔌 DATA COLLECTORS (Python/Go)"]
-        CONN_VM["VMware Collector<br/>Lang: Python 3.11<br/>Lib: pyvmomi<br/>Schedule: */15 * * * *<br/>Output: JSON → Redis"]:::connector
-
+        CONN_VM["VMware Collector<br/>Lang: Python 3.11<br/>Lib: pyvmomi<br/>Schedule: */15 * * * *<br/>Output: JSON"]:::connector
+
         CONN_K8S["K8s Collector<br/>Lang: Python 3.11<br/>Lib: kubernetes-client<br/>Schedule: */5 * * * *<br/>Resources: pods,svc,ing,deploy"]:::connector
 
         CONN_LNX["Linux Collector<br/>Lang: Python 3.11<br/>Lib: paramiko/ansible<br/>Schedule: */30 * * * *<br/>Data: sysinfo,packages,services"]:::connector
 
         CONN_CSC["Cisco Collector<br/>Lang: Python 3.11<br/>Lib: ncclient<br/>Schedule: */30 * * * *<br/>Data: interfaces,routing,vlans"]:::connector
     end
 
     VCENTER -->|"GET /api/vcenter/vm"| CONN_VM
     K8S_API -->|"kubectl proxy<br/>API calls"| CONN_K8S
     LINUX -->|"SSH batch<br/>commands"| CONN_LNX
     CISCO -->|"NETCONF<br/>get-config"| CONN_CSC
 
     %% =====================================
-    %% LAYER 3: REDIS STORAGE
+    %% LAYER 3: MESSAGE BROKER
     %% =====================================
 
-    subgraph STORAGE["💾 REDIS CLUSTER"]
+    subgraph MESSAGING["📨 KAFKA CLUSTER (3 brokers)"]
+        KAFKA_TOPICS["Kafka Topics:<br/>• vmware.inventory (P:6, R:3)<br/>• k8s.resources (P:12, R:3)<br/>• linux.systems (P:3, R:3)<br/>• cisco.network (P:3, R:3)<br/>Retention: 7 days<br/>Format: JSON + Schema Registry"]:::kafka
+
+        SCHEMA["Schema Registry<br/>Avro Schemas<br/>Versioning enabled<br/>Port: 8081"]:::kafka
+    end
+
+    CONN_VM -->|"Producer<br/>Batch 100 msg"| KAFKA_TOPICS
+    CONN_K8S -->|"Producer<br/>Batch 100 msg"| KAFKA_TOPICS
+    CONN_LNX -->|"Producer<br/>Batch 50 msg"| KAFKA_TOPICS
+    CONN_CSC -->|"Producer<br/>Batch 50 msg"| KAFKA_TOPICS
+
+    KAFKA_TOPICS <--> SCHEMA
+
+    %% =====================================
+    %% LAYER 4: PROCESSING & CACHE
+    %% =====================================
+
+    subgraph PROCESSING["⚙️ STREAM PROCESSING"]
+        CONSUMER_GRP["Kafka Consumer Group<br/>Group ID: doc-consumers<br/>Lang: Python 3.11<br/>Lib: kafka-python<br/>Workers: 6<br/>Commit: auto (5s)"]:::kafka
+
+        PROCESSOR["Data Processor<br/>• Validation<br/>• Transformation<br/>• Enrichment<br/>• Deduplication"]:::kafka
+    end
+
+    KAFKA_TOPICS -->|"Subscribe<br/>offset management"| CONSUMER_GRP
+    CONSUMER_GRP --> PROCESSOR
+
+    subgraph STORAGE["💾 CACHE LAYER (Optional)"]
         REDIS_CLUSTER["Redis Cluster<br/>Mode: Cluster (6 nodes)<br/>Port: 6379<br/>Persistence: RDB + AOF<br/>Memory: 64GB<br/>Eviction: allkeys-lru"]:::cache
-
-        REDIS_KEYS["Key Structure:<br/>• vmware:vcenter-id:vms:hash<br/>• k8s:cluster:namespace:resource:hash<br/>• linux:hostname:info:hash<br/>• cisco:device-id:config:hash<br/>• changelog:timestamp:diff<br/>TTL: 30d for data, 90d for changelog"]:::cache
+
+        REDIS_KEYS["Key Structure:<br/>• vmware:vcenter-id:vms<br/>• k8s:cluster:namespace:resource<br/>• linux:hostname:info<br/>• cisco:device-id:config<br/>TTL: 1-24h based on type"]:::cache
     end
 
-    CONN_VM -->|"HSET/HMSET<br/>+ Hash Storage"| REDIS_CLUSTER
-    CONN_K8S -->|"HSET/HMSET<br/>+ Hash Storage"| REDIS_CLUSTER
-    CONN_LNX -->|"HSET/HMSET<br/>+ Hash Storage"| REDIS_CLUSTER
-    CONN_CSC -->|"HSET/HMSET<br/>+ Hash Storage"| REDIS_CLUSTER
-
+
+    PROCESSOR -.->|"SET/HSET<br/>Pipeline batch"| REDIS_CLUSTER
     REDIS_CLUSTER --> REDIS_KEYS
 
     %% =====================================
-    %% LAYER 4: CHANGE DETECTION
+    %% LAYER 5: LLM & MCP
     %% =====================================
-
-    subgraph CHANGE_DETECTION["🔍 CHANGE DETECTION SYSTEM"]
-        DETECTOR["Change Detector Service<br/>Lang: Python 3.11<br/>Lib: redis-py<br/>Algorithm: Hash comparison<br/>Check interval: */5 * * * *"]:::change
-
-        DIFF_ENGINE["Diff Engine<br/>• Deep object comparison<br/>• JSON diff generation<br/>• Change classification<br/>• Severity assessment"]:::change
-
-        CHANGE_LOG["Change Log Store<br/>Key: changelog:*<br/>Data: diff JSON + metadata<br/>Indexed by: timestamp, resource"]:::change
-
-        NOTIFIER["Change Notifier<br/>• Webhook triggers<br/>• Slack notifications<br/>• Event emission<br/>Target: LLM trigger"]:::change
-    end
-
-    REDIS_CLUSTER -->|"Monitor<br/>key changes"| DETECTOR
-    DETECTOR --> DIFF_ENGINE
-    DIFF_ENGINE -->|"Store diff"| CHANGE_LOG
-    CHANGE_LOG --> REDIS_CLUSTER
-    DIFF_ENGINE -->|"Notify if<br/>significant"| NOTIFIER
-
-    %% =====================================
-    %% LAYER 5: LLM TRIGGER & GENERATION
-    %% =====================================
-
-    subgraph TRIGGER_SYSTEM["⚡ TRIGGER SYSTEM"]
-        TRIGGER_SVC["Trigger Service<br/>Lang: Python 3.11<br/>Listen: Webhook + Redis Pub/Sub<br/>Debounce: 5 min<br/>Batch: multiple changes"]:::change
-
-        QUEUE["Generation Queue<br/>Type: Redis List<br/>Priority: High/Medium/Low<br/>Processing: FIFO"]:::change
-    end
-
-    NOTIFIER -->|"Trigger event"| TRIGGER_SVC
-    TRIGGER_SVC -->|"Enqueue<br/>generation task"| QUEUE
-
+
     subgraph LLM_LAYER["🤖 AI GENERATION LAYER"]
-        LLM_ENGINE["LLM Engine<br/>Model: Qwen (Local)<br/>API: Ollama/vLLM/LM Studio<br/>Port: 11434<br/>Temp: 0.3<br/>Max Tokens: 4096<br/>Timeout: 120s"]:::llm
-
+        LLM_ENGINE["LLM Engine<br/>Model: Claude Sonnet 4 / GPT-4<br/>API: Anthropic/OpenAI<br/>Temp: 0.3<br/>Max Tokens: 4096<br/>Timeout: 120s"]:::llm
+
         MCP_SERVER["MCP Server<br/>Lang: TypeScript/Node.js<br/>Port: 3000<br/>Protocol: JSON-RPC 2.0<br/>Auth: JWT tokens"]:::llm
-
-        MCP_TOOLS["MCP Tools:<br/>• getVMwareInventory(vcenter)<br/>• getK8sResources(cluster,ns,type)<br/>• getLinuxSystemInfo(hostname)<br/>• getCiscoConfig(device,section)<br/>• getChangelog(start,end,resource)<br/>Return: JSON + Metadata"]:::llm
+
+        MCP_TOOLS["MCP Tools:<br/>• getVMwareInventory(vcenter)<br/>• getK8sResources(cluster,ns,type)<br/>• getLinuxSystemInfo(hostname)<br/>• getCiscoConfig(device,section)<br/>• queryTimeRange(start,end)<br/>Return: JSON + Metadata"]:::llm
     end
 
-    QUEUE -->|"Dequeue<br/>task"| LLM_ENGINE
-
+
     LLM_ENGINE <-->|"Tool calls<br/>JSON-RPC"| MCP_SERVER
     MCP_SERVER --> MCP_TOOLS
 
-    MCP_TOOLS -->|"HGETALL/MGET<br/>Read data"| REDIS_CLUSTER
-    REDIS_CLUSTER -->|"Config data<br/>+ Changelog"| MCP_TOOLS
+    MCP_TOOLS -->|"1. Query Kafka Consumer API<br/>GET /api/v1/data"| CONSUMER_GRP
+    MCP_TOOLS -.->|"2. Fallback Redis<br/>MGET/HGETALL"| REDIS_CLUSTER
+
+    CONSUMER_GRP -->|"JSON Response<br/>+ Timestamps"| MCP_TOOLS
+    REDIS_CLUSTER -.->|"Cached JSON<br/>Fast response"| MCP_TOOLS
+
     MCP_TOOLS -->|"Structured Data<br/>+ Context"| LLM_ENGINE
 
     subgraph OUTPUT["📝 DOCUMENT GENERATION"]
         TEMPLATE["Template Engine<br/>Format: Jinja2<br/>Templates: markdown/*.j2<br/>Variables: from LLM"]:::llm
-
-        MARKDOWN["Markdown Output<br/>Format: CommonMark<br/>Metadata: YAML frontmatter<br/>Change summary included<br/>Assets: diagrams in mermaid"]:::llm
-
-        VALIDATOR["Doc Validator<br/>• Markdown linting<br/>• Link checking<br/>• Schema validation<br/>• Change verification"]:::llm
+
+        MARKDOWN["Markdown Output<br/>Format: CommonMark<br/>Metadata: YAML frontmatter<br/>Assets: diagrams in mermaid"]:::llm
+
+        VALIDATOR["Doc Validator<br/>• Markdown linting<br/>• Link checking<br/>• Schema validation"]:::llm
     end
 
     LLM_ENGINE --> TEMPLATE
     TEMPLATE --> MARKDOWN
     MARKDOWN --> VALIDATOR
 
     %% =====================================
     %% LAYER 6: GITOPS
     %% =====================================
 
     subgraph GITOPS["🔄 GITOPS WORKFLOW"]
         GIT_REPO["GitLab Repository<br/>URL: gitlab.com/docs/infra<br/>Branch strategy: main + feature/*<br/>Protected: main (require approval)"]:::git
 
         GIT_API["GitLab API<br/>API: v4<br/>Auth: Project Access Token<br/>Permissions: api, write_repo"]:::git
-
-        PR_AUTO["Automated PR Creator<br/>Lang: Python 3.11<br/>Lib: python-gitlab<br/>Template: .gitlab/merge_request.md<br/>Include: change summary"]:::git
+
+        PR_AUTO["Automated PR Creator<br/>Lang: Python 3.11<br/>Lib: python-gitlab<br/>Template: .gitlab/merge_request.md"]:::git
     end
 
     VALIDATOR -->|"git add/commit/push"| GIT_REPO
     GIT_REPO <--> GIT_API
     GIT_API --> PR_AUTO
 
-    REVIEWER["👨‍💼 Technical Reviewer<br/>Role: Maintainer/Owner<br/>Review: diff + validation<br/>Check: change correlation<br/>Approve: required (min 1)"]:::monitor
+    REVIEWER["👨‍💼 Technical Reviewer<br/>Role: Maintainer/Owner<br/>Review: diff + validation<br/>Approve: required (min 1)"]:::monitor
 
     PR_AUTO -->|"Notification<br/>Email + Slack"| REVIEWER
     REVIEWER -->|"Merge to main"| GIT_REPO
 
     %% =====================================
     %% LAYER 7: CI/CD & PUBLISH
     %% =====================================
 
     subgraph CICD["⚡ CI/CD PIPELINE"]
         GITLAB_CI["GitLab CI/CD<br/>Runner: docker<br/>Image: python:3.11-alpine<br/>Stages: build, test, deploy"]:::git
 
         PIPELINE_JOBS["Pipeline Jobs:<br/>1. lint (markdownlint-cli)<br/>2. build (mkdocs build)<br/>3. test (link-checker)<br/>4. deploy (rsync/s3)"]:::git
 
         MKDOCS_CFG["MkDocs Config<br/>Theme: material<br/>Plugins: search, tags, mermaid<br/>Extensions: admonition, codehilite"]:::git
     end
 
     GIT_REPO -->|"on: push to main<br/>Webhook trigger"| GITLAB_CI
     GITLAB_CI --> PIPELINE_JOBS
     PIPELINE_JOBS --> MKDOCS_CFG
 
     subgraph PUBLISH["🌐 PUBLICATION"]
         STATIC_SITE["Static Site<br/>Generator: MkDocs<br/>Output: HTML/CSS/JS<br/>Assets: optimized images"]:::git
 
         CDN["GitLab Pages / S3 + CloudFront<br/>URL: docs.company.com<br/>SSL: Let's Encrypt<br/>Cache: 1h"]:::git
 
         SEARCH["Search Index<br/>Engine: Algolia/Meilisearch<br/>Update: on publish<br/>API: REST"]:::git
     end
 
     MKDOCS_CFG -->|"mkdocs build<br/>--strict"| STATIC_SITE
     STATIC_SITE --> CDN
     STATIC_SITE --> SEARCH
 
     %% =====================================
     %% LAYER 8: MONITORING & OBSERVABILITY
     %% =====================================
 
     subgraph OBSERVABILITY["📊 MONITORING & LOGGING"]
-        PROMETHEUS["Prometheus<br/>Metrics: collector updates, changes detected<br/>Scrape: 30s<br/>Retention: 15d"]:::monitor
-
-        GRAFANA["Grafana Dashboards<br/>• Collector status<br/>• Redis performance<br/>• Change detection rate<br/>• LLM response times<br/>• Pipeline success rate"]:::monitor
-
+        PROMETHEUS["Prometheus<br/>Metrics: collector lag, cache hit/miss<br/>Scrape: 30s<br/>Retention: 15d"]:::monitor
+
+        GRAFANA["Grafana Dashboards<br/>• Kafka metrics<br/>• Redis performance<br/>• LLM response times<br/>• Pipeline success rate"]:::monitor
+
         ELK["ELK Stack<br/>Logs: all components<br/>Index: daily rotation<br/>Retention: 30d"]:::monitor
-
-        ALERTS["Alerting<br/>• Collector failures<br/>• Redis issues<br/>• Change detection errors<br/>• Pipeline failures<br/>Channel: Slack + PagerDuty"]:::monitor
+
+        ALERTS["Alerting<br/>• Connector failures<br/>• Kafka lag > 10k<br/>• Redis OOM<br/>• Pipeline failures<br/>Channel: Slack + PagerDuty"]:::monitor
     end
 
     CONN_VM -.->|"metrics"| PROMETHEUS
     CONN_K8S -.->|"metrics"| PROMETHEUS
+    KAFKA_TOPICS -.->|"metrics"| PROMETHEUS
     REDIS_CLUSTER -.->|"metrics"| PROMETHEUS
-    DETECTOR -.->|"metrics"| PROMETHEUS
     MCP_SERVER -.->|"metrics"| PROMETHEUS
     GITLAB_CI -.->|"metrics"| PROMETHEUS
 
     PROMETHEUS --> GRAFANA
 
     CONN_VM -.->|"logs"| ELK
-    DETECTOR -.->|"logs"| ELK
+    CONSUMER_GRP -.->|"logs"| ELK
     MCP_SERVER -.->|"logs"| ELK
     GITLAB_CI -.->|"logs"| ELK
 
     GRAFANA --> ALERTS
 
     %% =====================================
-    %% SECURITY & EFFICIENCY ANNOTATIONS
+    %% SECURITY ANNOTATIONS
     %% =====================================
 
     SEC1["🔒 SECURITY:<br/>• All APIs use TLS 1.3<br/>• Secrets in Vault/K8s Secrets<br/>• Network: private VPC<br/>• LLM has NO direct access"]:::monitor
 
     SEC2["🔐 AUTHENTICATION:<br/>• API Tokens rotated 90d<br/>• RBAC enforced<br/>• Audit logs enabled<br/>• MFA required for Git"]:::monitor
-
-    EFF1["⚡ EFFICIENCY:<br/>• Doc generation only on changes<br/>• Debounce prevents spam<br/>• Hash-based change detection<br/>• Batch processing"]:::change
-
+
     SEC1 -.-> MCP_SERVER
     SEC2 -.-> GIT_REPO
-    EFF1 -.-> DETECTOR
 ```
 
 ---
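To ground the technical view, a few hedged sketches follow. First, the consumption side: a minimal version of the `doc-consumers` group feeding the optional Redis cache, assuming `kafka-python` and `redis-py` as listed in the diagram. Hostnames and the exact key layout are illustrative, and a production Redis Cluster deployment would use `redis.cluster.RedisCluster` rather than a single-node client.

```python
import json

import redis
from kafka import KafkaConsumer  # kafka-python

consumer = KafkaConsumer(
    "vmware.inventory",
    "k8s.resources",
    bootstrap_servers=["kafka-1:9092"],   # assumed broker address
    group_id="doc-consumers",             # consumer group from the diagram
    enable_auto_commit=True,
    auto_commit_interval_ms=5_000,        # "Commit: auto (5s)"
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)

cache = redis.Redis(host="redis-cluster", port=6379)  # single-node client for brevity

for message in consumer:
    event = message.value
    # Hypothetical key derivation following the documented structure
    # (e.g. vmware:<vcenter-id>:vms); TTL is 1-24h depending on data type.
    key = f"{message.topic.split('.')[0]}:{event.get('source', 'default')}"
    pipe = cache.pipeline()               # "SET/HSET Pipeline batch"
    pipe.set(key, json.dumps(event))
    pipe.expire(key, 3600)
    pipe.execute()
```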
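Next, the MCP read path. The production server is TypeScript/Node.js per the diagram; this Python sketch only illustrates the primary-then-fallback pattern behind a tool such as `getVMwareInventory`, with hypothetical endpoints and key names.

```python
import json

import redis
import requests

cache = redis.Redis(host="redis-cluster", port=6379, decode_responses=True)


def get_vmware_inventory(vcenter: str) -> dict:
    """Sketch of the getVMwareInventory tool: consumer API first, Redis fallback."""
    try:
        # 1. Primary path: the Kafka consumer's query API ("GET /api/v1/data").
        resp = requests.get(
            "http://consumer-api:8000/api/v1/data",  # assumed host and port
            params={"source": "vmware", "vcenter": vcenter},
            timeout=5,
        )
        resp.raise_for_status()
        return {"data": resp.json(), "source": "kafka-consumer"}
    except requests.RequestException:
        # 2. Fallback path: cached JSON in Redis (fast, possibly stale).
        raw = cache.get(f"vmware:{vcenter}:vms")
        return {"data": json.loads(raw) if raw else None, "source": "redis-cache"}
```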
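Finally, the GitOps step: a sketch of the Automated PR Creator using `python-gitlab`, as named in the diagram. The project path and file path follow the diagram; the branch name, token, and file content are illustrative.

```python
import gitlab  # python-gitlab, the library listed for the Automated PR Creator

gl = gitlab.Gitlab("https://gitlab.com", private_token="<project-access-token>")
project = gl.projects.get("docs/infra")

branch = "docs/auto-update-vmware"  # hypothetical feature branch name
project.branches.create({"branch": branch, "ref": "main"})

# Commit the generated Markdown on the feature branch.
project.commits.create({
    "branch": branch,
    "commit_message": "docs: regenerate VMware inventory page",
    "actions": [{
        "action": "update",
        "file_path": "infrastructure/vmware.md",
        "content": "# VMware Inventory\n...generated content...\n",
    }],
})

# Open the merge request; the protected main branch still requires
# at least one human approval, as in the diagram.
project.mergerequests.create({
    "source_branch": branch,
    "target_branch": "main",
    "title": "Auto-generated documentation update",
    "description": "Generated by the documentation pipeline. Please review.",
})
```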
+
+## 💬 Conversational RAG System
+
+### Querying the Documentation with AI
+
+A system for "talking" with the documentation using Retrieval Augmented Generation (RAG). It lets users ask questions in natural language and receive accurate, documentation-grounded answers with source citations.
+
+#### Key Features
+
+- ✅ **Semantic Search**: vector search that captures query intent
+- ✅ **Scalability**: handles large documentation volumes (100k+ documents)
+- ✅ **Performance**: answers in <3 seconds with intelligent caching
+- ✅ **Accuracy**: re-ranking and source attribution for precise answers
+- ✅ **Local LLM**: on-premise Qwen for privacy and control
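As a usage sketch, here is a question posed to the RAG service over HTTP. The endpoint path, payload, and response shape are assumptions for illustration; the technical view below exposes the query pipeline through FastAPI instances behind a load balancer.

```python
import requests

# Hypothetical endpoint exposed by the FastAPI query instances.
resp = requests.post(
    "https://docs.company.com/api/rag/query",
    headers={"Authorization": "Bearer <jwt-token>"},  # JWT auth per the diagram
    json={"question": "Which ports does the VMware collector use?"},
    timeout=30,  # matches the documented request timeout
)
answer = resp.json()
print(answer["answer"])
for src in answer.get("sources", []):   # source citations accompany each answer
    print(" -", src["title"], src["url"])
```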
+
+### RAG Schema - Management View
+
+```mermaid
+graph TB
+    %% Styling
+    classDef docs fill:#e3f2fd,stroke:#1565c0,stroke-width:3px,color:#333
+    classDef process fill:#f3e5f5,stroke:#4a148c,stroke-width:3px,color:#333
+    classDef vector fill:#fff3e0,stroke:#e65100,stroke-width:3px,color:#333
+    classDef llm fill:#e8f5e9,stroke:#1b5e20,stroke-width:3px,color:#333
+    classDef user fill:#fff9c4,stroke:#f57f17,stroke-width:3px,color:#333
+    classDef cache fill:#fce4ec,stroke:#880e4f,stroke-width:3px,color:#333
+
+    %% ========================================
+    %% INGESTION PIPELINE (Offline)
+    %% ========================================
+
+    subgraph INGESTION["📚 INGESTION PIPELINE (Offline Process)"]
+        DOCS["📄 DOCUMENTATION<br/>MkDocs Output<br/>Markdown Files"]:::docs
+
+        CHUNKER["✂️ DOCUMENT CHUNKER<br/>Split & Overlap<br/>Metadata Extraction"]:::process
+
+        EMBEDDER["🧠 EMBEDDING MODEL<br/>Text → Vectors<br/>Dimension: 768/1024"]:::process
+
+        VECTORDB[("🗄️ VECTOR DATABASE<br/>Qdrant/Milvus<br/>Sharded & Replicated")]:::vector
+    end
+
+    DOCS -->|"Parse<br/>Markdown"| CHUNKER
+    CHUNKER -->|"Text Chunks<br/>+ Metadata"| EMBEDDER
+    EMBEDDER -->|"Store<br/>Embeddings"| VECTORDB
+
+    %% ========================================
+    %% QUERY PIPELINE (Real-time)
+    %% ========================================
+
+    subgraph QUERY["💬 QUERY PIPELINE (Real-time)"]
+        USER["👤 USER<br/>Question/Query"]:::user
+
+        QUERY_EMBED["🧠 QUERY EMBEDDING<br/>Query → Vector"]:::process
+
+        SEARCH["🔍 SEMANTIC SEARCH<br/>Vector Similarity<br/>Top-K Results"]:::vector
+
+        RERANK["📊 RE-RANKING<br/>Context Scoring<br/>Relevance Filter"]:::process
+
+        CONTEXT["📋 CONTEXT BUILDER<br/>Assemble Chunks<br/>Add Metadata"]:::process
+    end
+
+    USER -->|"Natural Language<br/>Question"| QUERY_EMBED
+    QUERY_EMBED -->|"Query Vector"| SEARCH
+    SEARCH -->|"Search"| VECTORDB
+    VECTORDB -->|"Top-K Chunks<br/>+ Scores"| SEARCH
+    SEARCH -->|"Initial Results"| RERANK
+    RERANK -->|"Filtered<br/>Chunks"| CONTEXT
+
+    %% ========================================
+    %% GENERATION (LLM)
+    %% ========================================
+
+    subgraph GENERATION["🤖 ANSWER GENERATION"]
+        LLM_RAG["🤖 LLM ENGINE<br/>Qwen (Local)<br/>+ RAG Context"]:::llm
+
+        ANSWER["💡 ANSWER<br/>Generated Answer<br/>+ Source Citations"]:::llm
+    end
+
+    CONTEXT -->|"Context<br/>+ Sources"| LLM_RAG
+    LLM_RAG -->|"Generate"| ANSWER
+    ANSWER -->|"Display"| USER
+
+    %% ========================================
+    %% CACHING & OPTIMIZATION
+    %% ========================================
+
+    CACHE[("💾 REDIS CACHE<br/>Query Cache<br/>Embedding Cache")]:::cache
+
+    QUERY_EMBED -.->|"Check Cache"| CACHE
+    CACHE -.->|"Cached<br/>Embedding"| SEARCH
+
+    SEARCH -.->|"Cache<br/>Results"| CACHE
+
+    %% ========================================
+    %% SCALING & UPDATE
+    %% ========================================
+
+    UPDATE["🔄 INCREMENTAL UPDATE<br/>On Doc Changes<br/>Auto Re-index"]:::docs
+
+    DOCS -.->|"Doc Updated"| UPDATE
+    UPDATE -.->|"Re-process<br/>Changed Docs"| CHUNKER
+
+    %% ========================================
+    %% ANNOTATIONS
+    %% ========================================
+
+    SCALE["📈 SCALABILITY<br/>• Vector DB sharding<br/>• Horizontal scaling<br/>• Load balancing"]:::vector
+
+    PERF["⚡ PERFORMANCE<br/>• Query cache<br/>• Embedding cache<br/>• Async processing"]:::cache
+
+    QUALITY["✅ QUALITY<br/>• Re-ranking<br/>• Relevance scoring<br/>• Source citations"]:::process
+
+    SCALE -.-> VECTORDB
+    PERF -.-> CACHE
+    QUALITY -.-> RERANK
+```
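A condensed sketch of the ingestion pipeline (parse, chunk, embed, store), assuming LangChain's recursive splitter, `sentence-transformers`, and `qdrant-client`. Two caveats: this splitter counts characters rather than the tokens quoted in the diagrams, and `all-MiniLM-L6-v2` yields 384-dimension vectors while the documented collections use 768 (e.g., with BGE-M3). The file path is illustrative.

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter  # newer releases: langchain_text_splitters
from qdrant_client import QdrantClient, models
from sentence_transformers import SentenceTransformer

splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,     # diagrams specify 512 tokens; this counts characters
    chunk_overlap=128,  # overlap from the chunking config
    separators=["\n\n", "\n", ". ", " "],
)
model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim; BGE-M3 gives 768/1024
client = QdrantClient(host="qdrant", port=6333)

client.recreate_collection(
    collection_name="docs_main",
    vectors_config=models.VectorParams(size=384, distance=models.Distance.COSINE),
)

text = open("site/infrastructure/vmware.md").read()  # hypothetical parsed page
chunks = splitter.split_text(text)
vectors = model.encode(chunks, batch_size=32)  # batch size from the embedding layer

client.upsert(
    collection_name="docs_main",
    points=[
        models.PointStruct(
            id=i,
            vector=v.tolist(),
            payload={"text": c, "path": "infrastructure/vmware.md"},
        )
        for i, (c, v) in enumerate(zip(chunks, vectors))
    ],
)
```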
+### RAG Schema - Technical View
+
+```mermaid
+graph TB
+    %% Styling
+    classDef docs fill:#e3f2fd,stroke:#1565c0,stroke-width:2px,color:#333,font-size:11px
+    classDef process fill:#f3e5f5,stroke:#4a148c,stroke-width:2px,color:#333,font-size:11px
+    classDef vector fill:#fff3e0,stroke:#e65100,stroke-width:2px,color:#333,font-size:11px
+    classDef llm fill:#e8f5e9,stroke:#1b5e20,stroke-width:2px,color:#333,font-size:11px
+    classDef user fill:#fff9c4,stroke:#f57f17,stroke-width:2px,color:#333,font-size:11px
+    classDef cache fill:#fce4ec,stroke:#880e4f,stroke-width:2px,color:#333,font-size:11px
+    classDef monitor fill:#fff8e1,stroke:#f57f17,stroke-width:2px,color:#333,font-size:11px
+
+    %% =====================================
+    %% LAYER 1: DOCUMENTATION SOURCE
+    %% =====================================
+
+    subgraph DOCSOURCE["📚 DOCUMENTATION SOURCE"]
+        MKDOCS_OUT["MkDocs Static Site<br/>Path: /site/<br/>Format: HTML + Markdown<br/>Assets: images, diagrams<br/>Update: on Git merge"]:::docs
+
+        DOC_WATCHER["Document Watcher<br/>Lang: Python 3.11<br/>Lib: watchdog<br/>Trigger: file system events<br/>Debounce: 30s"]:::docs
+
+        DOC_PARSER["Document Parser<br/>HTML → Plain Text<br/>Preserve structure<br/>Extract metadata<br/>Clean formatting"]:::docs
+    end
+
+    MKDOCS_OUT --> DOC_WATCHER
+    DOC_WATCHER -->|"New/Modified<br/>Docs"| DOC_PARSER
+
+    %% =====================================
+    %% LAYER 2: CHUNKING STRATEGY
+    %% =====================================
+
+    subgraph CHUNKING["✂️ INTELLIGENT CHUNKING"]
+        CHUNK_ENGINE["Chunking Engine<br/>Lang: Python 3.11<br/>Lib: langchain/llama-index<br/>Strategy: Recursive Character"]:::process
+
+        CHUNK_CONFIG["Chunking Config:<br/>• Chunk Size: 512 tokens<br/>• Overlap: 128 tokens<br/>• Separators: \\n\\n, \\n, . , ' '<br/>• Min chunk: 100 tokens<br/>• Max chunk: 1024 tokens"]:::process
+
+        METADATA_EXTRACTOR["Metadata Extractor<br/>Extract:<br/>• Document title<br/>• Section headers<br/>• Tags/keywords<br/>• Creation date<br/>• File path<br/>• Doc type"]:::process
+    end
+
+    DOC_PARSER -->|"Parsed Text"| CHUNK_ENGINE
+    CHUNK_ENGINE --> CHUNK_CONFIG
+    CHUNK_ENGINE --> METADATA_EXTRACTOR
+
+    %% =====================================
+    %% LAYER 3: EMBEDDING GENERATION
+    %% =====================================
+
+    subgraph EMBEDDING["🧠 EMBEDDING GENERATION"]
+        EMBED_MODEL["Embedding Model<br/>Model: all-MiniLM-L6-v2 / BGE-M3<br/>Dim: 384/768/1024<br/>API: sentence-transformers<br/>Batch size: 32<br/>GPU: CUDA acceleration"]:::process
+
+        EMBED_CACHE["Embedding Cache<br/>Type: Redis Hash<br/>Key: hash(text)<br/>TTL: 30d<br/>Hit rate target: >80%"]:::cache
+
+        EMBED_QUEUE["Processing Queue<br/>Type: Redis List<br/>Workers: 4-8<br/>Rate: 100 chunks/s<br/>Retry: 3 attempts"]:::process
+    end
+
+    METADATA_EXTRACTOR -->|"Chunks<br/>+ Metadata"| EMBED_QUEUE
+    EMBED_QUEUE --> EMBED_MODEL
+    EMBED_MODEL <-.->|"Cache<br/>Check/Store"| EMBED_CACHE
+
+    %% =====================================
+    %% LAYER 4: VECTOR DATABASE
+    %% =====================================
+
+    subgraph VECTORDB["🗄️ VECTOR DATABASE CLUSTER"]
+        QDRANT["Qdrant Cluster<br/>Version: 1.7+<br/>Nodes: 3-6 (replicated)<br/>Shards: auto per collection<br/>Port: 6333/6334"]:::vector
+
+        COLLECTIONS["Collections:<br/>• docs_main (dim: 768)<br/>• docs_code (dim: 768)<br/>• docs_api (dim: 768)<br/>Distance: Cosine<br/>Index: HNSW (M=16, ef=100)"]:::vector
+
+        SHARD_STRATEGY["Sharding Strategy:<br/>• Auto-sharding enabled<br/>• Shard size: 100k vectors<br/>• Replication factor: 2<br/>• Load balancing: Round-robin"]:::vector
+    end
+
+    EMBED_MODEL -->|"Store<br/>Vectors"| QDRANT
+    QDRANT --> COLLECTIONS
+    QDRANT --> SHARD_STRATEGY
+
+    %% =====================================
+    %% LAYER 5: QUERY PROCESSING
+    %% =====================================
+
+    subgraph QUERYPROC["💬 QUERY PROCESSING PIPELINE"]
+        USER_INPUT["User Input<br/>Interface: Web UI / API<br/>Auth: JWT tokens<br/>Rate limit: 20 req/min<br/>Timeout: 30s"]:::user
+
+        QUERY_PREPROCESS["Query Preprocessor<br/>• Spelling correction<br/>• Intent detection<br/>• Query expansion<br/>• Language detection"]:::process
+
+        QUERY_EMBEDDER["Query Embedder<br/>Same model as docs<br/>Cache: Redis<br/>Latency: <50ms"]:::process
+
+        HYBRID_SEARCH["Hybrid Search<br/>1. Vector search (semantic)<br/>2. Keyword search (BM25)<br/>3. Fusion: RRF algorithm<br/>Top-K: 20 initial results"]:::vector
+    end
+
+    USER_INPUT -->|"Natural<br/>Language"| QUERY_PREPROCESS
+    QUERY_PREPROCESS --> QUERY_EMBEDDER
+    QUERY_EMBEDDER <-.->|"Cache"| EMBED_CACHE
+    QUERY_EMBEDDER -->|"Query<br/>Vector"| HYBRID_SEARCH
+    HYBRID_SEARCH -->|"Search"| QDRANT
+
+    %% =====================================
+    %% LAYER 6: RE-RANKING & FILTERING
+    %% =====================================
+
+    subgraph RERANK["📊 RE-RANKING & FILTERING"]
+        RERANKER["Cross-Encoder Re-ranker<br/>Model: ms-marco-MiniLM<br/>Purpose: Fine-grained relevance<br/>Process: Top-20 → Top-5<br/>Latency: 100-200ms"]:::process
+
+        FILTER_ENGINE["Filter Engine<br/>• Relevance threshold: >0.7<br/>• Deduplication<br/>• Diversity scoring<br/>• Metadata filtering"]:::process
+
+        CONTEXT_BUILDER["Context Builder<br/>• Assemble top chunks<br/>• Add source citations<br/>• Format for LLM<br/>• Max context: 4k tokens"]:::process
+    end
+
+    QDRANT -->|"Top-K<br/>Results"| RERANKER
+    RERANKER --> FILTER_ENGINE
+    FILTER_ENGINE --> CONTEXT_BUILDER
+
+    %% =====================================
+    %% LAYER 7: LLM GENERATION
+    %% =====================================
+
+    subgraph LLMGEN["🤖 LLM ANSWER GENERATION"]
+        RAG_PROMPT["RAG Prompt Template<br/>Structure:<br/>• System: You are a helpful assistant<br/>• Context: Retrieved chunks<br/>• Question: User query<br/>• Instruction: Answer using context"]:::llm
+
+        LLM_ENGINE["LLM Engine<br/>Model: Qwen 2.5 (14B/32B)<br/>API: Ollama/vLLM<br/>Port: 11434<br/>Temp: 0.2 (factual)<br/>Max tokens: 2048<br/>Stream: enabled"]:::llm
+
+        ANSWER_POST["Answer Post-processor<br/>• Citation formatting<br/>• Source links<br/>• Confidence scoring<br/>• Fallback handling"]:::llm
+    end
+
+    CONTEXT_BUILDER -->|"Context<br/>+ Sources"| RAG_PROMPT
+    QUERY_PREPROCESS -->|"Original<br/>Question"| RAG_PROMPT
+    RAG_PROMPT --> LLM_ENGINE
+    LLM_ENGINE --> ANSWER_POST
+    ANSWER_POST -->|"Final<br/>Answer"| USER_INPUT
+
+    %% =====================================
+    %% LAYER 8: CACHING LAYER
+    %% =====================================
+
+    subgraph CACHING["💾 MULTI-LEVEL CACHE"]
+        REDIS_CACHE["Redis Cluster<br/>Mode: Cluster<br/>Nodes: 3<br/>Memory: 16GB<br/>Persistence: AOF"]:::cache
+
+        CACHE_TYPES["Cache Types:<br/>• Query embeddings (TTL: 7d)<br/>• Search results (TTL: 1h)<br/>• LLM responses (TTL: 24h)<br/>• Popular queries (no TTL)<br/>Eviction: LRU"]:::cache
+
+        CACHE_WARMING["Cache Warming<br/>Pre-compute:<br/>• Top 100 queries<br/>• Common patterns<br/>Schedule: daily<br/>Update: on doc changes"]:::cache
+    end
+
+    REDIS_CACHE --> CACHE_TYPES
+    CACHE_TYPES --> CACHE_WARMING
+
+    QUERY_EMBEDDER <-.-> REDIS_CACHE
+    HYBRID_SEARCH <-.-> REDIS_CACHE
+    LLM_ENGINE <-.-> REDIS_CACHE
+
+    %% =====================================
+    %% LAYER 9: SCALING & LOAD BALANCING
+    %% =====================================
+
+    subgraph SCALING["📈 SCALING INFRASTRUCTURE"]
+        LOAD_BALANCER["Load Balancer<br/>Type: Nginx / HAProxy<br/>Algorithm: Least connections<br/>Health checks: /health<br/>Timeout: 30s"]:::monitor
+
+        QUERY_API["Query API Instances<br/>Replicas: 3-10 (auto-scale)<br/>Lang: FastAPI<br/>Container: Docker<br/>Orchestration: K8s"]:::user
+
+        EMBED_WORKERS["Embedding Workers<br/>Replicas: 4-8<br/>GPU: Optional<br/>Queue: Redis<br/>Auto-scale: based on queue depth"]:::process
+    end
+
+    LOAD_BALANCER --> QUERY_API
+    QUERY_API --> USER_INPUT
+
+    %% =====================================
+    %% LAYER 10: MONITORING & OBSERVABILITY
+    %% =====================================
+
+    subgraph MONITORING["📊 MONITORING & ANALYTICS"]
+        METRICS["Prometheus Metrics<br/>• Query latency (p50, p95, p99)<br/>• Vector search time<br/>• LLM response time<br/>• Cache hit rate<br/>• Embedding generation rate<br/>Scrape: 15s"]:::monitor
+
+        DASHBOARDS["Grafana Dashboards<br/>• RAG Performance<br/>• Query analytics<br/>• Resource utilization<br/>• Error tracking<br/>Refresh: real-time"]:::monitor
+
+        ANALYTICS["Query Analytics<br/>Track:<br/>• Popular queries<br/>• Failed queries<br/>• Avg relevance scores<br/>• User satisfaction<br/>Storage: TimescaleDB"]:::monitor
+
+        ALERTS["Alerting Rules<br/>• Latency > 5s<br/>• Error rate > 5%<br/>• Cache hit < 70%<br/>• Vector DB down<br/>Channel: Slack + Email"]:::monitor
+    end
+
+    METRICS --> DASHBOARDS
+    DASHBOARDS --> ANALYTICS
+    ANALYTICS --> ALERTS
+
+    QUERY_API -.->|"metrics"| METRICS
+    HYBRID_SEARCH -.->|"metrics"| METRICS
+    LLM_ENGINE -.->|"metrics"| METRICS
+    QDRANT -.->|"metrics"| METRICS
+
+    %% =====================================
+    %% LAYER 11: FEEDBACK LOOP
+    %% =====================================
+
+    subgraph FEEDBACK["🔄 FEEDBACK & IMPROVEMENT"]
+        USER_FEEDBACK["User Feedback<br/>• Thumbs up/down<br/>• Relevance rating<br/>• Comments<br/>Storage: PostgreSQL"]:::user
+
+        FEEDBACK_ANALYSIS["Feedback Analysis<br/>• Identify bad answers<br/>• Track improvement areas<br/>• A/B testing results<br/>Schedule: weekly"]:::monitor
+
+        MODEL_TUNING["Model Fine-tuning<br/>• Re-rank model updates<br/>• Prompt optimization<br/>• Chunk size tuning<br/>Cycle: monthly"]:::process
+    end
+
+    USER_INPUT -->|"Rate<br/>Answer"| USER_FEEDBACK
+    USER_FEEDBACK --> FEEDBACK_ANALYSIS
+    FEEDBACK_ANALYSIS --> MODEL_TUNING
+    MODEL_TUNING -.->|"Improve"| RERANKER
+
+    %% =====================================
+    %% ANNOTATIONS
+    %% =====================================
+
+    SCALE_NOTE["📈 SCALABILITY:<br/>• Vector DB: Horizontal sharding<br/>• API: K8s auto-scaling (HPA)<br/>• Workers: Queue-based scaling<br/>• Cache: Redis cluster<br/>Target: 100k+ docs, 1k+ QPS"]:::monitor
+
+    PERF_NOTE["⚡ PERFORMANCE TARGETS:<br/>• Query latency: <3s (p95)<br/>• Vector search: <100ms<br/>• LLM generation: <2s<br/>• Cache hit rate: >80%<br/>• Throughput: 1000 QPS"]:::cache
+
+    QUALITY_NOTE["✅ QUALITY ASSURANCE:<br/>• Re-ranking for precision<br/>• Source attribution<br/>• Confidence scoring<br/>• Fallback responses<br/>• Human feedback loop"]:::process
+
+    SCALE_NOTE -.-> QDRANT
+    PERF_NOTE -.-> REDIS_CACHE
+    QUALITY_NOTE -.-> RERANKER
+```
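The Hybrid Search node fuses the semantic and keyword rankings with Reciprocal Rank Fusion (RRF). Below is a self-contained sketch of the fusion step; the document ids are hypothetical, and in production the two input rankings would come from Qdrant and a BM25 index respectively.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked result lists with RRF: score(d) = sum over lists of 1/(k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical inputs: ids ranked by vector similarity and by BM25.
semantic = ["doc-7", "doc-2", "doc-9", "doc-4"]
keyword = ["doc-2", "doc-5", "doc-7"]

top_k = reciprocal_rank_fusion([semantic, keyword])[:20]  # "Top-K: 20 initial results"
print(top_k)
```

RRF is a deliberately rank-only fusion: it needs no score normalization between the two retrieval systems, which is why it is a common default for hybrid search.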
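Downstream of fusion, a sketch of re-ranking plus local generation, assuming the `ms-marco-MiniLM` cross-encoder from `sentence-transformers` and Qwen served through Ollama's HTTP API on port 11434, as in the diagram. The model tag and the candidate chunk texts are illustrative.

```python
import requests
from sentence_transformers import CrossEncoder

question = "Which ports does the VMware collector use?"
candidates = ["...chunk text...", "...another chunk..."]  # Top-20 from hybrid search

# Fine-grained relevance: score (question, chunk) pairs and keep the top 5.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = reranker.predict([(question, c) for c in candidates])
top5 = [c for _, c in sorted(zip(scores, candidates), reverse=True)[:5]]

# RAG prompt per the template node: system role, retrieved context, question.
prompt = (
    "You are a helpful assistant. Answer using only the context below.\n\n"
    "Context:\n" + "\n---\n".join(top5) + f"\n\nQuestion: {question}\nAnswer:"
)

# Local Qwen via the Ollama HTTP API (port 11434 in the diagram).
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen2.5:14b",  # assumed model tag
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": 0.2, "num_predict": 2048},
    },
    timeout=120,
)
print(resp.json()["response"])
```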
+
+### RAG Pipeline
+
+**1. Ingestion Pipeline (Offline)**
+
+- Parsing of the MkDocs documentation
+- Intelligent chunking (512 tokens, 128 overlap)
+- Embedding generation (all-MiniLM-L6-v2)
+- Storage in the vector database (Qdrant cluster)
+
+**2. Query Pipeline (Real-time)**
+
+- Embedding of the user query
+- Hybrid search (semantic + keyword)
+- Re-ranking with a cross-encoder
+- Context assembly for the LLM
+
+**3. Generation**
+
+- Local LLM (Qwen) with RAG context
+- Automatic source attribution
+- Streamed responses
+
+**4. Scaling Strategy**
+
+- Automatic vector DB sharding
+- API instances with K8s auto-scaling
+- Redis cluster for multi-level caching
+- Load balancing with Nginx
+
+---
+
 ## 📧 Contacts
 
 - **Team**: Infrastructure Documentation Team
@@ -386,5 +725,5 @@
 
 ---
 
-**Version**: 1.0.0
-**Last updated**: 2025-10-28
\ No newline at end of file
+**Version**: 1.0.0
+**Last updated**: 2025-10-28