# 📚 Automated Infrastructure Documentation System

An automated system that generates and maintains the technical documentation of the company infrastructure using a local LLM, with human validation and GitOps publishing.

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)
[![Redis](https://img.shields.io/badge/Redis-7.2+-red.svg)](https://redis.io/)

## 📋 Table of Contents

- [Overview](#overview)
- [Architecture](#architecture)
- [Architecture Diagram](#architecture-diagram)
- [Technical Diagram](#technical-diagram)
- [Contacts](#contacts)

## 🎯 Overview

The system is designed to **automate the creation and updating of technical documentation** for complex infrastructure systems (VMware, Kubernetes, Linux, Cisco, etc.) using a local Large Language Model (Qwen).

### Key Features

- ✅ **Asynchronous data collection** from multiple infrastructure systems
- ✅ **Security isolation**: the LLM never accesses live systems
- ✅ **Change detection**: documentation is generated only when changes are detected
- ✅ **Redis cache** for data storage and performance
- ✅ **Local on-premise LLM** (Qwen) via an MCP server
- ✅ **Human-in-the-loop validation** with a GitOps workflow
- ✅ **Automated CI/CD** for publishing

## 🏗️ Architecture

The system is divided into **3 main flows**:

1. **Data Collection (Background)**: connectors periodically query the infrastructure systems via API and update Redis
2. **Change Detection**: a change detection component triggers documentation generation only when needed
3. **Generation and Publishing (Triggered)**: the local LLM (Qwen) generates Markdown by reading from Redis, followed by human review and automated deployment

> **Security Principle**: The LLM never has direct access to the infrastructure systems. All data is read from Redis.
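
The change detection flow described above could be sketched roughly as follows. The technical diagram later in this README lists "Hash comparison" as the detector's algorithm; everything else here is an assumption for illustration: the function names `config_fingerprint` and `detect_changes`, the sample key and payload, and the plain dictionary standing in for the Redis store.

```python
import hashlib
import json


def config_fingerprint(config: dict) -> str:
    """Return a stable SHA-256 hex digest of a configuration snapshot."""
    # sort_keys makes the serialization independent of key insertion order.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def detect_changes(previous: dict[str, str], snapshots: dict[str, dict]) -> list[str]:
    """Compare fresh snapshots against stored fingerprints.

    `previous` maps a resource key to its last known fingerprint and is
    updated in place. The returned list holds the keys that are new or changed.
    """
    changed = []
    for key, config in snapshots.items():
        digest = config_fingerprint(config)
        if previous.get(key) != digest:
            changed.append(key)
            previous[key] = digest
    return changed


# Hypothetical key and payloads; a dict stands in for Redis here.
stored: dict[str, str] = {}
first = detect_changes(stored, {"linux:web01:info": {"kernel": "6.1", "packages": 412}})
# Same content with a different key order: not reported as a change.
second = detect_changes(stored, {"linux:web01:info": {"packages": 412, "kernel": "6.1"}})
# A real modification: reported again.
third = detect_changes(stored, {"linux:web01:info": {"kernel": "6.5", "packages": 412}})
print(first, second, third)
```

Only the keys returned by `detect_changes` would need to reach the generation trigger, so unchanged resources cost one hash comparison per polling cycle.
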
> **Efficiency Principle**: Documentation is generated only when the system detects changes in the infrastructure configuration.

---

## 📊 Architecture Diagram

### Management View

A simplified diagram for executive and management presentations.

```mermaid
graph TB
    %% Styling
    classDef infrastructure fill:#e1f5ff,stroke:#01579b,stroke-width:3px,color:#333
    classDef cache fill:#f3e5f5,stroke:#4a148c,stroke-width:3px,color:#333
    classDef change fill:#fff3e0,stroke:#e65100,stroke-width:3px,color:#333
    classDef llm fill:#e8f5e9,stroke:#1b5e20,stroke-width:3px,color:#333
    classDef git fill:#fce4ec,stroke:#880e4f,stroke-width:3px,color:#333
    classDef human fill:#fff9c4,stroke:#f57f17,stroke-width:3px,color:#333

    %% ========================================
    %% FLOW 1: DATA COLLECTION (Background)
    %% ========================================

    INFRA[("🏢 INFRASTRUCTURE<br/>SYSTEMS<br/><br/>VMware | K8s | Linux | Cisco")]:::infrastructure
    CONN["🔌 CONNECTORS<br/>Automatic Polling"]:::infrastructure
    REDIS[("💾 REDIS CACHE<br/>Infrastructure<br/>Configuration")]:::cache

    INFRA -->|"Continuous<br/>API Polling"| CONN
    CONN -->|"Configuration<br/>Update"| REDIS

    %% ========================================
    %% CHANGE DETECTION
    %% ========================================

    CHANGE["🔍 CHANGE DETECTOR<br/>Detects Configuration<br/>Changes"]:::change

    REDIS -->|"Monitor<br/>Changes"| CHANGE

    %% ========================================
    %% FLOW 2: DOCUMENTATION GENERATION (Triggered)
    %% ========================================

    TRIGGER["⚡ TRIGGER<br/>Only on changes"]:::change
    USER["👤 USER<br/>Manual Request"]:::human
    LLM["🤖 LLM ENGINE<br/>Qwen (Local)"]:::llm
    MCP["🔧 MCP SERVER<br/>API Control Platform"]:::llm
    DOC["📄 DOCUMENT<br/>Generated Markdown"]:::llm

    CHANGE -->|"Changes<br/>Detected"| TRIGGER
    USER -.->|"Optional"| TRIGGER
    TRIGGER -->|"Start<br/>Generation"| LLM
    LLM -->|"Tool Call"| MCP
    MCP -->|"Query"| REDIS
    REDIS -->|"Config Data"| MCP
    MCP -->|"Context"| LLM
    LLM -->|"Generate"| DOC

    %% ========================================
    %% FLOW 3: VALIDATION AND PUBLISHING
    %% ========================================

    GIT["📦 GITLAB<br/>Repository"]:::git
    PR["🔀 MERGE REQUEST<br/>Automated Review"]:::git
    TECH["👨‍💼 TECHNICAL TEAM<br/>Human Validation"]:::human
    PIPELINE["⚡ CI/CD PIPELINE<br/>GitLab Runner"]:::git
    MKDOCS["📚 MKDOCS<br/>Static Site Generator"]:::git
    WEB["🌐 DOCUMENTATION<br/>GitLab Pages<br/>(Published)"]:::git

    DOC -->|"Push +<br/>Branch"| GIT
    GIT -->|"Create"| PR
    PR -->|"Notify"| TECH
    TECH -->|"Approve +<br/>Merge"| GIT
    GIT -->|"Trigger"| PIPELINE
    PIPELINE -->|"Build"| MKDOCS
    MKDOCS -->|"Deploy"| WEB

    %% ========================================
    %% ANNOTATIONS
    %% ========================================

    SECURITY["🔒 SECURITY<br/>LLM isolated from live systems"]:::human
    EFFICIENCY["⚡ EFFICIENCY<br/>Docs generated only<br/>on changes"]:::change

    LLM -.->|"NO<br/>ACCESS"| INFRA
    SECURITY -.-> LLM
    EFFICIENCY -.-> CHANGE
```

---

## 🔧 Technical Diagram

### Implementation View

A detailed diagram for the technical team, with implementation specifications.

```mermaid
graph TB
    %% Technical styling
    classDef infra fill:#e1f5ff,stroke:#01579b,stroke-width:2px,color:#333,font-size:11px
    classDef connector fill:#e3f2fd,stroke:#1565c0,stroke-width:2px,color:#333,font-size:11px
    classDef cache fill:#f3e5f5,stroke:#4a148c,stroke-width:2px,color:#333,font-size:11px
    classDef change fill:#fff3e0,stroke:#e65100,stroke-width:2px,color:#333,font-size:11px
    classDef llm fill:#e8f5e9,stroke:#1b5e20,stroke-width:2px,color:#333,font-size:11px
    classDef git fill:#fce4ec,stroke:#880e4f,stroke-width:2px,color:#333,font-size:11px
    classDef monitor fill:#fff8e1,stroke:#f57f17,stroke-width:2px,color:#333,font-size:11px

    %% =====================================
    %% LAYER 1: SOURCE SYSTEMS
    %% =====================================

    subgraph SOURCES["🏢 INFRASTRUCTURE SOURCES"]
        VCENTER["VMware vCenter
API: vSphere REST 7.0+<br/>Port: 443/HTTPS<br/>Auth: API Token"]:::infra
        K8S_API["Kubernetes API<br/>API: v1.28+<br/>Port: 6443/HTTPS<br/>Auth: ServiceAccount + RBAC"]:::infra
        LINUX["Linux Servers<br/>Protocol: SSH/Ansible<br/>Port: 22<br/>Auth: SSH Keys"]:::infra
        CISCO["Cisco Devices<br/>Protocol: NETCONF/RESTCONF<br/>Port: 830/443<br/>Auth: AAA"]:::infra
    end

    %% =====================================
    %% LAYER 2: CONNECTORS
    %% =====================================

    subgraph CONNECTORS["🔌 DATA COLLECTORS (Python/Go)"]
        CONN_VM["VMware Collector<br/>Lang: Python 3.11<br/>Lib: pyvmomi<br/>Schedule: */15 * * * *<br/>Output: JSON → Redis"]:::connector
        CONN_K8S["K8s Collector<br/>Lang: Python 3.11<br/>Lib: kubernetes-client<br/>Schedule: */5 * * * *<br/>Resources: pods,svc,ing,deploy"]:::connector
        CONN_LNX["Linux Collector<br/>Lang: Python 3.11<br/>Lib: paramiko/ansible<br/>Schedule: */30 * * * *<br/>Data: sysinfo,packages,services"]:::connector
        CONN_CSC["Cisco Collector<br/>Lang: Python 3.11<br/>Lib: ncclient<br/>Schedule: */30 * * * *<br/>Data: interfaces,routing,vlans"]:::connector
    end

    VCENTER -->|"GET /api/vcenter/vm"| CONN_VM
    K8S_API -->|"kubectl proxy<br/>API calls"| CONN_K8S
    LINUX -->|"SSH batch<br/>commands"| CONN_LNX
    CISCO -->|"NETCONF<br/>get-config"| CONN_CSC

    %% =====================================
    %% LAYER 3: REDIS STORAGE
    %% =====================================

    subgraph STORAGE["💾 REDIS CLUSTER"]
        REDIS_CLUSTER["Redis Cluster<br/>Mode: Cluster (6 nodes)<br/>Port: 6379<br/>Persistence: RDB + AOF<br/>Memory: 64GB<br/>Eviction: allkeys-lru"]:::cache
        REDIS_KEYS["Key Structure:<br/>• vmware:vcenter-id:vms:hash<br/>• k8s:cluster:namespace:resource:hash<br/>• linux:hostname:info:hash<br/>• cisco:device-id:config:hash<br/>• changelog:timestamp:diff<br/>TTL: 30d for data, 90d for changelog"]:::cache
    end

    CONN_VM -->|"HSET/HMSET<br/>+ Hash Storage"| REDIS_CLUSTER
    CONN_K8S -->|"HSET/HMSET<br/>+ Hash Storage"| REDIS_CLUSTER
    CONN_LNX -->|"HSET/HMSET<br/>+ Hash Storage"| REDIS_CLUSTER
    CONN_CSC -->|"HSET/HMSET<br/>+ Hash Storage"| REDIS_CLUSTER
    REDIS_CLUSTER --> REDIS_KEYS

    %% =====================================
    %% LAYER 4: CHANGE DETECTION
    %% =====================================

    subgraph CHANGE_DETECTION["🔍 CHANGE DETECTION SYSTEM"]
        DETECTOR["Change Detector Service<br/>Lang: Python 3.11<br/>Lib: redis-py<br/>Algorithm: Hash comparison<br/>Check interval: */5 * * * *"]:::change
        DIFF_ENGINE["Diff Engine<br/>• Deep object comparison<br/>• JSON diff generation<br/>• Change classification<br/>• Severity assessment"]:::change
        CHANGE_LOG["Change Log Store<br/>Key: changelog:*<br/>Data: diff JSON + metadata<br/>Indexed by: timestamp, resource"]:::change
        NOTIFIER["Change Notifier<br/>• Webhook triggers<br/>• Slack notifications<br/>• Event emission<br/>Target: LLM trigger"]:::change
    end

    REDIS_CLUSTER -->|"Monitor<br/>key changes"| DETECTOR
    DETECTOR --> DIFF_ENGINE
    DIFF_ENGINE -->|"Store diff"| CHANGE_LOG
    CHANGE_LOG --> REDIS_CLUSTER
    DIFF_ENGINE -->|"Notify if<br/>significant"| NOTIFIER

    %% =====================================
    %% LAYER 5: LLM TRIGGER & GENERATION
    %% =====================================

    subgraph TRIGGER_SYSTEM["⚡ TRIGGER SYSTEM"]
        TRIGGER_SVC["Trigger Service<br/>Lang: Python 3.11<br/>Listen: Webhook + Redis Pub/Sub<br/>Debounce: 5 min<br/>Batch: multiple changes"]:::change
        QUEUE["Generation Queue<br/>Type: Redis List<br/>Priority: High/Medium/Low<br/>Processing: FIFO"]:::change
    end

    NOTIFIER -->|"Trigger event"| TRIGGER_SVC
    TRIGGER_SVC -->|"Enqueue<br/>generation task"| QUEUE

    subgraph LLM_LAYER["🤖 AI GENERATION LAYER"]
        LLM_ENGINE["LLM Engine
Model: Qwen (Local)<br/>API: Ollama/vLLM/LM Studio<br/>Port: 11434<br/>Temp: 0.3<br/>Max Tokens: 4096<br/>Timeout: 120s"]:::llm
        MCP_SERVER["MCP Server<br/>Lang: TypeScript/Node.js<br/>Port: 3000<br/>Protocol: JSON-RPC 2.0<br/>Auth: JWT tokens"]:::llm
        MCP_TOOLS["MCP Tools:<br/>• getVMwareInventory(vcenter)<br/>• getK8sResources(cluster,ns,type)<br/>• getLinuxSystemInfo(hostname)<br/>• getCiscoConfig(device,section)<br/>• getChangelog(start,end,resource)<br/>Return: JSON + Metadata"]:::llm
    end

    QUEUE -->|"Dequeue<br/>task"| LLM_ENGINE
    LLM_ENGINE <-->|"Tool calls<br/>JSON-RPC"| MCP_SERVER
    MCP_SERVER --> MCP_TOOLS
    MCP_TOOLS -->|"HGETALL/MGET<br/>Read data"| REDIS_CLUSTER
    REDIS_CLUSTER -->|"Config data<br/>+ Changelog"| MCP_TOOLS
    MCP_TOOLS -->|"Structured Data<br/>+ Context"| LLM_ENGINE

    subgraph OUTPUT["📝 DOCUMENT GENERATION"]
        TEMPLATE["Template Engine<br/>Format: Jinja2<br/>Templates: markdown/*.j2<br/>Variables: from LLM"]:::llm
        MARKDOWN["Markdown Output<br/>Format: CommonMark<br/>Metadata: YAML frontmatter<br/>Change summary included<br/>Assets: diagrams in mermaid"]:::llm
        VALIDATOR["Doc Validator<br/>• Markdown linting<br/>• Link checking<br/>• Schema validation<br/>• Change verification"]:::llm
    end

    LLM_ENGINE --> TEMPLATE
    TEMPLATE --> MARKDOWN
    MARKDOWN --> VALIDATOR

    %% =====================================
    %% LAYER 6: GITOPS
    %% =====================================

    subgraph GITOPS["🔄 GITOPS WORKFLOW"]
        GIT_REPO["GitLab Repository<br/>URL: gitlab.com/docs/infra<br/>Branch strategy: main + feature/*<br/>Protected: main (require approval)"]:::git
        GIT_API["GitLab API<br/>API: v4<br/>Auth: Project Access Token<br/>Permissions: api, write_repo"]:::git
        PR_AUTO["Automated PR Creator<br/>Lang: Python 3.11<br/>Lib: python-gitlab<br/>Template: .gitlab/merge_request.md<br/>Include: change summary"]:::git
    end

    VALIDATOR -->|"git add/commit/push"| GIT_REPO
    GIT_REPO <--> GIT_API
    GIT_API --> PR_AUTO

    REVIEWER["👨‍💼 Technical Reviewer<br/>Role: Maintainer/Owner<br/>Review: diff + validation<br/>Check: change correlation<br/>Approve: required (min 1)"]:::monitor

    PR_AUTO -->|"Notification<br/>Email + Slack"| REVIEWER
    REVIEWER -->|"Merge to main"| GIT_REPO

    %% =====================================
    %% LAYER 7: CI/CD & PUBLISH
    %% =====================================

    subgraph CICD["⚡ CI/CD PIPELINE"]
        GITLAB_CI["GitLab CI/CD<br/>Runner: docker<br/>Image: python:3.11-alpine<br/>Stages: build, test, deploy"]:::git
        PIPELINE_JOBS["Pipeline Jobs:<br/>1. lint (markdownlint-cli)<br/>2. build (mkdocs build)<br/>3. test (link-checker)<br/>4. deploy (rsync/s3)"]:::git
        MKDOCS_CFG["MkDocs Config<br/>Theme: material<br/>Plugins: search, tags, mermaid<br/>Extensions: admonition, codehilite"]:::git
    end

    GIT_REPO -->|"on: push to main<br/>Webhook trigger"| GITLAB_CI
    GITLAB_CI --> PIPELINE_JOBS
    PIPELINE_JOBS --> MKDOCS_CFG

    subgraph PUBLISH["🌐 PUBLICATION"]
        STATIC_SITE["Static Site<br/>Generator: MkDocs<br/>Output: HTML/CSS/JS<br/>Assets: optimized images"]:::git
        CDN["GitLab Pages / S3 + CloudFront<br/>URL: docs.company.com<br/>SSL: Let's Encrypt<br/>Cache: 1h"]:::git
        SEARCH["Search Index<br/>Engine: Algolia/Meilisearch<br/>Update: on publish<br/>API: REST"]:::git
    end

    MKDOCS_CFG -->|"mkdocs build<br/>--strict"| STATIC_SITE
    STATIC_SITE --> CDN
    STATIC_SITE --> SEARCH

    %% =====================================
    %% LAYER 8: MONITORING & OBSERVABILITY
    %% =====================================

    subgraph OBSERVABILITY["📊 MONITORING & LOGGING"]
        PROMETHEUS["Prometheus<br/>Metrics: collector updates, changes detected<br/>Scrape: 30s<br/>Retention: 15d"]:::monitor
        GRAFANA["Grafana Dashboards<br/>• Collector status<br/>• Redis performance<br/>• Change detection rate<br/>• LLM response times<br/>• Pipeline success rate"]:::monitor
        ELK["ELK Stack<br/>Logs: all components<br/>Index: daily rotation<br/>Retention: 30d"]:::monitor
        ALERTS["Alerting<br/>• Collector failures<br/>• Redis issues<br/>• Change detection errors<br/>• Pipeline failures<br/>Channel: Slack + PagerDuty"]:::monitor
    end

    CONN_VM -.->|"metrics"| PROMETHEUS
    CONN_K8S -.->|"metrics"| PROMETHEUS
    REDIS_CLUSTER -.->|"metrics"| PROMETHEUS
    DETECTOR -.->|"metrics"| PROMETHEUS
    MCP_SERVER -.->|"metrics"| PROMETHEUS
    GITLAB_CI -.->|"metrics"| PROMETHEUS

    PROMETHEUS --> GRAFANA

    CONN_VM -.->|"logs"| ELK
    DETECTOR -.->|"logs"| ELK
    MCP_SERVER -.->|"logs"| ELK
    GITLAB_CI -.->|"logs"| ELK

    GRAFANA --> ALERTS

    %% =====================================
    %% SECURITY & EFFICIENCY ANNOTATIONS
    %% =====================================

    SEC1["🔒 SECURITY:<br/>• All APIs use TLS 1.3<br/>• Secrets in Vault/K8s Secrets<br/>• Network: private VPC<br/>• LLM has NO direct access"]:::monitor
    SEC2["🔐 AUTHENTICATION:<br/>• API Tokens rotated 90d<br/>• RBAC enforced<br/>• Audit logs enabled<br/>• MFA required for Git"]:::monitor
    EFF1["⚡ EFFICIENCY:<br/>• Doc generation only on changes<br/>• Debounce prevents spam<br/>• Hash-based change detection<br/>• Batch processing"]:::change

    SEC1 -.-> MCP_SERVER
    SEC2 -.-> GIT_REPO
    EFF1 -.-> DETECTOR
```

---

## 💬 Conversational RAG System

### Querying the Documentation with AI

A system for "talking" to the documentation using Retrieval Augmented Generation (RAG). It lets users ask questions in natural language and receive accurate answers grounded in the documentation, with source citations.

#### Key Features

- ✅ **Semantic Search**: vector search to understand the intent of the query
- ✅ **Scalability**: handles large volumes of documentation (100k+ documents)
- ✅ **Performance**: answers in <3 seconds with intelligent caching
- ✅ **Accuracy**: re-ranking and source attribution for precise answers
- ✅ **Local LLM**: on-premise Qwen for privacy and control

### RAG Diagram - Management View

```mermaid
graph TB
    %% Styling
    classDef docs fill:#e3f2fd,stroke:#1565c0,stroke-width:3px,color:#333
    classDef process fill:#f3e5f5,stroke:#4a148c,stroke-width:3px,color:#333
    classDef vector fill:#fff3e0,stroke:#e65100,stroke-width:3px,color:#333
    classDef llm fill:#e8f5e9,stroke:#1b5e20,stroke-width:3px,color:#333
    classDef user fill:#fff9c4,stroke:#f57f17,stroke-width:3px,color:#333
    classDef cache fill:#fce4ec,stroke:#880e4f,stroke-width:3px,color:#333

    %% ========================================
    %% INGESTION PIPELINE (Offline)
    %% ========================================

    subgraph INGESTION["📚 INGESTION PIPELINE (Offline Process)"]
        DOCS["📄 DOCUMENTATION
MkDocs Output<br/>Markdown Files"]:::docs
        CHUNKER["✂️ DOCUMENT CHUNKER<br/>Split & Overlap<br/>Metadata Extraction"]:::process
        EMBEDDER["🧠 EMBEDDING MODEL<br/>Text → Vectors<br/>Dimension: 768/1024"]:::process
        VECTORDB[("🗄️ VECTOR DATABASE<br/>Qdrant/Milvus<br/>Sharded & Replicated")]:::vector
    end

    DOCS -->|"Parse<br/>Markdown"| CHUNKER
    CHUNKER -->|"Text Chunks<br/>+ Metadata"| EMBEDDER
    EMBEDDER -->|"Store<br/>Embeddings"| VECTORDB

    %% ========================================
    %% QUERY PIPELINE (Real-time)
    %% ========================================

    subgraph QUERY["💬 QUERY PIPELINE (Real-time)"]
        USER["👤 USER<br/>Question/Query"]:::user
        QUERY_EMBED["🧠 QUERY EMBEDDING<br/>Query → Vector"]:::process
        SEARCH["🔍 SEMANTIC SEARCH<br/>Vector Similarity<br/>Top-K Results"]:::vector
        RERANK["📊 RE-RANKING<br/>Context Scoring<br/>Relevance Filter"]:::process
        CONTEXT["📋 CONTEXT BUILDER<br/>Assemble Chunks<br/>Add Metadata"]:::process
    end

    USER -->|"Natural Language<br/>Question"| QUERY_EMBED
    QUERY_EMBED -->|"Query Vector"| SEARCH
    SEARCH -->|"Search"| VECTORDB
    VECTORDB -->|"Top-K Chunks<br/>+ Scores"| SEARCH
    SEARCH -->|"Initial Results"| RERANK
    RERANK -->|"Filtered<br/>Chunks"| CONTEXT

    %% ========================================
    %% GENERATION (LLM)
    %% ========================================

    subgraph GENERATION["🤖 ANSWER GENERATION"]
        LLM_RAG["🤖 LLM ENGINE<br/>Qwen (Local)<br/>+ RAG Context"]:::llm
        ANSWER["💡 ANSWER<br/>Generated Answer<br/>+ Source Citations"]:::llm
    end

    CONTEXT -->|"Context<br/>+ Sources"| LLM_RAG
    LLM_RAG -->|"Generate"| ANSWER
    ANSWER -->|"Display"| USER

    %% ========================================
    %% CACHING & OPTIMIZATION
    %% ========================================

    CACHE[("💾 REDIS CACHE<br/>Query Cache<br/>Embedding Cache")]:::cache

    QUERY_EMBED -.->|"Check Cache"| CACHE
    CACHE -.->|"Cached<br/>Embedding"| SEARCH
    SEARCH -.->|"Cache<br/>Results"| CACHE

    %% ========================================
    %% SCALING & UPDATE
    %% ========================================

    UPDATE["🔄 INCREMENTAL UPDATE<br/>On Doc Changes<br/>Auto Re-index"]:::docs

    DOCS -.->|"Doc Updated"| UPDATE
    UPDATE -.->|"Re-process<br/>Changed Docs"| CHUNKER

    %% ========================================
    %% ANNOTATIONS
    %% ========================================

    SCALE["📈 SCALABILITY<br/>• Vector DB sharding<br/>• Horizontal scaling<br/>• Load balancing"]:::vector
    PERF["⚡ PERFORMANCE<br/>• Query cache<br/>• Embedding cache<br/>• Async processing"]:::cache
    QUALITY["✅ QUALITY<br/>• Re-ranking<br/>• Relevance scoring<br/>• Source citations"]:::process

    SCALE -.-> VECTORDB
    PERF -.-> CACHE
    QUALITY -.-> RERANK
```

### RAG Diagram - Technical View

```mermaid
graph TB
    %% Styling
    classDef docs fill:#e3f2fd,stroke:#1565c0,stroke-width:2px,color:#333,font-size:11px
    classDef process fill:#f3e5f5,stroke:#4a148c,stroke-width:2px,color:#333,font-size:11px
    classDef vector fill:#fff3e0,stroke:#e65100,stroke-width:2px,color:#333,font-size:11px
    classDef llm fill:#e8f5e9,stroke:#1b5e20,stroke-width:2px,color:#333,font-size:11px
    classDef user fill:#fff9c4,stroke:#f57f17,stroke-width:2px,color:#333,font-size:11px
    classDef cache fill:#fce4ec,stroke:#880e4f,stroke-width:2px,color:#333,font-size:11px
    classDef monitor fill:#fff8e1,stroke:#f57f17,stroke-width:2px,color:#333,font-size:11px

    %% =====================================
    %% LAYER 1: DOCUMENTATION SOURCE
    %% =====================================

    subgraph DOCSOURCE["📚 DOCUMENTATION SOURCE"]
        MKDOCS_OUT["MkDocs Static Site
Path: /site/<br/>Format: HTML + Markdown<br/>Assets: images, diagrams<br/>Update: on Git merge"]:::docs
        DOC_WATCHER["Document Watcher<br/>Lang: Python 3.11<br/>Lib: watchdog<br/>Trigger: file system events<br/>Debounce: 30s"]:::docs
        DOC_PARSER["Document Parser<br/>HTML → Plain Text<br/>Preserve structure<br/>Extract metadata<br/>Clean formatting"]:::docs
    end

    MKDOCS_OUT --> DOC_WATCHER
    DOC_WATCHER -->|"New/Modified<br/>Docs"| DOC_PARSER

    %% =====================================
    %% LAYER 2: CHUNKING STRATEGY
    %% =====================================

    subgraph CHUNKING["✂️ INTELLIGENT CHUNKING"]
        CHUNK_ENGINE["Chunking Engine<br/>Lang: Python 3.11<br/>Lib: langchain/llama-index<br/>Strategy: Recursive Character"]:::process
        CHUNK_CONFIG["Chunking Config:<br/>• Chunk Size: 512 tokens<br/>• Overlap: 128 tokens<br/>• Separators: \\n\\n, \\n, . , ' '<br/>• Min chunk: 100 tokens<br/>• Max chunk: 1024 tokens"]:::process
        METADATA_EXTRACTOR["Metadata Extractor<br/>Extract:<br/>• Document title<br/>• Section headers<br/>• Tags/keywords<br/>• Creation date<br/>• File path<br/>• Doc type"]:::process
    end

    DOC_PARSER -->|"Parsed Text"| CHUNK_ENGINE
    CHUNK_ENGINE --> CHUNK_CONFIG
    CHUNK_ENGINE --> METADATA_EXTRACTOR

    %% =====================================
    %% LAYER 3: EMBEDDING GENERATION
    %% =====================================

    subgraph EMBEDDING["🧠 EMBEDDING GENERATION"]
        EMBED_MODEL["Embedding Model<br/>Model: all-MiniLM-L6-v2 / BGE-M3<br/>Dim: 384/768/1024<br/>API: sentence-transformers<br/>Batch size: 32<br/>GPU: CUDA acceleration"]:::process
        EMBED_CACHE["Embedding Cache<br/>Type: Redis Hash<br/>Key: hash(text)<br/>TTL: 30d<br/>Hit rate target: >80%"]:::cache
        EMBED_QUEUE["Processing Queue<br/>Type: Redis List<br/>Workers: 4-8<br/>Rate: 100 chunks/s<br/>Retry: 3 attempts"]:::process
    end

    METADATA_EXTRACTOR -->|"Chunks<br/>+ Metadata"| EMBED_QUEUE
    EMBED_QUEUE --> EMBED_MODEL
    EMBED_MODEL <-.->|"Cache<br/>Check/Store"| EMBED_CACHE

    %% =====================================
    %% LAYER 4: VECTOR DATABASE
    %% =====================================

    subgraph VECTORDB["🗄️ VECTOR DATABASE CLUSTER"]
        QDRANT["Qdrant Cluster<br/>Version: 1.7+<br/>Nodes: 3-6 (replicated)<br/>Shards: auto per collection<br/>Port: 6333/6334"]:::vector
        COLLECTIONS["Collections:<br/>• docs_main (dim: 768)<br/>• docs_code (dim: 768)<br/>• docs_api (dim: 768)<br/>Distance: Cosine<br/>Index: HNSW (M=16, ef=100)"]:::vector
        SHARD_STRATEGY["Sharding Strategy:<br/>• Auto-sharding enabled<br/>• Shard size: 100k vectors<br/>• Replication factor: 2<br/>• Load balancing: Round-robin"]:::vector
    end

    EMBED_MODEL -->|"Store<br/>Vectors"| QDRANT
    QDRANT --> COLLECTIONS
    QDRANT --> SHARD_STRATEGY

    %% =====================================
    %% LAYER 5: QUERY PROCESSING
    %% =====================================

    subgraph QUERYPROC["💬 QUERY PROCESSING PIPELINE"]
        USER_INPUT["User Input<br/>Interface: Web UI / API<br/>Auth: JWT tokens<br/>Rate limit: 20 req/min<br/>Timeout: 30s"]:::user
        QUERY_PREPROCESS["Query Preprocessor<br/>• Spelling correction<br/>• Intent detection<br/>• Query expansion<br/>• Language detection"]:::process
        QUERY_EMBEDDER["Query Embedder<br/>Same model as docs<br/>Cache: Redis<br/>Latency: <50ms"]:::process
        HYBRID_SEARCH["Hybrid Search<br/>1. Vector search (semantic)<br/>2. Keyword search (BM25)<br/>3. Fusion: RRF algorithm<br/>Top-K: 20 initial results"]:::vector
    end

    USER_INPUT -->|"Natural<br/>Language"| QUERY_PREPROCESS
    QUERY_PREPROCESS --> QUERY_EMBEDDER
    QUERY_EMBEDDER <-.->|"Cache"| EMBED_CACHE
    QUERY_EMBEDDER -->|"Query<br/>Vector"| HYBRID_SEARCH
    HYBRID_SEARCH -->|"Search"| QDRANT

    %% =====================================
    %% LAYER 6: RE-RANKING & FILTERING
    %% =====================================

    subgraph RERANK["📊 RE-RANKING & FILTERING"]
        RERANKER["Cross-Encoder Re-ranker<br/>Model: ms-marco-MiniLM<br/>Purpose: Fine-grained relevance<br/>Process: Top-20 → Top-5<br/>Latency: 100-200ms"]:::process
        FILTER_ENGINE["Filter Engine<br/>• Relevance threshold: >0.7<br/>• Deduplication<br/>• Diversity scoring<br/>• Metadata filtering"]:::process
        CONTEXT_BUILDER["Context Builder<br/>• Assemble top chunks<br/>• Add source citations<br/>• Format for LLM<br/>• Max context: 4k tokens"]:::process
    end

    QDRANT -->|"Top-K<br/>Results"| RERANKER
    RERANKER --> FILTER_ENGINE
    FILTER_ENGINE --> CONTEXT_BUILDER

    %% =====================================
    %% LAYER 7: LLM GENERATION
    %% =====================================

    subgraph LLMGEN["🤖 LLM ANSWER GENERATION"]
        RAG_PROMPT["RAG Prompt Template
Structure:<br/>• System: You are a helpful assistant<br/>• Context: Retrieved chunks<br/>• Question: User query<br/>• Instruction: Answer using context"]:::llm
        LLM_ENGINE["LLM Engine<br/>Model: Qwen 2.5 (14B/32B)<br/>API: Ollama/vLLM<br/>Port: 11434<br/>Temp: 0.2 (factual)<br/>Max tokens: 2048<br/>Stream: enabled"]:::llm
        ANSWER_POST["Answer Post-processor<br/>• Citation formatting<br/>• Source links<br/>• Confidence scoring<br/>• Fallback handling"]:::llm
    end

    CONTEXT_BUILDER -->|"Context<br/>+ Sources"| RAG_PROMPT
    QUERY_PREPROCESS -->|"Original<br/>Question"| RAG_PROMPT
    RAG_PROMPT --> LLM_ENGINE
    LLM_ENGINE --> ANSWER_POST
    ANSWER_POST -->|"Final<br/>Answer"| USER_INPUT

    %% =====================================
    %% LAYER 8: CACHING LAYER
    %% =====================================

    subgraph CACHING["💾 MULTI-LEVEL CACHE"]
        REDIS_CACHE["Redis Cluster<br/>Mode: Cluster<br/>Nodes: 3<br/>Memory: 16GB<br/>Persistence: AOF"]:::cache
        CACHE_TYPES["Cache Types:<br/>• Query embeddings (TTL: 7d)<br/>• Search results (TTL: 1h)<br/>• LLM responses (TTL: 24h)<br/>• Popular queries (no TTL)<br/>Eviction: LRU"]:::cache
        CACHE_WARMING["Cache Warming<br/>Pre-compute:<br/>• Top 100 queries<br/>• Common patterns<br/>Schedule: daily<br/>Update: on doc changes"]:::cache
    end

    REDIS_CACHE --> CACHE_TYPES
    CACHE_TYPES --> CACHE_WARMING
    QUERY_EMBEDDER <-.-> REDIS_CACHE
    HYBRID_SEARCH <-.-> REDIS_CACHE
    LLM_ENGINE <-.-> REDIS_CACHE

    %% =====================================
    %% LAYER 9: SCALING & LOAD BALANCING
    %% =====================================

    subgraph SCALING["📈 SCALING INFRASTRUCTURE"]
        LOAD_BALANCER["Load Balancer<br/>Type: Nginx / HAProxy<br/>Algorithm: Least connections<br/>Health checks: /health<br/>Timeout: 30s"]:::monitor
        QUERY_API["Query API Instances<br/>Replicas: 3-10 (auto-scale)<br/>Lang: FastAPI<br/>Container: Docker<br/>Orchestration: K8s"]:::user
        EMBED_WORKERS["Embedding Workers<br/>Replicas: 4-8<br/>GPU: Optional<br/>Queue: Redis<br/>Auto-scale: based on queue depth"]:::process
    end

    LOAD_BALANCER --> QUERY_API
    QUERY_API --> USER_INPUT

    %% =====================================
    %% LAYER 10: MONITORING & OBSERVABILITY
    %% =====================================

    subgraph MONITORING["📊 MONITORING & ANALYTICS"]
        METRICS["Prometheus Metrics<br/>• Query latency (p50, p95, p99)<br/>• Vector search time<br/>• LLM response time<br/>• Cache hit rate<br/>• Embedding generation rate<br/>Scrape: 15s"]:::monitor
        DASHBOARDS["Grafana Dashboards<br/>• RAG Performance<br/>• Query analytics<br/>• Resource utilization<br/>• Error tracking<br/>Refresh: real-time"]:::monitor
        ANALYTICS["Query Analytics<br/>Track:<br/>• Popular queries<br/>• Failed queries<br/>• Avg relevance scores<br/>• User satisfaction<br/>Storage: TimescaleDB"]:::monitor
        ALERTS["Alerting Rules<br/>• Latency > 5s<br/>• Error rate > 5%<br/>• Cache hit < 70%<br/>• Vector DB down<br/>Channel: Slack + Email"]:::monitor
    end

    METRICS --> DASHBOARDS
    DASHBOARDS --> ANALYTICS
    ANALYTICS --> ALERTS

    QUERY_API -.->|"metrics"| METRICS
    HYBRID_SEARCH -.->|"metrics"| METRICS
    LLM_ENGINE -.->|"metrics"| METRICS
    QDRANT -.->|"metrics"| METRICS

    %% =====================================
    %% LAYER 11: FEEDBACK LOOP
    %% =====================================

    subgraph FEEDBACK["🔄 FEEDBACK & IMPROVEMENT"]
        USER_FEEDBACK["User Feedback<br/>• Thumbs up/down<br/>• Relevance rating<br/>• Comments<br/>Storage: PostgreSQL"]:::user
        FEEDBACK_ANALYSIS["Feedback Analysis<br/>• Identify bad answers<br/>• Track improvement areas<br/>• A/B testing results<br/>Schedule: weekly"]:::monitor
        MODEL_TUNING["Model Fine-tuning<br/>• Re-rank model updates<br/>• Prompt optimization<br/>• Chunk size tuning<br/>Cycle: monthly"]:::process
    end

    USER_INPUT -->|"Rate<br/>Answer"| USER_FEEDBACK
    USER_FEEDBACK --> FEEDBACK_ANALYSIS
    FEEDBACK_ANALYSIS --> MODEL_TUNING
    MODEL_TUNING -.->|"Improve"| RERANKER

    %% =====================================
    %% ANNOTATIONS
    %% =====================================

    SCALE_NOTE["📈 SCALABILITY:<br/>• Vector DB: Horizontal sharding<br/>• API: K8s auto-scaling (HPA)<br/>• Workers: Queue-based scaling<br/>• Cache: Redis cluster<br/>Target: 100k+ docs, 1k+ QPS"]:::monitor
    PERF_NOTE["⚡ PERFORMANCE TARGETS:<br/>• Query latency: <3s (p95)<br/>• Vector search: <100ms<br/>• LLM generation: <2s<br/>• Cache hit rate: >80%<br/>• Throughput: 1000 QPS"]:::cache
    QUALITY_NOTE["✅ QUALITY ASSURANCE:<br/>• Re-ranking for precision<br/>• Source attribution<br/>• Confidence scoring<br/>• Fallback responses<br/>• Human feedback loop"]:::process

    SCALE_NOTE -.-> QDRANT
    PERF_NOTE -.-> REDIS_CACHE
    QUALITY_NOTE -.-> RERANKER
```

### RAG Pipeline

**1. Ingestion Pipeline (Offline)**

- Parsing of the MkDocs documentation
- Intelligent chunking (512 tokens, 128-token overlap)
- Embedding generation (all-MiniLM-L6-v2)
- Storage in the vector database (Qdrant cluster)

**2. Query Pipeline (Real-time)**

- Embedding of the user query
- Hybrid search (semantic + keyword)
- Re-ranking with a cross-encoder
- Context assembly for the LLM

**3. Generation**

- Local LLM (Qwen) with RAG context
- Automatic source attribution
- Response streaming

**4. Scaling Strategy**

- Automatic vector DB sharding
- API instances with K8s auto-scaling
- Redis cluster for multi-level caching
- Load balancing with Nginx

---

## 📧 Contacts

- **Team**: Infrastructure Documentation Team
- **Email**: infra-docs@company.com
- **GitLab**: https://gitlab.com/company/infra-docs-automation

---

**Version**: 1.0.0

**Last updated**: 2025-10-28