# 📚 Automated Infrastructure Documentation System

An automated system that generates and maintains the technical documentation of the company infrastructure, using a local LLM with human validation and GitOps publishing.

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)
[![Redis](https://img.shields.io/badge/Redis-7.2+-red.svg)](https://redis.io/)

## 📋 Table of Contents

- [Overview](#overview)
- [Architecture](#architecture)
- [Architecture Diagram](#architecture-diagram)
- [Technical Diagram](#technical-diagram)
- [Contacts](#contacts)

## 🎯 Overview

The system is designed to **automate the creation and updating of technical documentation** for complex infrastructure systems (VMware, Kubernetes, Linux, Cisco, etc.) using a local Large Language Model (Qwen).

### Key Features

- ✅ **Asynchronous data collection** from multiple infrastructure systems
- ✅ **Security isolation**: the LLM never accesses live systems
- ✅ **Change detection**: documentation is generated only when changes are detected
- ✅ **Redis cache** for data storage and performance
- ✅ **Local on-premise LLM** (Qwen) via MCP Server
- ✅ **Human-in-the-loop validation** with a GitOps workflow
- ✅ **Automated CI/CD** for publishing

## 🏗️ Architecture

The system is organized into **3 main flows**:

1. **Data collection (background)**: connectors periodically query the infrastructure systems through their APIs and update Redis
2. **Change detection**: a change-detection service triggers documentation generation only when needed
3. **Generation and publishing (triggered)**: the local LLM (Qwen) generates Markdown by reading from Redis, followed by human review and automated deployment

> **Security principle**: The LLM never has direct access to the infrastructure systems.
> All data is read from Redis.

> **Efficiency principle**: Documentation is generated only when the system detects changes in the infrastructure configuration.

---

## 📊 Architecture Diagram

### Management View

Simplified diagram for executive and management presentations.

```mermaid
graph TB
    %% Styling
    classDef infrastructure fill:#e1f5ff,stroke:#01579b,stroke-width:3px,color:#333
    classDef cache fill:#f3e5f5,stroke:#4a148c,stroke-width:3px,color:#333
    classDef change fill:#fff3e0,stroke:#e65100,stroke-width:3px,color:#333
    classDef llm fill:#e8f5e9,stroke:#1b5e20,stroke-width:3px,color:#333
    classDef git fill:#fce4ec,stroke:#880e4f,stroke-width:3px,color:#333
    classDef human fill:#fff9c4,stroke:#f57f17,stroke-width:3px,color:#333

    %% ========================================
    %% FLOW 1: DATA COLLECTION (Background)
    %% ========================================

    INFRA[("🏢 INFRASTRUCTURE
    SYSTEMS

    VMware | K8s | Linux | Cisco")]:::infrastructure

    CONN["🔌 CONNECTORS
    Automatic Polling"]:::infrastructure

    REDIS[("💾 REDIS CACHE
    Infrastructure
    Configuration")]:::cache

    INFRA -->|"Continuous
    API Polling"| CONN
    CONN -->|"Configuration
    Update"| REDIS

    %% ========================================
    %% CHANGE DETECTION
    %% ========================================

    CHANGE["🔍 CHANGE DETECTOR
    Detects Configuration
    Changes"]:::change

    REDIS -->|"Monitor
    Changes"| CHANGE

    %% ========================================
    %% FLOW 2: DOCUMENTATION GENERATION (Triggered)
    %% ========================================

    TRIGGER["⚡ TRIGGER
    Only on changes"]:::change

    USER["👤 USER
    Manual Request"]:::human

    LLM["🤖 LLM ENGINE
    Qwen (Local)"]:::llm

    MCP["🔧 MCP SERVER
    API Control Platform"]:::llm

    DOC["📄 DOCUMENT
    Generated Markdown"]:::llm

    CHANGE -->|"Changes
    Detected"| TRIGGER
    USER -.->|"Optional"| TRIGGER
    TRIGGER -->|"Start
    Generation"| LLM
    LLM -->|"Tool Call"| MCP
    MCP -->|"Query"| REDIS
    REDIS -->|"Config Data"| MCP
    MCP -->|"Context"| LLM
    LLM -->|"Generate"| DOC

    %% ========================================
    %% FLOW 3: VALIDATION AND PUBLISHING
    %% ========================================

    GIT["📦 GITLAB
    Repository"]:::git

    PR["🔀 PULL REQUEST
    Automated Review"]:::git

    TECH["👨‍💼 TECHNICAL TEAM
    Human Validation"]:::human

    PIPELINE["⚡ CI/CD PIPELINE
    GitLab Runner"]:::git

    MKDOCS["📚 MKDOCS
    Static Site Generator"]:::git

    WEB["🌐 DOCUMENTATION
    GitLab Pages
    (Published)"]:::git

    DOC -->|"Push +
    Branch"| GIT
    GIT -->|"Create"| PR
    PR -->|"Notify"| TECH
    TECH -->|"Approve +
    Merge"| GIT
    GIT -->|"Trigger"| PIPELINE
    PIPELINE -->|"Build"| MKDOCS
    MKDOCS -->|"Deploy"| WEB

    %% ========================================
    %% ANNOTATIONS
    %% ========================================

    SECURITY["🔒 SECURITY
    LLM isolated from live systems"]:::human

    EFFICIENCY["⚡ EFFICIENCY
    Docs generated only
    on changes"]:::change

    LLM -.->|"NO
    ACCESS"| INFRA
    SECURITY -.-> LLM
    EFFICIENCY -.-> CHANGE
```

---

## 🔧 Technical Diagram

### Implementation View

Detailed diagram for the technical team, with implementation specifics.

```mermaid
graph TB
    %% Technical styling
    classDef infra fill:#e1f5ff,stroke:#01579b,stroke-width:2px,color:#333,font-size:11px
    classDef connector fill:#e3f2fd,stroke:#1565c0,stroke-width:2px,color:#333,font-size:11px
    classDef cache fill:#f3e5f5,stroke:#4a148c,stroke-width:2px,color:#333,font-size:11px
    classDef change fill:#fff3e0,stroke:#e65100,stroke-width:2px,color:#333,font-size:11px
    classDef llm fill:#e8f5e9,stroke:#1b5e20,stroke-width:2px,color:#333,font-size:11px
    classDef git fill:#fce4ec,stroke:#880e4f,stroke-width:2px,color:#333,font-size:11px
    classDef monitor fill:#fff8e1,stroke:#f57f17,stroke-width:2px,color:#333,font-size:11px

    %% =====================================
    %% LAYER 1: SOURCE SYSTEMS
    %% =====================================

    subgraph SOURCES["🏢 INFRASTRUCTURE SOURCES"]
        VCENTER["VMware vCenter
        API: vSphere REST 7.0+
        Port: 443/HTTPS
        Auth: API Token"]:::infra

        K8S_API["Kubernetes API
        API: v1.28+
        Port: 6443/HTTPS
        Auth: ServiceAccount + RBAC"]:::infra

        LINUX["Linux Servers
        Protocol: SSH/Ansible
        Port: 22
        Auth: SSH Keys"]:::infra

        CISCO["Cisco Devices
        Protocol: NETCONF/RESTCONF
        Port: 830/443
        Auth: AAA"]:::infra
    end

    %% =====================================
    %% LAYER 2: CONNECTORS
    %% =====================================

    subgraph CONNECTORS["🔌 DATA COLLECTORS (Python/Go)"]
        CONN_VM["VMware Collector
        Lang: Python 3.11
        Lib: pyvmomi
        Schedule: */15 * * * *
        Output: JSON → Redis"]:::connector

        CONN_K8S["K8s Collector
        Lang: Python 3.11
        Lib: kubernetes-client
        Schedule: */5 * * * *
        Resources: pods,svc,ing,deploy"]:::connector

        CONN_LNX["Linux Collector
        Lang: Python 3.11
        Lib: paramiko/ansible
        Schedule: */30 * * * *
        Data: sysinfo,packages,services"]:::connector

        CONN_CSC["Cisco Collector
        Lang: Python 3.11
        Lib: ncclient
        Schedule: */30 * * * *
        Data: interfaces,routing,vlans"]:::connector
    end

    VCENTER -->|"GET /api/vcenter/vm"| CONN_VM
    K8S_API -->|"kubectl proxy
    API calls"| CONN_K8S
    LINUX -->|"SSH batch
    commands"| CONN_LNX
    CISCO -->|"NETCONF
    get-config"| CONN_CSC

    %% =====================================
    %% LAYER 3: REDIS STORAGE
    %% =====================================

    subgraph STORAGE["💾 REDIS CLUSTER"]
        REDIS_CLUSTER["Redis Cluster
        Mode: Cluster (6 nodes)
        Port: 6379
        Persistence: RDB + AOF
        Memory: 64GB
        Eviction: allkeys-lru"]:::cache

        REDIS_KEYS["Key Structure:
        • vmware:vcenter-id:vms:hash
        • k8s:cluster:namespace:resource:hash
        • linux:hostname:info:hash
        • cisco:device-id:config:hash
        • changelog:timestamp:diff
        TTL: 30d for data, 90d for changelog"]:::cache
    end

    CONN_VM -->|"HSET/HMSET
    + Hash Storage"| REDIS_CLUSTER
    CONN_K8S -->|"HSET/HMSET
    + Hash Storage"| REDIS_CLUSTER
    CONN_LNX -->|"HSET/HMSET
    + Hash Storage"| REDIS_CLUSTER
    CONN_CSC -->|"HSET/HMSET
    + Hash Storage"| REDIS_CLUSTER
    REDIS_CLUSTER --> REDIS_KEYS

    %% =====================================
    %% LAYER 4: CHANGE DETECTION
    %% =====================================

    subgraph CHANGE_DETECTION["🔍 CHANGE DETECTION SYSTEM"]
        DETECTOR["Change Detector Service
        Lang: Python 3.11
        Lib: redis-py
        Algorithm: Hash comparison
        Check interval: */5 * * * *"]:::change

        DIFF_ENGINE["Diff Engine
        • Deep object comparison
        • JSON diff generation
        • Change classification
        • Severity assessment"]:::change

        CHANGE_LOG["Change Log Store
        Key: changelog:*
        Data: diff JSON + metadata
        Indexed by: timestamp, resource"]:::change

        NOTIFIER["Change Notifier
        • Webhook triggers
        • Slack notifications
        • Event emission
        Target: LLM trigger"]:::change
    end

    REDIS_CLUSTER -->|"Monitor
    key changes"| DETECTOR
    DETECTOR --> DIFF_ENGINE
    DIFF_ENGINE -->|"Store diff"| CHANGE_LOG
    CHANGE_LOG --> REDIS_CLUSTER
    DIFF_ENGINE -->|"Notify if
    significant"| NOTIFIER

    %% =====================================
    %% LAYER 5: LLM TRIGGER & GENERATION
    %% =====================================

    subgraph TRIGGER_SYSTEM["⚡ TRIGGER SYSTEM"]
        TRIGGER_SVC["Trigger Service
        Lang: Python 3.11
        Listen: Webhook + Redis Pub/Sub
        Debounce: 5 min
        Batch: multiple changes"]:::change

        QUEUE["Generation Queue
        Type: Redis List
        Priority: High/Medium/Low
        Processing: FIFO"]:::change
    end

    NOTIFIER -->|"Trigger event"| TRIGGER_SVC
    TRIGGER_SVC -->|"Enqueue
    generation task"| QUEUE

    subgraph LLM_LAYER["🤖 AI GENERATION LAYER"]
        LLM_ENGINE["LLM Engine
        Model: Qwen (Local)
        API: Ollama/vLLM/LM Studio
        Port: 11434
        Temp: 0.3
        Max Tokens: 4096
        Timeout: 120s"]:::llm

        MCP_SERVER["MCP Server
        Lang: TypeScript/Node.js
        Port: 3000
        Protocol: JSON-RPC 2.0
        Auth: JWT tokens"]:::llm

        MCP_TOOLS["MCP Tools:
        • getVMwareInventory(vcenter)
        • getK8sResources(cluster,ns,type)
        • getLinuxSystemInfo(hostname)
        • getCiscoConfig(device,section)
        • getChangelog(start,end,resource)
        Return: JSON + Metadata"]:::llm
    end

    QUEUE -->|"Dequeue
    task"| LLM_ENGINE
    LLM_ENGINE <-->|"Tool calls
    JSON-RPC"| MCP_SERVER
    MCP_SERVER --> MCP_TOOLS
    MCP_TOOLS -->|"HGETALL/MGET
    Read data"| REDIS_CLUSTER
    REDIS_CLUSTER -->|"Config data
    + Changelog"| MCP_TOOLS
    MCP_TOOLS -->|"Structured Data
    + Context"| LLM_ENGINE

    subgraph OUTPUT["📝 DOCUMENT GENERATION"]
        TEMPLATE["Template Engine
        Format: Jinja2
        Templates: markdown/*.j2
        Variables: from LLM"]:::llm

        MARKDOWN["Markdown Output
        Format: CommonMark
        Metadata: YAML frontmatter
        Change summary included
        Assets: diagrams in mermaid"]:::llm

        VALIDATOR["Doc Validator
        • Markdown linting
        • Link checking
        • Schema validation
        • Change verification"]:::llm
    end

    LLM_ENGINE --> TEMPLATE
    TEMPLATE --> MARKDOWN
    MARKDOWN --> VALIDATOR

    %% =====================================
    %% LAYER 6: GITOPS
    %% =====================================

    subgraph GITOPS["🔄 GITOPS WORKFLOW"]
        GIT_REPO["GitLab Repository
        URL: gitlab.com/docs/infra
        Branch strategy: main + feature/*
        Protected: main (require approval)"]:::git

        GIT_API["GitLab API
        API: v4
        Auth: Project Access Token
        Permissions: api, write_repository"]:::git

        PR_AUTO["Automated PR Creator
        Lang: Python 3.11
        Lib: python-gitlab
        Template: .gitlab/merge_request.md
        Include: change summary"]:::git
    end

    VALIDATOR -->|"git add/commit/push"| GIT_REPO
    GIT_REPO <--> GIT_API
    GIT_API --> PR_AUTO

    REVIEWER["👨‍💼 Technical Reviewer
    Role: Maintainer/Owner
    Review: diff + validation
    Check: change correlation
    Approve: required (min 1)"]:::monitor

    PR_AUTO -->|"Notification
    Email + Slack"| REVIEWER
    REVIEWER -->|"Merge to main"| GIT_REPO

    %% =====================================
    %% LAYER 7: CI/CD & PUBLISH
    %% =====================================

    subgraph CICD["⚡ CI/CD PIPELINE"]
        GITLAB_CI["GitLab CI/CD
        Runner: docker
        Image: python:3.11-alpine
        Stages: build, test, deploy"]:::git

        PIPELINE_JOBS["Pipeline Jobs:
        1. lint (markdownlint-cli)
        2. build (mkdocs build)
        3. test (link-checker)
        4. deploy (rsync/s3)"]:::git

        MKDOCS_CFG["MkDocs Config
        Theme: material
        Plugins: search, tags, mermaid
        Extensions: admonition, codehilite"]:::git
    end

    GIT_REPO -->|"on: push to main
    Webhook trigger"| GITLAB_CI
    GITLAB_CI --> PIPELINE_JOBS
    PIPELINE_JOBS --> MKDOCS_CFG

    subgraph PUBLISH["🌐 PUBLICATION"]
        STATIC_SITE["Static Site
        Generator: MkDocs
        Output: HTML/CSS/JS
        Assets: optimized images"]:::git

        CDN["GitLab Pages / S3 + CloudFront
        URL: docs.company.com
        SSL: Let's Encrypt
        Cache: 1h"]:::git

        SEARCH["Search Index
        Engine: Algolia/Meilisearch
        Update: on publish
        API: REST"]:::git
    end

    MKDOCS_CFG -->|"mkdocs build
    --strict"| STATIC_SITE
    STATIC_SITE --> CDN
    STATIC_SITE --> SEARCH

    %% =====================================
    %% LAYER 8: MONITORING & OBSERVABILITY
    %% =====================================

    subgraph OBSERVABILITY["📊 MONITORING & LOGGING"]
        PROMETHEUS["Prometheus
        Metrics: collector updates, changes detected
        Scrape: 30s
        Retention: 15d"]:::monitor

        GRAFANA["Grafana Dashboards
        • Collector status
        • Redis performance
        • Change detection rate
        • LLM response times
        • Pipeline success rate"]:::monitor

        ELK["ELK Stack
        Logs: all components
        Index: daily rotation
        Retention: 30d"]:::monitor

        ALERTS["Alerting
        • Collector failures
        • Redis issues
        • Change detection errors
        • Pipeline failures
        Channel: Slack + PagerDuty"]:::monitor
    end

    CONN_VM -.->|"metrics"| PROMETHEUS
    CONN_K8S -.->|"metrics"| PROMETHEUS
    REDIS_CLUSTER -.->|"metrics"| PROMETHEUS
    DETECTOR -.->|"metrics"| PROMETHEUS
    MCP_SERVER -.->|"metrics"| PROMETHEUS
    GITLAB_CI -.->|"metrics"| PROMETHEUS
    PROMETHEUS --> GRAFANA

    CONN_VM -.->|"logs"| ELK
    DETECTOR -.->|"logs"| ELK
    MCP_SERVER -.->|"logs"| ELK
    GITLAB_CI -.->|"logs"| ELK
    GRAFANA --> ALERTS

    %% =====================================
    %% SECURITY & EFFICIENCY ANNOTATIONS
    %% =====================================

    SEC1["🔒 SECURITY:
    • All APIs use TLS 1.3
    • Secrets in Vault/K8s Secrets
    • Network: private VPC
    • LLM has NO direct access"]:::monitor

    SEC2["🔐 AUTHENTICATION:
    • API Tokens rotated 90d
    • RBAC enforced
    • Audit logs enabled
    • MFA required for Git"]:::monitor

    EFF1["⚡ EFFICIENCY:
    • Doc generation only on changes
    • Debounce prevents spam
    • Hash-based change detection
    • Batch processing"]:::change

    SEC1 -.-> MCP_SERVER
    SEC2 -.-> GIT_REPO
    EFF1 -.-> DETECTOR
```

---

## 💬 Conversational RAG System

### Querying the Documentation with AI

A system for "talking" to the documentation using Retrieval-Augmented Generation (RAG). Users can ask questions in natural language and receive answers grounded in the documentation, with source citations.

#### Key Features

- ✅ **Semantic search**: vector search that captures the intent of the query
- ✅ **Scalability**: handles large volumes of documentation (100k+ documents)
- ✅ **Performance**: responses in <3 seconds with intelligent caching
- ✅ **Accuracy**: re-ranking and source attribution for precise answers
- ✅ **Local LLM**: on-premise Qwen for privacy and control

### RAG Diagram - Management View

```mermaid
graph TB
    %% Styling
    classDef docs fill:#e3f2fd,stroke:#1565c0,stroke-width:3px,color:#333
    classDef process fill:#f3e5f5,stroke:#4a148c,stroke-width:3px,color:#333
    classDef vector fill:#fff3e0,stroke:#e65100,stroke-width:3px,color:#333
    classDef llm fill:#e8f5e9,stroke:#1b5e20,stroke-width:3px,color:#333
    classDef user fill:#fff9c4,stroke:#f57f17,stroke-width:3px,color:#333
    classDef cache fill:#fce4ec,stroke:#880e4f,stroke-width:3px,color:#333

    %% ========================================
    %% INGESTION PIPELINE (Offline)
    %% ========================================

    subgraph INGESTION["📚 INGESTION PIPELINE (Offline Process)"]
        DOCS["📄 DOCUMENTATION
        MkDocs Output
        Markdown Files"]:::docs

        CHUNKER["✂️ DOCUMENT CHUNKER
        Split & Overlap
        Metadata Extraction"]:::process

        EMBEDDER["🧠 EMBEDDING MODEL
        Text → Vectors
        Dimension: 768/1024"]:::process

        VECTORDB[("🗄️ VECTOR DATABASE
        Qdrant/Milvus
        Sharded & Replicated")]:::vector
    end

    DOCS -->|"Parse
    Markdown"| CHUNKER
    CHUNKER -->|"Text Chunks
    + Metadata"| EMBEDDER
    EMBEDDER -->|"Store
    Embeddings"| VECTORDB

    %% ========================================
    %% QUERY PIPELINE (Real-time)
    %% ========================================

    subgraph QUERY["💬 QUERY PIPELINE (Real-time)"]
        USER["👤 USER
        Question/Query"]:::user

        QUERY_EMBED["🧠 QUERY EMBEDDING
        Query → Vector"]:::process

        SEARCH["🔍 SEMANTIC SEARCH
        Vector Similarity
        Top-K Results"]:::vector

        RERANK["📊 RE-RANKING
        Context Scoring
        Relevance Filter"]:::process

        CONTEXT["📋 CONTEXT BUILDER
        Assemble Chunks
        Add Metadata"]:::process
    end

    USER -->|"Natural Language
    Question"| QUERY_EMBED
    QUERY_EMBED -->|"Query Vector"| SEARCH
    SEARCH -->|"Search"| VECTORDB
    VECTORDB -->|"Top-K Chunks
    + Scores"| SEARCH
    SEARCH -->|"Initial Results"| RERANK
    RERANK -->|"Filtered
    Chunks"| CONTEXT

    %% ========================================
    %% GENERATION (LLM)
    %% ========================================

    subgraph GENERATION["🤖 ANSWER GENERATION"]
        LLM_RAG["🤖 LLM ENGINE
        Qwen (Local)
        + RAG Context"]:::llm

        ANSWER["💡 ANSWER
        Generated Answer
        + Source Citations"]:::llm
    end

    CONTEXT -->|"Context
    + Sources"| LLM_RAG
    LLM_RAG -->|"Generate"| ANSWER
    ANSWER -->|"Display"| USER

    %% ========================================
    %% CACHING & OPTIMIZATION
    %% ========================================

    CACHE[("💾 REDIS CACHE
    Query Cache
    Embedding Cache")]:::cache

    QUERY_EMBED -.->|"Check Cache"| CACHE
    CACHE -.->|"Cached
    Embedding"| SEARCH
    SEARCH -.->|"Cache
    Results"| CACHE

    %% ========================================
    %% SCALING & UPDATE
    %% ========================================

    UPDATE["🔄 INCREMENTAL UPDATE
    On Doc Changes
    Auto Re-index"]:::docs

    DOCS -.->|"Doc Updated"| UPDATE
    UPDATE -.->|"Re-process
    Changed Docs"| CHUNKER

    %% ========================================
    %% ANNOTATIONS
    %% ========================================

    SCALE["📈 SCALABILITY
    • Vector DB sharding
    • Horizontal scaling
    • Load balancing"]:::vector

    PERF["⚡ PERFORMANCE
    • Query cache
    • Embedding cache
    • Async processing"]:::cache

    QUALITY["✅ QUALITY
    • Re-ranking
    • Relevance scoring
    • Source citations"]:::process

    SCALE -.-> VECTORDB
    PERF -.-> CACHE
    QUALITY -.-> RERANK
```

### RAG Diagram - Technical View

```mermaid
graph TB
    %% Styling
    classDef docs fill:#e3f2fd,stroke:#1565c0,stroke-width:2px,color:#333,font-size:11px
    classDef process fill:#f3e5f5,stroke:#4a148c,stroke-width:2px,color:#333,font-size:11px
    classDef vector fill:#fff3e0,stroke:#e65100,stroke-width:2px,color:#333,font-size:11px
    classDef llm fill:#e8f5e9,stroke:#1b5e20,stroke-width:2px,color:#333,font-size:11px
    classDef user fill:#fff9c4,stroke:#f57f17,stroke-width:2px,color:#333,font-size:11px
    classDef cache fill:#fce4ec,stroke:#880e4f,stroke-width:2px,color:#333,font-size:11px
    classDef monitor fill:#fff8e1,stroke:#f57f17,stroke-width:2px,color:#333,font-size:11px

    %% =====================================
    %% LAYER 1: DOCUMENTATION SOURCE
    %% =====================================

    subgraph DOCSOURCE["📚 DOCUMENTATION SOURCE"]
        MKDOCS_OUT["MkDocs Static Site
        Path: /site/
        Format: HTML + Markdown
        Assets: images, diagrams
        Update: on Git merge"]:::docs

        DOC_WATCHER["Document Watcher
        Lang: Python 3.11
        Lib: watchdog
        Trigger: file system events
        Debounce: 30s"]:::docs

        DOC_PARSER["Document Parser
        HTML → Plain Text
        Preserve structure
        Extract metadata
        Clean formatting"]:::docs
    end

    MKDOCS_OUT --> DOC_WATCHER
    DOC_WATCHER -->|"New/Modified
    Docs"| DOC_PARSER

    %% =====================================
    %% LAYER 2: CHUNKING STRATEGY
    %% =====================================

    subgraph CHUNKING["✂️ INTELLIGENT CHUNKING"]
        CHUNK_ENGINE["Chunking Engine
        Lang: Python 3.11
        Lib: langchain/llama-index
        Strategy: Recursive Character"]:::process

        CHUNK_CONFIG["Chunking Config:
        • Chunk Size: 512 tokens
        • Overlap: 128 tokens
        • Separators: \\n\\n, \\n, . , ' '
        • Min chunk: 100 tokens
        • Max chunk: 1024 tokens"]:::process

        METADATA_EXTRACTOR["Metadata Extractor
        Extract:
        • Document title
        • Section headers
        • Tags/keywords
        • Creation date
        • File path
        • Doc type"]:::process
    end

    DOC_PARSER -->|"Parsed Text"| CHUNK_ENGINE
    CHUNK_ENGINE --> CHUNK_CONFIG
    CHUNK_ENGINE --> METADATA_EXTRACTOR

    %% =====================================
    %% LAYER 3: EMBEDDING GENERATION
    %% =====================================

    subgraph EMBEDDING["🧠 EMBEDDING GENERATION"]
        EMBED_MODEL["Embedding Model
        Model: all-MiniLM-L6-v2 / BGE-M3
        Dim: 384/768/1024
        API: sentence-transformers
        Batch size: 32
        GPU: CUDA acceleration"]:::process

        EMBED_CACHE["Embedding Cache
        Type: Redis Hash
        Key: hash(text)
        TTL: 30d
        Hit rate target: >80%"]:::cache

        EMBED_QUEUE["Processing Queue
        Type: Redis List
        Workers: 4-8
        Rate: 100 chunks/s
        Retry: 3 attempts"]:::process
    end

    METADATA_EXTRACTOR -->|"Chunks
    + Metadata"| EMBED_QUEUE
    EMBED_QUEUE --> EMBED_MODEL
    EMBED_MODEL <-.->|"Cache
    Check/Store"| EMBED_CACHE

    %% =====================================
    %% LAYER 4: VECTOR DATABASE
    %% =====================================

    subgraph VECTORDB["🗄️ VECTOR DATABASE CLUSTER"]
        QDRANT["Qdrant Cluster
        Version: 1.7+
        Nodes: 3-6 (replicated)
        Shards: auto per collection
        Port: 6333/6334"]:::vector

        COLLECTIONS["Collections:
        • docs_main (dim: 768)
        • docs_code (dim: 768)
        • docs_api (dim: 768)
        Distance: Cosine
        Index: HNSW (M=16, ef=100)"]:::vector

        SHARD_STRATEGY["Sharding Strategy:
        • Auto-sharding enabled
        • Shard size: 100k vectors
        • Replication factor: 2
        • Load balancing: Round-robin"]:::vector
    end

    EMBED_MODEL -->|"Store
    Vectors"| QDRANT
    QDRANT --> COLLECTIONS
    QDRANT --> SHARD_STRATEGY

    %% =====================================
    %% LAYER 5: QUERY PROCESSING
    %% =====================================

    subgraph QUERYPROC["💬 QUERY PROCESSING PIPELINE"]
        USER_INPUT["User Input
        Interface: Web UI / API
        Auth: JWT tokens
        Rate limit: 20 req/min
        Timeout: 30s"]:::user

        QUERY_PREPROCESS["Query Preprocessor
        • Spelling correction
        • Intent detection
        • Query expansion
        • Language detection"]:::process

        QUERY_EMBEDDER["Query Embedder
        Same model as docs
        Cache: Redis
        Latency: <50ms"]:::process

        HYBRID_SEARCH["Hybrid Search
        1. Vector search (semantic)
        2. Keyword search (BM25)
        3. Fusion: RRF algorithm
        Top-K: 20 initial results"]:::vector
    end

    USER_INPUT -->|"Natural
    Language"| QUERY_PREPROCESS
    QUERY_PREPROCESS --> QUERY_EMBEDDER
    QUERY_EMBEDDER <-.->|"Cache"| EMBED_CACHE
    QUERY_EMBEDDER -->|"Query
    Vector"| HYBRID_SEARCH
    HYBRID_SEARCH -->|"Search"| QDRANT

    %% =====================================
    %% LAYER 6: RE-RANKING & FILTERING
    %% =====================================

    subgraph RERANK["📊 RE-RANKING & FILTERING"]
        RERANKER["Cross-Encoder Re-ranker
        Model: ms-marco-MiniLM
        Purpose: Fine-grained relevance
        Process: Top-20 → Top-5
        Latency: 100-200ms"]:::process

        FILTER_ENGINE["Filter Engine
        • Relevance threshold: >0.7
        • Deduplication
        • Diversity scoring
        • Metadata filtering"]:::process

        CONTEXT_BUILDER["Context Builder
        • Assemble top chunks
        • Add source citations
        • Format for LLM
        • Max context: 4k tokens"]:::process
    end

    QDRANT -->|"Top-K
    Results"| RERANKER
    RERANKER --> FILTER_ENGINE
    FILTER_ENGINE --> CONTEXT_BUILDER

    %% =====================================
    %% LAYER 7: LLM GENERATION
    %% =====================================

    subgraph LLMGEN["🤖 LLM ANSWER GENERATION"]
        RAG_PROMPT["RAG Prompt Template
        Structure:
        • System: You are a helpful assistant
        • Context: Retrieved chunks
        • Question: User query
        • Instruction: Answer using context"]:::llm

        LLM_ENGINE["LLM Engine
        Model: Qwen 2.5 (14B/32B)
        API: Ollama/vLLM
        Port: 11434
        Temp: 0.2 (factual)
        Max tokens: 2048
        Stream: enabled"]:::llm

        ANSWER_POST["Answer Post-processor
        • Citation formatting
        • Source links
        • Confidence scoring
        • Fallback handling"]:::llm
    end

    CONTEXT_BUILDER -->|"Context
    + Sources"| RAG_PROMPT
    QUERY_PREPROCESS -->|"Original
    Question"| RAG_PROMPT
    RAG_PROMPT --> LLM_ENGINE
    LLM_ENGINE --> ANSWER_POST
    ANSWER_POST -->|"Final
    Answer"| USER_INPUT

    %% =====================================
    %% LAYER 8: CACHING LAYER
    %% =====================================

    subgraph CACHING["💾 MULTI-LEVEL CACHE"]
        REDIS_CACHE["Redis Cluster
        Mode: Cluster
        Nodes: 3
        Memory: 16GB
        Persistence: AOF"]:::cache

        CACHE_TYPES["Cache Types:
        • Query embeddings (TTL: 7d)
        • Search results (TTL: 1h)
        • LLM responses (TTL: 24h)
        • Popular queries (no TTL)
        Eviction: LRU"]:::cache

        CACHE_WARMING["Cache Warming
        Pre-compute:
        • Top 100 queries
        • Common patterns
        Schedule: daily
        Update: on doc changes"]:::cache
    end

    REDIS_CACHE --> CACHE_TYPES
    CACHE_TYPES --> CACHE_WARMING
    QUERY_EMBEDDER <-.-> REDIS_CACHE
    HYBRID_SEARCH <-.-> REDIS_CACHE
    LLM_ENGINE <-.-> REDIS_CACHE

    %% =====================================
    %% LAYER 9: SCALING & LOAD BALANCING
    %% =====================================

    subgraph SCALING["📈 SCALING INFRASTRUCTURE"]
        LOAD_BALANCER["Load Balancer
        Type: Nginx / HAProxy
        Algorithm: Least connections
        Health checks: /health
        Timeout: 30s"]:::monitor

        QUERY_API["Query API Instances
        Replicas: 3-10 (auto-scale)
        Lang: FastAPI
        Container: Docker
        Orchestration: K8s"]:::user

        EMBED_WORKERS["Embedding Workers
        Replicas: 4-8
        GPU: Optional
        Queue: Redis
        Auto-scale: based on queue depth"]:::process
    end

    LOAD_BALANCER --> QUERY_API
    QUERY_API --> USER_INPUT

    %% =====================================
    %% LAYER 10: MONITORING & OBSERVABILITY
    %% =====================================

    subgraph MONITORING["📊 MONITORING & ANALYTICS"]
        METRICS["Prometheus Metrics
        • Query latency (p50, p95, p99)
        • Vector search time
        • LLM response time
        • Cache hit rate
        • Embedding generation rate
        Scrape: 15s"]:::monitor

        DASHBOARDS["Grafana Dashboards
        • RAG Performance
        • Query analytics
        • Resource utilization
        • Error tracking
        Refresh: real-time"]:::monitor

        ANALYTICS["Query Analytics
        Track:
        • Popular queries
        • Failed queries
        • Avg relevance scores
        • User satisfaction
        Storage: TimescaleDB"]:::monitor

        ALERTS["Alerting Rules
        • Latency > 5s
        • Error rate > 5%
        • Cache hit < 70%
        • Vector DB down
        Channel: Slack + Email"]:::monitor
    end

    METRICS --> DASHBOARDS
    DASHBOARDS --> ANALYTICS
    ANALYTICS --> ALERTS
    QUERY_API -.->|"metrics"| METRICS
    HYBRID_SEARCH -.->|"metrics"| METRICS
    LLM_ENGINE -.->|"metrics"| METRICS
    QDRANT -.->|"metrics"| METRICS

    %% =====================================
    %% LAYER 11: FEEDBACK LOOP
    %% =====================================

    subgraph FEEDBACK["🔄 FEEDBACK & IMPROVEMENT"]
        USER_FEEDBACK["User Feedback
        • Thumbs up/down
        • Relevance rating
        • Comments
        Storage: PostgreSQL"]:::user

        FEEDBACK_ANALYSIS["Feedback Analysis
        • Identify bad answers
        • Track improvement areas
        • A/B testing results
        Schedule: weekly"]:::monitor

        MODEL_TUNING["Model Fine-tuning
        • Re-rank model updates
        • Prompt optimization
        • Chunk size tuning
        Cycle: monthly"]:::process
    end

    USER_INPUT -->|"Rate
    Answer"| USER_FEEDBACK
    USER_FEEDBACK --> FEEDBACK_ANALYSIS
    FEEDBACK_ANALYSIS --> MODEL_TUNING
    MODEL_TUNING -.->|"Improve"| RERANKER

    %% =====================================
    %% ANNOTATIONS
    %% =====================================

    SCALE_NOTE["📈 SCALABILITY:
    • Vector DB: Horizontal sharding
    • API: K8s auto-scaling (HPA)
    • Workers: Queue-based scaling
    • Cache: Redis cluster
    Target: 100k+ docs, 1k+ QPS"]:::monitor

    PERF_NOTE["⚡ PERFORMANCE TARGETS:
    • Query latency: <3s (p95)
    • Vector search: <100ms
    • LLM generation: <2s
    • Cache hit rate: >80%
    • Throughput: 1000 QPS"]:::cache

    QUALITY_NOTE["✅ QUALITY ASSURANCE:
    • Re-ranking for precision
    • Source attribution
    • Confidence scoring
    • Fallback responses
    • Human feedback loop"]:::process

    SCALE_NOTE -.-> QDRANT
    PERF_NOTE -.-> REDIS_CACHE
    QUALITY_NOTE -.-> RERANKER
```

### RAG Pipeline

**1. Ingestion Pipeline (Offline)**

- Parsing of the MkDocs documentation
- Intelligent chunking (512 tokens, 128-token overlap)
- Embedding generation (all-MiniLM-L6-v2)
- Storage in the vector database (Qdrant cluster)

**2. Query Pipeline (Real-time)**

- Embedding of the user query
- Hybrid search (semantic + keyword)
- Re-ranking with a cross-encoder
- Context assembly for the LLM

**3. Generation**

- Local LLM (Qwen) with RAG context
- Automatic source attribution
- Response streaming

**4. Scaling Strategy**

- Automatic vector DB sharding
- API instances with K8s auto-scaling
- Redis cluster for multi-level caching
- Load balancing with Nginx

---

## 📧 Contacts

- **Team**: Infrastructure Documentation Team
- **Email**: infra-docs@company.com
- **GitLab**: https://gitlab.com/company/infra-docs-automation

---

**Version**: 1.0.0

**Last updated**: 2025-10-28
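
The chunking step of the RAG ingestion pipeline (512-token chunks with a 128-token overlap) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: it uses a plain sliding window rather than the recursive character splitting listed in the technical diagram, it treats whitespace-separated words as tokens, and the names `Chunk` and `chunk_tokens` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """One window of a document (hypothetical structure for illustration)."""
    index: int  # position of the chunk within the document
    start: int  # offset of the first token (inclusive)
    end: int    # offset just past the last token (exclusive)
    text: str   # tokens re-joined with single spaces


def chunk_tokens(tokens: list[str], size: int = 512, overlap: int = 128) -> list[Chunk]:
    """Split tokens into windows of at most `size` that share `overlap` tokens."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap  # how far the window advances each time
    chunks: list[Chunk] = []
    start = 0
    while start < len(tokens):
        end = min(start + size, len(tokens))
        chunks.append(Chunk(len(chunks), start, end, " ".join(tokens[start:end])))
        if end == len(tokens):
            break  # the last window already reaches the end of the document
        start += step
    return chunks


if __name__ == "__main__":
    # Assumption: whitespace tokens stand in for real model tokens.
    words = [f"w{i}" for i in range(1000)]
    for chunk in chunk_tokens(words):
        print(chunk.index, chunk.start, chunk.end)
```

With 1,000 tokens this produces three chunks starting at offsets 0, 384, and 768. Each consecutive pair shares 128 tokens, so text near a boundary appears in both neighbouring chunks.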