Initial commit: LLM Automation Docs & Remediation Engine v2.0

Features:
- Automated datacenter documentation generation
- MCP integration for device connectivity
- Auto-remediation engine with safety checks
- Multi-factor reliability scoring (0-100%)
- Human feedback learning loop
- Pattern recognition and continuous improvement
- Agentic chat support with AI
- API for ticket resolution
- Frontend React with Material-UI
- CI/CD pipelines (GitLab + Gitea)
- Docker & Kubernetes deployment
- Complete documentation and guides

v2.0 Highlights:
- Auto-remediation with write operations (disabled by default)
- Reliability calculator with 4-factor scoring
- Human feedback system for continuous learning
- Pattern-based progressive automation
- Approval workflow for critical actions
- Full audit trail and rollback capability
2025-10-17 23:47:28 +00:00
commit 1ba5ce851d
89 changed files with 20468 additions and 0 deletions
--- a/QUICK_START.md
+++ b/QUICK_START.md
@@ -0,0 +1,285 @@
+# Quick Start Guide - Automated Datacenter Documentation System
+
+## 📋 Overview
+
+This system uses an LLM to automatically generate and update the datacenter documentation.
+
+## 🎯 What's Included
+
+### 📁 templates/ (10 files)
+Markdown templates, one per documentation section:
+- `01_infrastruttura_fisica.md` - Layout, power, cooling, physical security
+- `02_networking.md` - Switches, routers, firewalls, VLANs, DNS/DHCP
+- `03_server_virtualizzazione.md` - Physical hosts, VMs, clusters, containers
+- `04_storage.md` - SAN, NAS, object storage, capacity planning
+- `05_sicurezza.md` - IAM, vulnerabilities, compliance, encryption
+- `06_backup_disaster_recovery.md` - Backup jobs, RPO/RTO, DR site
+- `07_monitoring_alerting.md` - Monitoring platform, alerts, dashboards
+- `08_database_middleware.md` - DBMS, instances, application servers
+- `09_procedure_operative.md` - SOPs, runbooks, escalation, change management
+- `10_miglioramenti.md` - Analysis of improvement opportunities
+
+### 📁 system-prompts/ (10 files)
+Section-specific prompts that guide the LLM. Each prompt:
+- Defines the LLM's role
+- Specifies the data sources
+- Provides instructions for filling in the section
+- Lists the commands and queries to use
+
+### 📁 requirements/ (3 files)
+Technical requirements for the implementation:
+- `llm_requirements.md` - Libraries, access, network, best practices
+- `data_collection_scripts.md` - Python scripts for data collection
+- `api_endpoints.md` - API calls, CLI commands, SNMP OIDs
+
+## 🚀 Getting Started
+
+### 1. Environment Setup
+```bash
+# Clone/copy the project
+cd /opt/datacenter-docs
+
+# Create a virtual environment
+python3 -m venv venv
+source venv/bin/activate
+
+# Install dependencies
+pip install -r requirements.txt
+```
+
+### 2. Configure Credentials
+```yaml
+# Edit config.yaml
+databases:
+  asset_db:
+    host: your-db.local
+    user: readonly_user
+    password: ${VAULT:password}
+
+vmware:
+  vcenter_host: vcenter.local
+  username: automation@vsphere.local
+  password: ${VAULT:password}
+```
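The `${VAULT:password}` values in the config above are references, not literal passwords. The commit does not show how `main.py` resolves them, so the following is only a minimal sketch of one way to do it: walk the parsed config and replace each `${VAULT:key}` with the result of a lookup function. The name `resolve_secrets` and the in-memory `fake_vault` are hypothetical; a real deployment would pass a function that queries the secrets backend.

```python
import re
from typing import Any, Callable

# Matches the ${VAULT:key} form used in config.yaml.
VAULT_REF = re.compile(r"\$\{VAULT:([^}]+)\}")


def resolve_secrets(node: Any, lookup: Callable[[str], str]) -> Any:
    """Return a copy of `node` with every ${VAULT:key} replaced by lookup(key)."""
    if isinstance(node, dict):
        return {k: resolve_secrets(v, lookup) for k, v in node.items()}
    if isinstance(node, list):
        return [resolve_secrets(v, lookup) for v in node]
    if isinstance(node, str):
        # A function replacement inserts the secret literally (no escape handling).
        return VAULT_REF.sub(lambda m: lookup(m.group(1)), node)
    return node


if __name__ == "__main__":
    # Stand-in for a real secrets backend (hypothetical value).
    fake_vault = {"password": "example-only"}
    config = {
        "vmware": {
            "vcenter_host": "vcenter.local",
            "username": "automation@vsphere.local",
            "password": "${VAULT:password}",
        }
    }
    resolved = resolve_secrets(config, fake_vault.__getitem__)
    print(resolved["vmware"]["vcenter_host"])
```

Because the function returns a copy, the original config (with references intact) can still be logged without exposing secrets.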
+
+### 3. Connectivity Test
+```bash
+# Verify access to the systems
+python3 main.py --dry-run --debug
+
+# Test a single section
+python3 main.py --section 01 --dry-run
+```
+
+### 4. First Generation
+```bash
+# Generate all documentation
+python3 main.py
+
+# Output is written to: output/section_XX.md
+```
+
+## 🔄 Operational Workflow
+
+### Automatic Updates
+```bash
+# Configure cron for periodic updates
+# Every 6 hours
+0 */6 * * * cd /opt/datacenter-docs && venv/bin/python main.py
+
+# Full weekly report (Sundays at 02:00)
+0 2 * * 0 cd /opt/datacenter-docs && venv/bin/python main.py --full
+```
+
+### Manual Updates
+```bash
+# Specific section
+python3 main.py --section 02
+
+# Debug mode
+python3 main.py --debug
+
+# Dry run (test without saving)
+python3 main.py --dry-run
+```
+
+## 📊 Output Structure
+
+```
+output/
+├── section_01.md  # Physical infrastructure
+├── section_02.md  # Networking
+├── section_03.md  # Servers and virtualization
+├── section_04.md  # Storage
+├── section_05.md  # Security
+├── section_06.md  # Backup and DR
+├── section_07.md  # Monitoring
+├── section_08.md  # Database and middleware
+├── section_09.md  # Operating procedures
+└── section_10.md  # Improvements
+```
+
+## ⚙️ Customization
+
+### Adapting the Templates
+1. Edit `templates/XX_nome_sezione.md`
+2. Add or remove sections as needed
+3. Keep the `[NOME_CAMPO]` placeholders (`NOME_CAMPO` stands for the field name)
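To illustrate why the placeholders must be preserved, here is a minimal sketch of how a `[FIELD_NAME]`-style placeholder could be substituted. It assumes placeholders are upper-case identifiers in square brackets, as in `[NOME_CAMPO]`; the function name `fill_template` and the field names `RACK_ID` and `SITE` are hypothetical and are not taken from the project.

```python
import re
from typing import Dict, List, Tuple

# Placeholders look like [FIELD_NAME]: upper-case letters, digits, underscores.
# Markdown checkboxes ("[ ]", "[x]") and lower-case link text do not match.
PLACEHOLDER = re.compile(r"\[([A-Z][A-Z0-9_]*)\]")


def fill_template(text: str, values: Dict[str, str]) -> Tuple[str, List[str]]:
    """Substitute known placeholders; return the text and the names left unfilled."""
    missing: List[str] = []

    def replace(match: "re.Match") -> str:
        name = match.group(1)
        if name in values:
            return values[name]
        missing.append(name)
        return match.group(0)  # leave unknown placeholders untouched

    return PLACEHOLDER.sub(replace, text), missing


if __name__ == "__main__":
    rendered, missing = fill_template("Rack [RACK_ID] in [SITE]", {"RACK_ID": "R12"})
    print(rendered)
    print("unfilled:", missing)
```

Returning the list of unfilled names makes it easy to fail a run, or at least log a warning, when a template field has no data source.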
+
+### Editing the System Prompts
+1. Edit `system-prompts/XX_nome_sezione_prompt.md`
+2. Add commands specific to your environment
+3. Update priorities and focus
+
+### Adding Data Sources
+1. Implement a new collector in `collectors/`
+2. Update `config.yaml` with the endpoint
+3. Add tests in `tests/`
+
+## 🔒 Security Best Practices
+
+### Credentials
+- ✅ **USE**: A vault (HashiCorp Vault, AWS Secrets Manager)
+- ✅ **USE**: Encrypted environment variables
+- ❌ **NEVER**: Hardcode passwords in scripts
+- ❌ **NEVER**: Commit credentials to git
+
+### Account Permissions
+- ✅ Dedicated automation account
+- ✅ Read-only permissions wherever possible
+- ✅ MFA where supported
+- ✅ Audit logging enabled
+
+### Network Security
+- ✅ Access to management networks only
+- ✅ Specific firewall rules
+- ✅ VPN/bastion host if needed
+
+## 📈 Monitoring
+
+### Log Files
+```bash
+# Application logs
+tail -f /var/log/datacenter-docs/generation.log
+
+# Cron execution logs
+tail -f /var/log/datacenter-docs/cron.log
+
+# Error logs
+grep ERROR /var/log/datacenter-docs/*.log
+```
+
+### Health Checks
+```bash
+# Check the latest generation
+ls -lh output/
+
+# Estimate the token count (~4 bytes per token)
+for f in output/*.md; do
+  echo "$f: $(wc -c < "$f" | awk '{print int($1/4)}') tokens"
+done
+
+# Check for unreplaced placeholders such as [NOME_CAMPO]
+grep -rE '\[[A-Z][A-Z0-9_]*\]' output/
+```
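The same health checks can be sketched in Python, which makes them easier to call from tests or a CI job. This is an illustration, not part of the project: it reuses the guide's rough heuristic of about 4 bytes per token and the 50,000-token per-section limit stated in the final notes, and it assumes placeholders are upper-case identifiers in square brackets. The names `estimate_tokens` and `check_section` are hypothetical.

```python
import re
from pathlib import Path

PLACEHOLDER = re.compile(r"\[[A-Z][A-Z0-9_]*\]")
TOKEN_LIMIT = 50_000  # per-section limit stated in the guide's final notes


def estimate_tokens(text: str) -> int:
    """Rough estimate: about 4 bytes of UTF-8 text per token."""
    return len(text.encode("utf-8")) // 4


def check_section(path: Path) -> dict:
    """Summarize one generated section: size estimate and leftover placeholders."""
    text = path.read_text(encoding="utf-8")
    tokens = estimate_tokens(text)
    return {
        "file": path.name,
        "tokens": tokens,
        "over_limit": tokens > TOKEN_LIMIT,
        "placeholders": sorted(set(PLACEHOLDER.findall(text))),
    }


if __name__ == "__main__":
    for section in sorted(Path("output").glob("*.md")):
        print(check_section(section))
```

The byte-based estimate is only a heuristic; the actual count depends on the tokenizer of the LLM in use.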
+
+## 🐛 Troubleshooting
+
+### Issue: Connection Timeout
+```bash
+# Test connectivity
+ping -c 3 vcenter.local
+telnet vcenter.local 443
+
+# Check firewall
+# Replace <IP> with the address of the target system
+sudo iptables -L -n | grep <IP>
+```
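Where `telnet` is not installed, the port check above can be approximated with Python's standard library. This is a minimal sketch; `tcp_check` is a hypothetical helper name, and `vcenter.local:443` simply mirrors the example host and port used in the commands above.

```python
import socket


def tcp_check(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


if __name__ == "__main__":
    print("reachable" if tcp_check("vcenter.local", 443) else "unreachable")
```

A successful TCP connection only shows that the port is open and reachable; it does not validate TLS or credentials.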
+
+### Issue: Authentication Failed
+```bash
+# Verify credentials
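+# Assumes PyYAML is installed and that the collector accepts the parsed config.yaml; adjust to match how main.py builds its config
+python3 -c "import yaml; from collectors import VMwareCollector; config = yaml.safe_load(open('config.yaml')); VMwareCollector(config).test_connection()"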
+
+# Check vault
+vault kv get datacenter/creds
+```
+
+### Issue: Token Limit Exceeded
+- Reduce the retention of historical data
+- Remove tables with too many records
+- Summarize instead of listing everything
+
+### Issue: Incomplete Data
+- Check the Redis cache: `redis-cli KEYS "*"`
+- Check source system availability
+- Review error logs
+
+## 📚 Useful Resources
+
+### Vendor Documentation
+- VMware vSphere API: https://developer.vmware.com/apis
+- Cisco DevNet: https://developer.cisco.com
+- Zabbix API: https://www.zabbix.com/documentation/current/api
+
+### Python Libraries
+- pyVmomi: https://github.com/vmware/pyvmomi
+- netmiko: https://github.com/ktbyers/netmiko
+- pysnmp: https://github.com/etingof/pysnmp
+
+## 🤝 Support
+
+### Team Contacts
+- **Automation Team**: automation@company.com
+- **Infrastructure Team**: infra@company.com
+- **Security Team**: security@company.com
+
+### Issue Reporting
+1. Check logs for errors
+2. Test connectivity to sources
+3. Open a ticket with details: timestamp, section, error message
+4. Attach the relevant logs
+
+## ✅ Deployment Checklist
+
+Before going to production:
+
+- [ ] Virtual environment created and activated
+- [ ] All dependencies installed (`pip install -r requirements.txt`)
+- [ ] `config.yaml` configured with the correct endpoints
+- [ ] Credentials stored in a vault/secrets manager
+- [ ] Connectivity tested against all systems (VMware, network, storage, etc.)
+- [ ] Firewall rules approved and implemented
+- [ ] Automation account with appropriate permissions
+- [ ] Dry-run test completed successfully
+- [ ] Logging configured
+- [ ] Email/Slack notifications configured
+- [ ] Cron job configured
+- [ ] Operational runbook documentation completed
+- [ ] Team trained on using the system
+- [ ] Escalation path defined
+
+## 📝 Final Notes
+
+### Token Limits
+Each section is limited to 50,000 tokens (~200 KB of text). If you exceed the limit:
+- Reduce the detail in historical tables
+- Aggregate old data
+- Summarize instead of listing
+
+### Update Frequency
+Recommended:
+- **Prod**: Every 6 hours
+- **Metrics only**: Every hour
+- **Full report**: Weekly
+
+### Documentation Backup
+```bash
+# Automatic backup before updating (create the target directory if missing)
+mkdir -p backup
+tar -czf backup/docs-$(date +%Y%m%d).tar.gz output/
+```
+
+---
+
+**Version**: 1.0  
+**Date**: 2025-01-XX  
+**Maintainer**: Automation Team