Technical Requirements for the LLM - Datacenter Documentation Generation
1. Required LLM Capabilities
1.1 Core Capabilities
- Network Access: SSH, HTTPS, and SNMP connections
- API Interaction: REST, SOAP, GraphQL
- Code Execution: Python, Bash, PowerShell
- File Operations: Reading and writing Markdown files
- Database Access: MySQL, PostgreSQL, SQL Server
1.2 Required Python Libraries
# Networking and protocols
pip install paramiko # SSH connections
pip install pysnmp # SNMP queries
pip install requests # HTTP/REST APIs
pip install netmiko # Network device automation
# Virtualization
pip install pyvmomi # VMware vSphere API
pip install proxmoxer # Proxmox API
pip install libvirt-python # KVM/QEMU
# Storage
pip install purestorage # Pure Storage REST API client (PyPI name has no hyphen)
pip install netapp-ontap # NetApp API
# Database
pip install mysql-connector-python
pip install psycopg2 # PostgreSQL
pip install pymssql # Microsoft SQL Server
# Monitoring
pip install zabbix-api # Zabbix
pip install prometheus-client # Prometheus
# Cloud providers
pip install boto3 # AWS
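pip install azure-mgmt # Azure (deprecated meta-package; prefer per-service packages such as azure-mgmt-compute)
pip install google-cloud # GCP (deprecated meta-package; prefer per-service packages such as google-cloud-storage)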
# Utilities
pip install jinja2 # Template rendering
pip install pyyaml # YAML parsing
pip install pandas # Data analysis
pip install markdown # Markdown generation
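Before a run, it can help to confirm that the libraries listed above are actually importable. The following is a minimal sketch using only the standard library; `REQUIRED_MODULES` and `find_missing_modules` are illustrative names, and the import names shown are the ones these packages usually expose, so verify them against your installed versions.

```python
import importlib.util

# pip package name -> import name (they differ for some packages).
# Only a subset is listed; extend it to match your environment.
REQUIRED_MODULES = {
    "paramiko": "paramiko",
    "requests": "requests",
    "netmiko": "netmiko",
    "pyvmomi": "pyVmomi",
    "pyyaml": "yaml",
    "jinja2": "jinja2",
}


def find_missing_modules(modules):
    """Return the pip names whose import name cannot be located."""
    missing = []
    for pip_name, import_name in modules.items():
        try:
            spec = importlib.util.find_spec(import_name)
        except (ImportError, ValueError):
            # Raised for malformed names or missing parent packages
            spec = None
        if spec is None:
            missing.append(pip_name)
    return missing
```

Calling `find_missing_modules(REQUIRED_MODULES)` returns an empty list when everything is installed, which makes it easy to fail fast at startup.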
1.3 CLI Tools Required
# Network tools
apt-get install snmp snmp-mibs-downloader
apt-get install nmap
apt-get install netcat-openbsd
# Virtualization
apt-get install open-vm-tools # VMware
# Monitoring
apt-get install nagios-plugins
# Storage
apt-get install nfs-common
apt-get install cifs-utils
apt-get install multipath-tools
# Database clients
apt-get install mysql-client
apt-get install postgresql-client
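A similar pre-flight check can cover the command-line tools. This sketch assumes the executable names that the packages above typically install (`snmpwalk`, `nmap`, `nc`, `mysql`, `psql`); the list and the function name are illustrative, not part of the original tooling.

```python
import shutil

# Executables expected from the apt packages above (assumed names).
REQUIRED_TOOLS = ["snmpwalk", "nmap", "nc", "mysql", "psql"]


def find_missing_tools(tools):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

`find_missing_tools(REQUIRED_TOOLS)` returns the names that still need to be installed on the LLM host.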
2. Required Access and Credentials
2.1 Credential Format
Credentials must be supplied in a secured file (vault-backed or encrypted):
# credentials.yaml (encrypted)
datacenter:
  # Network devices
  network:
    cisco_switches:
      username: admin
      password: ${ENCRYPTED}
      enable_password: ${ENCRYPTED}
    firewalls:
      api_key: ${ENCRYPTED}
  # Virtualization
  vmware:
    vcenter_host: vcenter.domain.local
    username: automation@vsphere.local
    password: ${ENCRYPTED}
  proxmox:
    host: proxmox.domain.local
    token_name: automation
    token_value: ${ENCRYPTED}
  # Storage
  storage_arrays:
    - name: SAN-01
      type: pure_storage
      api_token: ${ENCRYPTED}
  # Databases
  databases:
    asset_management:
      host: db.domain.local
      port: 3306
      username: readonly_user
      password: ${ENCRYPTED}
      database: asset_db
  # Monitoring
  monitoring:
    zabbix:
      url: https://zabbix.domain.local
      api_token: ${ENCRYPTED}
  # Backup
  backup:
    veeam:
      server: veeam.domain.local
      username: automation
      password: ${ENCRYPTED}
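One way to guard against plaintext secrets slipping into this file is to walk the parsed structure and flag any secret field that does not hold the `${ENCRYPTED}` placeholder. The sketch below works on an already-parsed dictionary (for example the result of `yaml.safe_load`); the set of secret key names is taken from the example above, while the function name and the rule "anything other than the placeholder counts as plaintext" are assumptions.

```python
SECRET_KEYS = {"password", "enable_password", "api_key", "api_token", "token_value"}
PLACEHOLDER = "${ENCRYPTED}"


def find_plaintext_secrets(node, path=""):
    """Walk parsed YAML and return paths of secret fields not using the placeholder."""
    findings = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else str(key)
            if key in SECRET_KEYS and not isinstance(value, (dict, list)):
                if value != PLACEHOLDER:
                    findings.append(child)
            else:
                findings.extend(find_plaintext_secrets(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            findings.extend(find_plaintext_secrets(item, f"{path}[{index}]"))
    return findings


sample = {
    "datacenter": {
        "vmware": {"username": "automation@vsphere.local", "password": "${ENCRYPTED}"},
        "storage_arrays": [{"name": "SAN-01", "api_token": "abc123"}],
    }
}
print(find_plaintext_secrets(sample))
# -> ['datacenter.storage_arrays[0].api_token']
```

Running this as part of CI or the wrapper script gives an early warning before a credentials file is committed or deployed.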
2.2 Minimum Required Permissions
IMPORTANT: ALWAYS use least-privilege accounts (read-only wherever possible).
| System | Account Type | Required Permissions |
|---|---|---|
| Network Devices | Read-only | show commands, SNMP read |
| VMware vCenter | Read-only | Global > Read-only role |
| Storage Arrays | Read-only | Monitoring/reporting access |
| Databases | SELECT only | Read access to the asset schema |
| Monitoring | Read-only | View dashboards, metrics |
| Backup Software | Read-only | View jobs, reports |
3. Network Connectivity
3.1 Network Requirements
The LLM host must be able to reach:
Management Network:
- VLAN 10: 10.0.10.0/24 (Infrastructure Management)
- VLAN 20: 10.0.20.0/24 (Server Management)
- VLAN 30: 10.0.30.0/24 (Storage Management)
Required ports:
- TCP 22 (SSH)
- TCP 443 (HTTPS)
- TCP 3306 (MySQL)
- TCP 5432 (PostgreSQL)
- TCP 1433 (MS SQL Server)
- UDP 161 (SNMP)
- TCP 8006 (Proxmox)
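The TCP ports above can be verified from the LLM host with a short reachability probe. This is a minimal sketch using the standard `socket` module; the function name is illustrative, and UDP 161 (SNMP) cannot be tested this way because UDP has no connection handshake.

```python
import socket


def check_tcp_port(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures
        return False
```

For example, `check_tcp_port("vcenter.domain.local", 443)` or `check_tcp_port("proxmox.domain.local", 8006)` would confirm that the firewall path is open before a collection run starts.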
3.2 Firewall Rules
# Allow LLM host to management networks
Source: [LLM_HOST_IP]
Destination: Management Networks
Protocol: SSH, HTTPS, SNMP, Database ports
Action: ALLOW
# Deny all other traffic from LLM host
Source: [LLM_HOST_IP]
Destination: Production Networks
Action: DENY
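The same policy can be mirrored inside the collector as a defensive check, so that the code never attempts a connection the firewall is expected to deny. The sketch below encodes the management networks and ports from section 3.1 with the standard `ipaddress` module; `is_allowed` and the constant names are hypothetical.

```python
import ipaddress

MANAGEMENT_NETWORKS = [
    ipaddress.ip_network("10.0.10.0/24"),  # VLAN 10: Infrastructure Management
    ipaddress.ip_network("10.0.20.0/24"),  # VLAN 20: Server Management
    ipaddress.ip_network("10.0.30.0/24"),  # VLAN 30: Storage Management
]

ALLOWED_PORTS = {
    ("tcp", 22), ("tcp", 443), ("tcp", 3306), ("tcp", 5432),
    ("tcp", 1433), ("udp", 161), ("tcp", 8006),
}


def is_allowed(destination, protocol, port):
    """Return True if the destination is a management address on an allowed port."""
    address = ipaddress.ip_address(destination)
    in_management = any(address in network for network in MANAGEMENT_NETWORKS)
    return in_management and (protocol, port) in ALLOWED_PORTS
```

This does not replace the firewall rules; it only lets the application refuse out-of-policy targets early and log them.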
4. Rate Limiting and Best Practices
4.1 API Call Limits
# Respect vendor rate limits
RATE_LIMITS = {
    'vmware_vcenter': {'calls_per_minute': 100},
    'network_devices': {'calls_per_minute': 10},
    'storage_api': {'calls_per_minute': 60},
    'monitoring_api': {'calls_per_minute': 300}
}
# Implement retry logic with exponential backoff
import time
from functools import wraps
def retry_with_backoff(max_retries=3, base_delay=1):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    # Out of retries: re-raise the last error
                    if attempt == max_retries - 1:
                        raise
                    # Wait base_delay, then 2x, 4x, ... before the next attempt
                    delay = base_delay * (2 ** attempt)
                    time.sleep(delay)
        return wrapper
    return decorator
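The `RATE_LIMITS` table only declares the limits; something still has to enforce them. One simple option, sketched here under the assumption that evenly spacing calls is acceptable, is a minimum-interval limiter: it keeps consecutive calls at least `60 / calls_per_minute` seconds apart. The class name and its injectable `clock` and `sleep` parameters are illustrative.

```python
import threading
import time


class MinIntervalLimiter:
    """Space calls at least 60 / calls_per_minute seconds apart."""

    def __init__(self, calls_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / calls_per_minute
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_allowed = 0.0

    def wait(self):
        """Block until the next call is allowed; return the seconds waited."""
        with self._lock:
            now = self._clock()
            wait_for = max(0.0, self._next_allowed - now)
            # Reserve the next slot while holding the lock
            self._next_allowed = max(now, self._next_allowed) + self.interval
        if wait_for > 0:
            self._sleep(wait_for)
        return wait_for


# One limiter per system, built from a table shaped like RATE_LIMITS
RATE_LIMITS = {
    'vmware_vcenter': {'calls_per_minute': 100},
    'network_devices': {'calls_per_minute': 10},
}
limiters = {name: MinIntervalLimiter(cfg['calls_per_minute'])
            for name, cfg in RATE_LIMITS.items()}
```

Calling `limiters['network_devices'].wait()` before each device query keeps the collector under the configured rate; a token bucket would be the natural alternative if short bursts should be allowed.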
4.2 Concurrent Operations
# Limit concurrent operations
from concurrent.futures import ThreadPoolExecutor
MAX_WORKERS = 5  # Do not saturate resources
# query_device and devices are assumed to be defined elsewhere
with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
    futures = [executor.submit(query_device, device) for device in devices]
    # f.result() re-raises any exception raised inside query_device
    results = [f.result() for f in futures]
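Because `f.result()` re-raises the first failure, a single unreachable device can abort the whole batch. A possible variant, sketched below, collects results and errors separately using `as_completed`; `query_all` is a hypothetical helper, and devices are assumed to be hashable identifiers such as hostnames.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def query_all(devices, query_device, max_workers=5):
    """Query every device; return (results, errors), both keyed by device."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_device = {executor.submit(query_device, d): d for d in devices}
        for future in as_completed(future_to_device):
            device = future_to_device[future]
            try:
                results[device] = future.result()
            except Exception as exc:
                # Record the failure and keep processing the other devices
                errors[device] = str(exc)
    return results, errors
```

The `errors` mapping can then feed the notification and partial-documentation paths instead of stopping the run.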
5. Error Handling and Logging
5.1 Logging Configuration
import logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('/var/log/datacenter-docs/generation.log'),
        logging.StreamHandler()
    ]
)
logger = logging.getLogger('datacenter-docs')
5.2 Error Handling Strategy
class DataCollectionError(Exception):
    """Custom exception for data collection errors"""
    pass
class AuthenticationError(DataCollectionError):
    """Raised when a source system rejects the credentials (assumed to be raised by the collectors)"""
    pass
data = None
try:
    data = collect_vmware_data()
except ConnectionError as e:
    logger.error(f"Cannot connect to vCenter: {e}")
    # Use cached data if available
    data = load_cached_data('vmware')
except AuthenticationError as e:
    logger.critical(f"Authentication failed: {e}")
    # Alert the team; data stays None so the section can be skipped
    send_alert("VMware auth failed")
except Exception as e:
    logger.exception(f"Unexpected error: {e}")
    # Continue with partial data
    data = get_partial_data()
6. Caching and Performance
6.1 Cache Strategy
import json
import redis
# Set up Redis for caching
cache = redis.Redis(host='localhost', port=6379, db=0)
def get_cached_or_fetch(key, fetch_function, ttl=3600):
    """Get from cache or fetch if not available"""
    cached = cache.get(key)
    if cached is not None:
        logger.info(f"Cache hit for {key}")
        return json.loads(cached)
    logger.info(f"Cache miss for {key}, fetching...")
    data = fetch_function()
    cache.setex(key, ttl, json.dumps(data))
    return data
# Example usage
vmware_inventory = get_cached_or_fetch(
    'vmware_inventory',
    collect_vmware_inventory,
    ttl=3600  # 1 hour
)
6.2 Data to Cache
- 1 hour: Performance metrics, real-time status
- 6 hours: Inventory, configurations
- 24 hours: Asset database, ownership info
- 7 days: Historical trends, capacity planning
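The TTL tiers above can be kept in one place so that callers never hard-code durations. A minimal sketch follows; the category keys and the `ttl_for` helper are illustrative names derived from the list, not identifiers from the original code.

```python
HOUR = 3600

# TTL in seconds per data category, following the tiers listed above
CACHE_TTL_SECONDS = {
    "performance_metrics": 1 * HOUR,
    "realtime_status": 1 * HOUR,
    "inventory": 6 * HOUR,
    "configurations": 6 * HOUR,
    "asset_database": 24 * HOUR,
    "ownership_info": 24 * HOUR,
    "historical_trends": 7 * 24 * HOUR,
    "capacity_planning": 7 * 24 * HOUR,
}


def ttl_for(category, default=HOUR):
    """Return the cache TTL for a category, falling back to one hour."""
    return CACHE_TTL_SECONDS.get(category, default)
```

A caller could then pass `ttl=ttl_for("inventory")` to the caching helper rather than a literal number.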
7. Execution Schedule
7.1 Recommended Cron Schedule
# Note: jobs that start in the same minute compete for the wrapper's lock and the loser exits; stagger the minutes if every job must run
# Full documentation update - every 6 hours
0 */6 * * * /usr/local/bin/generate-datacenter-docs.sh --full
# Quick update (metrics only) - every hour
0 * * * * /usr/local/bin/generate-datacenter-docs.sh --metrics-only
# Weekly comprehensive report - Sunday at 02:00
0 2 * * 0 /usr/local/bin/generate-datacenter-docs.sh --full --detailed
7.2 Example Wrapper Script
#!/bin/bash
# generate-datacenter-docs.sh
# -e: stop on errors; pipefail: a python3 failure is not masked by tee
set -eo pipefail
LOGFILE="/var/log/datacenter-docs/$(date +%Y%m%d_%H%M%S).log"
LOCKFILE="/var/run/datacenter-docs.lock"
# Prevent concurrent executions. flock replaces the check-then-touch
# sequence, which is racy and leaves a stale lock if the script is killed.
exec 9>"$LOCKFILE"
if ! flock -n 9; then
    echo "Another instance is running. Exiting."
    exit 1
fi
# Activate virtual environment
source /opt/datacenter-docs/venv/bin/activate
# Run Python script with parameters
python3 /opt/datacenter-docs/main.py "$@" 2>&1 | tee -a "$LOGFILE"
# Cleanup old logs (keep 30 days)
find /var/log/datacenter-docs/ -name "*.log" -mtime +30 -delete
8. Output and Validation
8.1 Post-Generation Checks
import os
import re
def validate_documentation(section_file):
    """Validate the generated document"""
    # Check existence first: os.path.getsize raises on a missing file
    if not os.path.exists(section_file):
        logger.error(f"✗ {section_file} validation failed: ['file_exists']")
        return False
    checks = {
        'not_empty': os.path.getsize(section_file) > 0,
        'valid_markdown': validate_markdown_syntax(section_file),
        'no_placeholders': not contains_placeholders(section_file),
        'token_limit': count_tokens(section_file) < 50000
    }
    if all(checks.values()):
        logger.info(f"✓ {section_file} validation passed")
        return True
    failed = [k for k, v in checks.items() if not v]
    logger.error(f"✗ {section_file} validation failed: {failed}")
    return False
def contains_placeholders(file_path):
    """Check for unreplaced placeholders"""
    with open(file_path, 'r', encoding='utf-8') as f:
        content = f.read()
    # Caution: the bracket and brace patterns also match Markdown links such
    # as [text](url) and any literal braces; tighten them to the template's
    # actual placeholder syntax to avoid false positives
    patterns = [r'\[.*?\]', r'\{.*?\}', r'TODO', r'FIXME']
    return any(re.search(p, content) for p in patterns)
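The checks above rely on `validate_markdown_syntax` and `count_tokens`, which are not shown. As a rough illustration of what such helpers might do, the sketch below checks that code fences are balanced and estimates tokens at about four characters each. Both functions are hypothetical stand-ins that operate on text rather than file paths; the four-characters-per-token figure is a common rule of thumb, not a real tokenizer.

```python
FENCE = "`" * 3  # Markdown code fence marker


def has_balanced_code_fences(text):
    """Return True if the number of fence lines is even (each opener is closed)."""
    fence_lines = [line for line in text.splitlines()
                   if line.lstrip().startswith(FENCE)]
    return len(fence_lines) % 2 == 0


def estimate_tokens(text):
    """Rough token estimate: about 4 characters per token (heuristic only)."""
    return len(text) // 4
```

For an accurate token count, the tokenizer of the target model should be used instead of this estimate.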
8.2 Notification System
from datetime import datetime
def send_completion_notification(success, sections_updated, errors):
    """Send a notification when generation finishes"""
    # Build the error block first: backslashes inside f-string expressions
    # are a syntax error before Python 3.12
    error_details = 'Errors:\n' + '\n'.join(errors) if errors else ''
    message = f"""
    Datacenter Documentation Update
    Status: {'✓ SUCCESS' if success else '✗ FAILED'}
    Sections Updated: {', '.join(sections_updated)}
    Errors: {len(errors)}
    {error_details}
    Timestamp: {datetime.now().isoformat()}
    """
    # Send via multiple channels
    send_email(recipients=['ops-team@company.com'], subject='Doc Update', body=message)
    send_slack(channel='#datacenter-ops', message=message)
    # send_teams / send_webhook as needed
9. Security Considerations
9.1 Secrets Management
# NEVER store credentials in plain text
# Always use a vault
import os
import keyring
def get_credential(service, account):
    """Retrieve credential from OS keyring"""
    return keyring.get_password(service, account)
# Alternatively, HashiCorp Vault
import hvac
# AppRole IDs are read from the environment here (variable names are illustrative)
ROLE_ID = os.environ['VAULT_ROLE_ID']
SECRET_ID = os.environ['VAULT_SECRET_ID']
client = hvac.Client(url='https://vault.company.com')
client.auth.approle.login(role_id=ROLE_ID, secret_id=SECRET_ID)
response = client.secrets.kv.v2.read_secret_version(path='datacenter/creds')
credentials = response['data']['data']  # KV v2 nests the secret under data.data
9.2 Audit Trail
# Log ALL operations for auditing
audit_log = {
    'timestamp': datetime.now().isoformat(),
    'user': 'automation-account',
    'action': 'documentation_generation',
    'sections': sections_updated,
    'systems_accessed': list_of_systems,
    'duration': elapsed_time,
    'success': success  # bool outcome of the run; the literal True/False is a division and raises ZeroDivisionError
}
write_audit_log(audit_log)
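`write_audit_log` is not defined in this document. One simple approach, sketched here as an assumption rather than the project's actual implementation, is an append-only JSON Lines file: one entry per line, which is easy to ship to a log collector and to parse later. The `path` parameter and the `read_audit_log` helper are additions for illustration.

```python
import json


def write_audit_log(entry, path="audit.jsonl"):
    """Append one audit entry as a single JSON line."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry, sort_keys=True) + "\n")


def read_audit_log(path="audit.jsonl"):
    """Read all audit entries back as a list of dictionaries."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

In production the file should live on storage the automation account can append to but not rewrite, so that the trail stays trustworthy.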
10. Troubleshooting
10.1 Common Issues
| Problem | Likely Cause | Solution |
|---|---|---|
| Connection Timeout | Firewall/Network | Verify connectivity and firewall rules |
| Authentication Failed | Wrong or expired credentials | Rotate credentials, check the vault |
| API Rate Limit | Too many requests | Implement backoff, reduce request frequency |
| Incomplete Data | Source temporarily down | Use cached data, generate a partial document |
| Token Limit Exceeded | Too much data in the section | Remove historical data, optimize the format |
10.2 Debug Mode
# Enable debug mode for troubleshooting
import json
import logging
import os
DEBUG = os.getenv('DEBUG', 'False').lower() == 'true'
if DEBUG:
    logging.getLogger().setLevel(logging.DEBUG)
    # Save raw responses for analysis (timestamp and raw_response come from the collector)
    with open(f'debug_{timestamp}.json', 'w') as f:
        json.dump(raw_response, f, indent=2)
11. Testing
11.1 Unit Tests
import unittest
class TestDataCollection(unittest.TestCase):
    def test_vmware_connection(self):
        """Test the connection to vCenter"""
        # check_vmware_connection: project helper (name assumed), kept
        # distinct from the test method so test runners do not collect it
        result = check_vmware_connection()
        self.assertTrue(result.success)
    def test_data_validation(self):
        """Test validation of the collected data"""
        sample_data = load_sample_data()
        self.assertTrue(validate_data_structure(sample_data))
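Tests that talk to a real vCenter are closer to integration tests. For true unit tests, the client can be replaced with a mock so the logic runs without network access. The sketch below is illustrative only: `collect_vm_names` and the client's `list_vms` method are hypothetical and do not correspond to a specific library API.

```python
import unittest
from unittest.mock import MagicMock


def collect_vm_names(client):
    """Hypothetical collector: return sorted VM names from a client object."""
    return sorted(vm["name"] for vm in client.list_vms())


class TestCollectVmNames(unittest.TestCase):
    def test_returns_sorted_names(self):
        client = MagicMock()
        client.list_vms.return_value = [{"name": "web-02"}, {"name": "db-01"}]
        self.assertEqual(collect_vm_names(client), ["db-01", "web-02"])
        client.list_vms.assert_called_once_with()

    def test_propagates_connection_errors(self):
        client = MagicMock()
        client.list_vms.side_effect = ConnectionError("vCenter unreachable")
        with self.assertRaises(ConnectionError):
            collect_vm_names(client)
```

Such tests can be run with `python3 -m unittest` and need neither credentials nor connectivity, which makes them suitable for the CI pipeline.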
11.2 Integration Tests
# End-to-end test in the test environment
./run-tests.sh --integration --environment=test
# Verify that all systems are reachable
./check-connectivity.sh
# Dry run without saving
python3 main.py --dry-run --verbose
Pre-Deployment Checklist
Before putting the system into production:
- All libraries installed
- Credentials configured in a secure vault
- Connectivity verified to all systems
- Automation account permissions validated (read-only)
- Firewall rules approved and configured
- Logging configured and tested
- Notification system tested
- Cron jobs configured
- Existing documentation backed up
- Operational runbook completed
- Escalation path defined
- DR procedures documented
Document Version: 1.0
Last Updated: 2025-01-XX
Owner: Automation Team