Initial commit: LLM Automation Docs & Remediation Engine v2.0

Features:
- Automated datacenter documentation generation
- MCP integration for device connectivity
- Auto-remediation engine with safety checks
- Multi-factor reliability scoring (0-100%)
- Human feedback learning loop
- Pattern recognition and continuous improvement
- AI-powered agentic chat support
- API for ticket resolution
- React frontend with Material-UI
- CI/CD pipelines (GitLab + Gitea)
- Docker & Kubernetes deployment
- Complete documentation and guides

v2.0 Highlights:
- Auto-remediation with write operations (disabled by default)
- Reliability calculator with 4-factor scoring
- Human feedback system for continuous learning
- Pattern-based progressive automation
- Approval workflow for critical actions
- Full audit trail and rollback capability
Committed by LLM Automation System on 2025-10-17 23:47:28 +00:00
commit 1ba5ce851d
89 changed files with 20468 additions and 0 deletions

# API Endpoints and Commands for Data Collection
## 1. VMware vSphere API
### 1.1 REST API Endpoints
```bash
# Base URL
BASE_URL="https://vcenter.domain.local/rest"
# Authentication
curl -X POST $BASE_URL/com/vmware/cis/session \
-u 'automation@vsphere.local:password'
# Get all VMs
curl -X GET $BASE_URL/vcenter/vm \
-H "vmware-api-session-id: ${SESSION_ID}"
# Get VM details
curl -X GET $BASE_URL/vcenter/vm/${VM_ID} \
-H "vmware-api-session-id: ${SESSION_ID}"
# Get hosts
curl -X GET $BASE_URL/vcenter/host \
-H "vmware-api-session-id: ${SESSION_ID}"
# Get datastores
curl -X GET $BASE_URL/vcenter/datastore \
-H "vmware-api-session-id: ${SESSION_ID}"
# Get clusters
curl -X GET $BASE_URL/vcenter/cluster \
-H "vmware-api-session-id: ${SESSION_ID}"
```
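The same session flow is easy to script: the POST returns a token that every later call must send back in the `vmware-api-session-id` header. A minimal Python sketch with `requests`, assuming the placeholder hostname and account shown above:
```python
# Sketch only: host, user, and password are the placeholder values from above.
import requests

BASE_URL = "https://vcenter.domain.local/rest"

with requests.Session() as http:
    # Lab-style TLS; point verify at your CA bundle in production
    http.verify = False
    resp = http.post(f"{BASE_URL}/com/vmware/cis/session",
                     auth=("automation@vsphere.local", "password"))
    resp.raise_for_status()
    # Reuse the returned token on every subsequent call
    http.headers["vmware-api-session-id"] = resp.json()["value"]

    for vm in http.get(f"{BASE_URL}/vcenter/vm").json()["value"]:
        print(vm["vm"], vm["name"], vm["power_state"])
```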
### 1.2 PowerCLI Commands
```powershell
# Connect
Connect-VIServer -Server vcenter.domain.local -User automation@vsphere.local
# Get all VMs with details
Get-VM | Select-Object Name, PowerState, NumCpu, MemoryGB, @{N='UsedSpaceGB';E={[math]::Round($_.UsedSpaceGB,2)}}, VMHost, ResourcePool | Export-Csv -Path vms.csv
# Get hosts
Get-VMHost | Select-Object Name, ConnectionState, PowerState, Version, NumCpu, MemoryTotalGB, @{N='MemoryUsageGB';E={[math]::Round($_.MemoryUsageGB,2)}} | Export-Csv -Path hosts.csv
# Get datastores
Get-Datastore | Select-Object Name, Type, CapacityGB, FreeSpaceGB, @{N='PercentFree';E={[math]::Round(($_.FreeSpaceGB/$_.CapacityGB*100),2)}} | Export-Csv -Path datastores.csv
# Get performance stats
Get-Stat -Entity (Get-VM) -Stat cpu.usage.average,mem.usage.average -Start (Get-Date).AddDays(-7) -IntervalMins 5 | Export-Csv -Path performance.csv
```
---
## 2. Proxmox VE API
### 2.1 REST API
```bash
# Base URL
PROXMOX_URL="https://proxmox.domain.local:8006/api2/json"
# Get ticket (authentication)
curl -k -d "username=automation@pam&password=password" \
$PROXMOX_URL/access/ticket
# Get nodes
curl -k -H "Cookie: PVEAuthCookie=${TICKET}" \
$PROXMOX_URL/nodes
# Get VMs on node
curl -k -H "Cookie: PVEAuthCookie=${TICKET}" \
$PROXMOX_URL/nodes/${NODE}/qemu
# Get containers
curl -k -H "Cookie: PVEAuthCookie=${TICKET}" \
$PROXMOX_URL/nodes/${NODE}/lxc
# Get storage
curl -k -H "Cookie: PVEAuthCookie=${TICKET}" \
$PROXMOX_URL/nodes/${NODE}/storage
# Get cluster status
curl -k -H "Cookie: PVEAuthCookie=${TICKET}" \
$PROXMOX_URL/cluster/status
```
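Write operations against the raw API also require the `CSRFPreventionToken` returned alongside the ticket. The `proxmoxer` library (listed in the requirements document) handles the ticket and header bookkeeping; a sketch, assuming the same placeholder host and account:
```python
# Sketch using proxmoxer; host and credentials are the placeholders from above.
from proxmoxer import ProxmoxAPI

proxmox = ProxmoxAPI("proxmox.domain.local", user="automation@pam",
                     password="password", verify_ssl=False)

# Same data as the curl calls: nodes, then VMs and containers per node
for node in proxmox.nodes.get():
    name = node["node"]
    for vm in proxmox.nodes(name).qemu.get():
        print(name, vm["vmid"], vm.get("name"), vm["status"])
    for ct in proxmox.nodes(name).lxc.get():
        print(name, ct["vmid"], ct.get("name"), ct["status"])
```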
### 2.2 CLI Commands
```bash
# List VMs
pvesh get /cluster/resources --type vm
# VM status
qm status ${VMID}
# Container list
pct list
# Storage info
pvesm status
# Node info
pvesh get /nodes/${NODE}/status
```
---
## 3. Network Devices
### 3.1 Cisco IOS Commands
```bash
# Via SSH
ssh admin@switch.domain.local
# System information
show version
show inventory
show running-config
# Interfaces
show interfaces status
show interfaces description
show interfaces counters errors
show ip interface brief
# VLANs
show vlan brief
show vlan id ${VLAN_ID}
# Spanning Tree
show spanning-tree summary
show spanning-tree root
# Routing
show ip route
show ip protocols
# CDP/LLDP
show cdp neighbors detail
show lldp neighbors
# Performance
show processes cpu history
show memory statistics
show environment all
```
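These commands can be collected programmatically with `netmiko` (the same library the network collector module uses); a short sketch, with hostname and credentials as placeholders:
```python
# Sketch: run a fixed list of show commands over SSH and capture the output.
from netmiko import ConnectHandler

device = {
    "device_type": "cisco_ios",
    "host": "switch.domain.local",  # placeholder
    "username": "admin",
    "password": "password",
}

commands = ["show version", "show interfaces status", "show vlan brief"]
with ConnectHandler(**device) as conn:
    for cmd in commands:
        output = conn.send_command(cmd)
        print(f"### {cmd}\n{output}\n")
```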
### 3.2 HP/Aruba Switch Commands
```bash
# System info
show system
show version
show running-config
# Interfaces
show interfaces brief
show interfaces status
# VLANs
show vlans
# Spanning tree
show spanning-tree
# Logging
show log
```
---
## 4. Firewall APIs
### 4.1 pfSense/OPNsense API
```bash
# Base URL
FW_URL="https://firewall.domain.local/api"
# Get system info
curl -X GET "${FW_URL}/core/system/status" \
-H "Authorization: Bearer ${API_TOKEN}"
# Get interfaces
curl -X GET "${FW_URL}/interfaces/overview/export" \
-H "Authorization: Bearer ${API_TOKEN}"
# Get firewall rules
curl -X GET "${FW_URL}/firewall/filter/searchRule" \
-H "Authorization: Bearer ${API_TOKEN}"
# Get VPN status
curl -X GET "${FW_URL}/ipsec/sessions" \
-H "Authorization: Bearer ${API_TOKEN}"
```
### 4.2 Fortinet FortiGate API
```bash
# Base URL
FORTI_URL="https://fortigate.domain.local/api/v2"
# System status
curl -X GET "${FORTI_URL}/monitor/system/status" \
-H "Authorization: Bearer ${API_TOKEN}"
# Interface stats
curl -X GET "${FORTI_URL}/monitor/system/interface/select" \
-H "Authorization: Bearer ${API_TOKEN}"
# Firewall policies
curl -X GET "${FORTI_URL}/cmdb/firewall/policy" \
-H "Authorization: Bearer ${API_TOKEN}"
# VPN status
curl -X GET "${FORTI_URL}/monitor/vpn/ipsec" \
-H "Authorization: Bearer ${API_TOKEN}"
```
---
## 5. Storage Arrays
### 5.1 Pure Storage API
```bash
# Base URL
PURE_URL="https://array.domain.local/api"
# Get array info
curl -X GET "${PURE_URL}/1.19/array" \
-H "api-token: ${API_TOKEN}"
# Get volumes
curl -X GET "${PURE_URL}/1.19/volume" \
-H "api-token: ${API_TOKEN}"
# Get hosts
curl -X GET "${PURE_URL}/1.19/host" \
-H "api-token: ${API_TOKEN}"
# Get performance metrics
curl -X GET "${PURE_URL}/1.19/array/monitor?action=monitor" \
-H "api-token: ${API_TOKEN}"
```
### 5.2 NetApp ONTAP API
```bash
# Base URL
NETAPP_URL="https://netapp.domain.local/api"
# Get cluster info
curl -X GET "${NETAPP_URL}/cluster" \
-u "admin:password"
# Get volumes
curl -X GET "${NETAPP_URL}/storage/volumes" \
-u "admin:password"
# Get aggregates
curl -X GET "${NETAPP_URL}/storage/aggregates" \
-u "admin:password"
# Get performance
curl -X GET "${NETAPP_URL}/cluster/counter/tables/volume" \
-u "admin:password"
```
### 5.3 Generic SAN Commands
```bash
# Via SSH to array management interface
# Show system info
show system
show controller
show disk
# Show volumes/LUNs
show volumes
show luns
show mappings
# Show performance
show statistics
show disk-statistics
```
---
## 6. Monitoring Systems
### 6.1 Zabbix API
```bash
# Base URL
ZABBIX_URL="https://zabbix.domain.local/api_jsonrpc.php"
# Authenticate
curl -X POST $ZABBIX_URL \
-H "Content-Type: application/json-rpc" \
-d '{
"jsonrpc": "2.0",
"method": "user.login",
"params": {
"user": "automation",
"password": "password"
},
"id": 1
}'
# Get hosts
curl -X POST $ZABBIX_URL \
-H "Content-Type: application/json-rpc" \
-d '{
"jsonrpc": "2.0",
"method": "host.get",
"params": {
"output": ["hostid", "host", "status"]
},
"auth": "'${AUTH_TOKEN}'",
"id": 1
}'
# Get problems
curl -X POST $ZABBIX_URL \
-H "Content-Type: application/json-rpc" \
-d '{
"jsonrpc": "2.0",
"method": "problem.get",
"params": {
"recent": true
},
"auth": "'${AUTH_TOKEN}'",
"id": 1
}'
```
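The same JSON-RPC exchange is easier to drive from Python, where the token from `user.login` is threaded into later calls. A sketch with `requests` (note that newer Zabbix versions rename the login parameter `user` to `username`; check your API version):
```python
# Sketch only; URL and account are the placeholders from above.
import itertools
import requests

ZABBIX_URL = "https://zabbix.domain.local/api_jsonrpc.php"
_ids = itertools.count(1)

def zabbix_call(method, params, auth=None):
    """One JSON-RPC 2.0 call; raises on transport or API-level errors."""
    payload = {"jsonrpc": "2.0", "method": method, "params": params,
               "auth": auth, "id": next(_ids)}
    resp = requests.post(ZABBIX_URL, json=payload, timeout=10)
    resp.raise_for_status()
    body = resp.json()
    if "error" in body:
        raise RuntimeError(body["error"])
    return body["result"]

token = zabbix_call("user.login", {"user": "automation", "password": "password"})
hosts = zabbix_call("host.get", {"output": ["hostid", "host", "status"]}, auth=token)
problems = zabbix_call("problem.get", {"recent": True}, auth=token)
```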
### 6.2 Prometheus API
```bash
# Base URL
PROM_URL="http://prometheus.domain.local:9090"
# Query instant
curl -X GET "${PROM_URL}/api/v1/query?query=up"
# Query range
curl -X GET "${PROM_URL}/api/v1/query_range?query=node_cpu_seconds_total&start=2024-01-01T00:00:00Z&end=2024-01-02T00:00:00Z&step=15s"
# Get targets
curl -X GET "${PROM_URL}/api/v1/targets"
# Get alerts
curl -X GET "${PROM_URL}/api/v1/alerts"
```
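The response is JSON with a fixed envelope (`status`, then `data.result`); each instant-query sample is a label set plus a `[timestamp, value]` pair. A parsing sketch:
```python
import requests

PROM_URL = "http://prometheus.domain.local:9090"  # placeholder

resp = requests.get(f"{PROM_URL}/api/v1/query", params={"query": "up"}, timeout=10)
resp.raise_for_status()
data = resp.json()

if data["status"] == "success":
    for sample in data["data"]["result"]:
        labels = sample["metric"]           # e.g. {'job': 'node', 'instance': '...'}
        timestamp, value = sample["value"]  # [unix_ts, "1"]
        print(labels.get("instance", "?"), value)
```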
### 6.3 Nagios/Icinga API
```bash
# Icinga2 API
ICINGA_URL="https://icinga.domain.local:5665"
# Get hosts
curl -k -u "automation:password" \
"${ICINGA_URL}/v1/objects/hosts"
# Get services
curl -k -u "automation:password" \
"${ICINGA_URL}/v1/objects/services"
# Get problems
curl -k -u "automation:password" \
"${ICINGA_URL}/v1/objects/services?filter=service.state!=0"
```
---
## 7. Backup Systems
### 7.1 Veeam API
```powershell
# Connect to Veeam server
Connect-VBRServer -Server veeam.domain.local -User automation
# Get backup jobs
Get-VBRJob | Select-Object Name, JobType, IsScheduleEnabled, LastResult
# Get backup sessions
Get-VBRBackupSession | Where-Object {$_.CreationTime -gt (Get-Date).AddDays(-7)} | Select-Object Name, JobName, Result, CreationTime
# Get restore points
Get-VBRRestorePoint | Select-Object VMName, CreationTime, Type
# Get repositories
Get-VBRBackupRepository | Select-Object Name, Path, @{N='FreeGB';E={[math]::Round($_.GetContainer().CachedFreeSpace.InGigabytes,2)}}
```
### 7.2 CommVault API
```bash
# Base URL
CV_URL="https://commvault.domain.local/webconsole/api"
# Login
curl -X POST "${CV_URL}/Login" \
-H "Content-Type: application/json" \
-d '{"username":"automation","password":"password"}'
# Get jobs
curl -X GET "${CV_URL}/Job?clientName=${CLIENT}" \
-H "Authtoken: ${TOKEN}"
# Get clients
curl -X GET "${CV_URL}/Client" \
-H "Authtoken: ${TOKEN}"
```
---
## 8. Database Queries
### 8.1 Asset Management DB
```sql
-- MySQL/MariaDB queries for asset database
-- Get all racks
SELECT
rack_id,
location,
total_units,
occupied_units,
(total_units - occupied_units) AS available_units,
max_power_kw,
ROUND(occupied_units * 100.0 / total_units, 2) AS utilization_percent
FROM racks
ORDER BY location, rack_id;
-- Get all servers
SELECT
s.hostname,
s.serial_number,
s.model,
s.cpu_model,
s.cpu_cores,
s.ram_gb,
s.rack_id,
s.rack_unit,
s.status,
s.environment
FROM servers s
ORDER BY s.rack_id, s.rack_unit;
-- Get network devices
SELECT
n.hostname,
n.device_type,
n.vendor,
n.model,
n.management_ip,
n.firmware_version,
n.rack_id,
n.status
FROM network_devices n
ORDER BY n.device_type, n.hostname;
-- Get contracts
SELECT
c.vendor,
c.service_type,
c.contract_type,
c.start_date,
c.end_date,
DATEDIFF(c.end_date, NOW()) AS days_to_expiry,
c.annual_cost
FROM contracts c
WHERE c.end_date > NOW()
ORDER BY c.end_date;
```
### 8.2 Database Server Queries
```sql
-- MySQL - Database sizes
SELECT
table_schema AS 'Database',
ROUND(SUM(data_length + index_length) / 1024 / 1024 / 1024, 2) AS 'Size_GB'
FROM information_schema.tables
GROUP BY table_schema
ORDER BY SUM(data_length + index_length) DESC;
-- PostgreSQL - Database sizes
SELECT
datname AS database_name,
pg_size_pretty(pg_database_size(datname)) AS size
FROM pg_database
ORDER BY pg_database_size(datname) DESC;
-- SQL Server - Database sizes
SELECT
DB_NAME(database_id) AS DatabaseName,
(size * 8.0 / 1024) AS SizeMB
FROM sys.master_files
WHERE type = 0
ORDER BY size DESC;
```
---
## 9. Cloud Provider APIs
### 9.1 AWS (Boto3)
```python
import boto3
# EC2 instances
ec2 = boto3.client('ec2')
instances = ec2.describe_instances()
# S3 buckets
s3 = boto3.client('s3')
buckets = s3.list_buckets()
# RDS databases
rds = boto3.client('rds')
databases = rds.describe_db_instances()
# Cost Explorer
ce = boto3.client('ce')
cost = ce.get_cost_and_usage(
TimePeriod={'Start': '2024-01-01', 'End': '2024-01-31'},
Granularity='MONTHLY',
Metrics=['UnblendedCost']
)
```
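`describe_instances` returns at most one page of reservations per call, so a full inventory should use a paginator; a sketch:
```python
import boto3

ec2 = boto3.client("ec2")

# Walk every page; reservations group instances by launch request
paginator = ec2.get_paginator("describe_instances")
for page in paginator.paginate():
    for reservation in page["Reservations"]:
        for instance in reservation["Instances"]:
            print(instance["InstanceId"],
                  instance["State"]["Name"],
                  instance.get("PrivateIpAddress", "-"))
```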
### 9.2 Azure (SDK)
```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient
from azure.mgmt.storage import StorageManagementClient
credential = DefaultAzureCredential()
# VMs
compute_client = ComputeManagementClient(credential, subscription_id)
vms = compute_client.virtual_machines.list_all()
# Storage accounts
storage_client = StorageManagementClient(credential, subscription_id)
storage_accounts = storage_client.storage_accounts.list()
```
---
## 10. SNMP OIDs Reference
### 10.1 Common System OIDs
```bash
# System description
.1.3.6.1.2.1.1.1.0 # sysDescr
# System uptime
.1.3.6.1.2.1.1.3.0 # sysUpTime
# System name
.1.3.6.1.2.1.1.5.0 # sysName
# System location
.1.3.6.1.2.1.1.6.0 # sysLocation
```
### 10.2 UPS OIDs (RFC 1628)
```bash
# UPS identity
.1.3.6.1.2.1.33.1.1.1.0 # upsIdentManufacturer
.1.3.6.1.2.1.33.1.1.2.0 # upsIdentModel
# Battery status
.1.3.6.1.2.1.33.1.2.1.0 # upsBatteryStatus
.1.3.6.1.2.1.33.1.2.2.0 # upsSecondsOnBattery
.1.3.6.1.2.1.33.1.2.3.0 # upsEstimatedMinutesRemaining
# Input
.1.3.6.1.2.1.33.1.3.3.1.3 # upsInputVoltage
.1.3.6.1.2.1.33.1.3.3.1.4 # upsInputCurrent
.1.3.6.1.2.1.33.1.3.3.1.6 # upsInputTruePower
# Output
.1.3.6.1.2.1.33.1.4.4.1.2 # upsOutputVoltage
.1.3.6.1.2.1.33.1.4.4.1.3 # upsOutputCurrent
.1.3.6.1.2.1.33.1.4.4.1.4 # upsOutputPower
.1.3.6.1.2.1.33.1.4.4.1.5 # upsOutputPercentLoad
```
### 10.3 Network Interface OIDs
```bash
# Interface description
.1.3.6.1.2.1.2.2.1.2 # ifDescr
# Interface status
.1.3.6.1.2.1.2.2.1.8 # ifOperStatus
# Interface traffic
.1.3.6.1.2.1.2.2.1.10 # ifInOctets
.1.3.6.1.2.1.2.2.1.16 # ifOutOctets
# Interface errors
.1.3.6.1.2.1.2.2.1.14 # ifInErrors
.1.3.6.1.2.1.2.2.1.20 # ifOutErrors
```
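`ifInOctets`/`ifOutOctets` are cumulative counters, so throughput is the delta between two polls divided by the polling interval; the 32-bit counters wrap on busy links, so prefer the 64-bit `ifHCInOctets`/`ifHCOutOctets` where supported. A sketch using the `snmp_get` helper defined in the collection-scripts document (the interface index `.1` and target are placeholders):
```python
import time

from utils.snmp_helper import snmp_get  # helper from the collection scripts

IF_IN_OCTETS = ".1.3.6.1.2.1.2.2.1.10.1"  # ifInOctets, interface index 1
INTERVAL = 60                             # seconds between polls

first = int(snmp_get("switch.domain.local", IF_IN_OCTETS))
time.sleep(INTERVAL)
second = int(snmp_get("switch.domain.local", IF_IN_OCTETS))

delta = (second - first) % 2**32          # tolerate one 32-bit counter wrap
mbps = delta * 8 / INTERVAL / 1_000_000
print(f"inbound: {mbps:.2f} Mbit/s")
```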
---
## 11. Example Collection Script
### 11.1 Complete Data Collection
```bash
#!/bin/bash
# collect_all_data.sh - Orchestrate all data collection
OUTPUT_DIR="/tmp/datacenter-collection-$(date +%Y%m%d_%H%M%S)"
mkdir -p $OUTPUT_DIR
echo "Starting datacenter data collection..."
# VMware
echo "Collecting VMware data..."
python3 collect_vmware.py > $OUTPUT_DIR/vmware.json
# Network devices
echo "Collecting network configurations..."
./collect_network.sh > $OUTPUT_DIR/network.json
# Storage
echo "Collecting storage data..."
python3 collect_storage.py > $OUTPUT_DIR/storage.json
# Monitoring
echo "Collecting monitoring data..."
./collect_monitoring.sh > $OUTPUT_DIR/monitoring.json
# Databases
echo "Querying databases..."
mysql -h db.local -u reader -pPASS asset_db < queries.sql > $OUTPUT_DIR/asset_db.csv
# SNMP devices
echo "Polling SNMP devices..."
./poll_snmp.sh > $OUTPUT_DIR/snmp.json
echo "Collection complete. Data saved to: $OUTPUT_DIR"
tar -czf $OUTPUT_DIR.tar.gz $OUTPUT_DIR
```
---
## 12. Rate Limiting Reference
### 12.1 Vendor Rate Limits
| Vendor | Endpoint | Limit | Time Window |
|--------|----------|-------|-------------|
| VMware vCenter | REST API | 100 req | per minute |
| Zabbix | API | 300 req | per minute |
| Pure Storage | REST API | 60 req | per minute |
| Cisco DNA Center | API | 10 req | per second |
| AWS | API (varies) | 10-100 req | per second |
### 12.2 Retry Strategy
```python
import logging
import time
from functools import wraps

logger = logging.getLogger(__name__)

class RateLimitException(Exception):
    """Raised when a vendor API signals throttling (e.g. HTTP 429)."""

def rate_limited_retry(max_retries=3, backoff_factor=2):
def decorator(func):
@wraps(func)
def wrapper(*args, **kwargs):
for attempt in range(max_retries):
try:
return func(*args, **kwargs)
except RateLimitException:
if attempt == max_retries - 1:
raise
wait_time = backoff_factor ** attempt
logger.warning(f"Rate limited. Waiting {wait_time}s before retry {attempt+1}/{max_retries}")
time.sleep(wait_time)
except Exception as e:
logger.error(f"Error: {e}")
raise
return wrapper
return decorator
```
---
**Document Version**: 1.0
**Last Updated**: 2025-01-XX
**Maintainer**: Automation Team

# Data Collection Scripts for Datacenter Documentation
## 1. Script Python Principali
### 1.1 Main Orchestrator
```python
#!/usr/bin/env python3
"""
main.py - Main orchestrator for documentation generation
"""
import sys
import argparse
import logging
from datetime import datetime
from pathlib import Path
# Import custom modules
from collectors import (
InfrastructureCollector,
NetworkCollector,
VirtualizationCollector,
StorageCollector,
SecurityCollector,
BackupCollector,
MonitoringCollector,
DatabaseCollector,
ProcedureCollector,
ImprovementAnalyzer
)
from generators import DocumentationGenerator
from validators import DocumentValidator
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
class DatacenterDocGenerator:
def __init__(self, config_file='config.yaml'):
self.config = self.load_config(config_file)
self.sections = []
def load_config(self, config_file):
"""Load configuration from YAML file"""
import yaml
with open(config_file, 'r') as f:
return yaml.safe_load(f)
def collect_data(self, section=None):
"""Collect data from all sources"""
collectors = {
'01': InfrastructureCollector(self.config),
'02': NetworkCollector(self.config),
'03': VirtualizationCollector(self.config),
'04': StorageCollector(self.config),
'05': SecurityCollector(self.config),
'06': BackupCollector(self.config),
'07': MonitoringCollector(self.config),
'08': DatabaseCollector(self.config),
'09': ProcedureCollector(self.config),
}
data = {}
sections_to_process = [section] if section else collectors.keys()
for section_id in sections_to_process:
try:
logger.info(f"Collecting data for section {section_id}")
collector = collectors.get(section_id)
if collector:
data[section_id] = collector.collect()
logger.info(f"✓ Section {section_id} data collected")
except Exception as e:
logger.error(f"✗ Failed to collect section {section_id}: {e}")
data[section_id] = None
return data
def generate_documentation(self, data):
"""Generate markdown documentation from collected data"""
generator = DocumentationGenerator(self.config)
for section_id, section_data in data.items():
if section_data:
try:
logger.info(f"Generating documentation for section {section_id}")
output_file = f"output/section_{section_id}.md"
generator.generate(section_id, section_data, output_file)
# Validate generated document
validator = DocumentValidator()
if validator.validate(output_file):
logger.info(f"✓ Section {section_id} generated and validated")
self.sections.append(section_id)
else:
logger.warning(f"⚠ Section {section_id} validation warnings")
except Exception as e:
logger.error(f"✗ Failed to generate section {section_id}: {e}")
# Generate improvement section based on all other sections
if len(self.sections) > 0:
logger.info("Analyzing for improvements...")
analyzer = ImprovementAnalyzer(self.config)
improvements = analyzer.analyze(data)
generator.generate('10', improvements, "output/section_10.md")
def run(self, section=None, dry_run=False):
"""Main execution flow"""
logger.info("=" * 60)
logger.info("Starting Datacenter Documentation Generation")
logger.info(f"Timestamp: {datetime.now().isoformat()}")
logger.info("=" * 60)
try:
# Collect data
data = self.collect_data(section)
if dry_run:
logger.info("DRY RUN - Data collection complete, skipping generation")
return True
# Generate documentation
self.generate_documentation(data)
logger.info("=" * 60)
logger.info(f"✓ Documentation generation completed successfully")
logger.info(f"Sections updated: {', '.join(self.sections)}")
logger.info("=" * 60)
return True
except Exception as e:
logger.exception(f"Fatal error during documentation generation: {e}")
return False
def main():
parser = argparse.ArgumentParser(description='Generate Datacenter Documentation')
parser.add_argument('--section', help='Generate specific section only (01-10)')
parser.add_argument('--dry-run', action='store_true', help='Collect data without generating docs')
parser.add_argument('--config', default='config.yaml', help='Configuration file path')
parser.add_argument('--debug', action='store_true', help='Enable debug logging')
args = parser.parse_args()
if args.debug:
logging.getLogger().setLevel(logging.DEBUG)
generator = DatacenterDocGenerator(args.config)
success = generator.run(section=args.section, dry_run=args.dry_run)
sys.exit(0 if success else 1)
if __name__ == '__main__':
main()
```
---
## 2. Collector Modules
### 2.1 Infrastructure Collector
```python
#!/usr/bin/env python3
"""
collectors/infrastructure.py - Physical infrastructure data collection
"""
import logging
from dataclasses import dataclass
from datetime import datetime
from typing import List, Dict

import requests
from pysnmp.hlapi import *

logger = logging.getLogger(__name__)
@dataclass
class UPSData:
id: str
model: str
power_kva: float
battery_capacity: float
autonomy_minutes: int
status: str
last_test: str
class InfrastructureCollector:
def __init__(self, config):
self.config = config
self.asset_db = self.connect_asset_db()
def connect_asset_db(self):
"""Connect to asset management database"""
import mysql.connector
return mysql.connector.connect(
host=self.config['databases']['asset_db']['host'],
user=self.config['databases']['asset_db']['user'],
password=self.config['databases']['asset_db']['password'],
database=self.config['databases']['asset_db']['database']
)
def collect_ups_data(self) -> List[UPSData]:
"""Collect UPS data via SNMP"""
ups_devices = self.config['infrastructure']['ups_devices']
ups_data = []
for ups in ups_devices:
try:
# Query UPS via SNMP
iterator = getCmd(
SnmpEngine(),
CommunityData(self.config['snmp']['community']),
UdpTransportTarget((ups['ip'], 161)),
ContextData(),
ObjectType(ObjectIdentity('UPS-MIB', 'upsIdentModel', 0)),
ObjectType(ObjectIdentity('UPS-MIB', 'upsBatteryStatus', 0)),
)
errorIndication, errorStatus, errorIndex, varBinds = next(iterator)
if errorIndication:
logger.error(f"SNMP error for {ups['id']}: {errorIndication}")
continue
# Parse SNMP response
model = str(varBinds[0][1])
status = str(varBinds[1][1])
ups_data.append(UPSData(
id=ups['id'],
model=model,
power_kva=ups.get('power_kva', 0),
battery_capacity=ups.get('battery_capacity', 0),
autonomy_minutes=ups.get('autonomy_minutes', 0),
status=status,
last_test=ups.get('last_test', 'N/A')
))
except Exception as e:
logger.error(f"Failed to collect UPS {ups['id']}: {e}")
return ups_data
def collect_rack_data(self) -> List[Dict]:
"""Collect rack inventory from asset database"""
cursor = self.asset_db.cursor(dictionary=True)
cursor.execute("""
SELECT
rack_id,
location,
total_units,
occupied_units,
max_power_kw
FROM racks
ORDER BY location, rack_id
""")
return cursor.fetchall()
def collect_environmental_sensors(self) -> List[Dict]:
"""Collect temperature/humidity sensor data"""
sensors_api = self.config['infrastructure']['sensors_api']
response = requests.get(
f"{sensors_api}/api/sensors/current",
timeout=10
)
response.raise_for_status()
return response.json()
def collect(self) -> Dict:
"""Main collection method"""
return {
'ups_systems': self.collect_ups_data(),
'racks': self.collect_rack_data(),
'environmental': self.collect_environmental_sensors(),
            'cooling': self.collect_cooling_data(),          # collector not shown here
            'power_distribution': self.collect_pdu_data(),   # collector not shown here
'timestamp': datetime.now().isoformat()
}
```
### 2.2 Network Collector
```python
#!/usr/bin/env python3
"""
collectors/network.py - Network configuration collection
"""
import logging
from datetime import datetime
from typing import Dict, List

from netmiko import ConnectHandler

logger = logging.getLogger(__name__)
class NetworkCollector:
def __init__(self, config):
self.config = config
def connect_device(self, device_config):
"""SSH connection to network device"""
return ConnectHandler(
device_type=device_config['type'],
host=device_config['host'],
username=device_config['username'],
password=device_config['password'],
secret=device_config.get('enable_password')
)
def collect_switch_inventory(self) -> List[Dict]:
"""Collect switch inventory and configuration"""
switches = []
for switch_config in self.config['network']['switches']:
try:
connection = self.connect_device(switch_config)
# Collect basic info
version = connection.send_command('show version')
interfaces = connection.send_command('show interfaces status')
vlan = connection.send_command('show vlan brief')
switches.append({
'hostname': switch_config['hostname'],
                    'version': self.parse_version(version),  # parse_* helpers not shown here
'interfaces': self.parse_interfaces(interfaces),
'vlans': self.parse_vlans(vlan),
})
connection.disconnect()
except Exception as e:
logger.error(f"Failed to collect {switch_config['hostname']}: {e}")
return switches
def collect_firewall_rules(self) -> Dict:
"""Collect firewall configuration"""
# Implementation depends on firewall vendor
pass
def collect(self) -> Dict:
"""Main collection method"""
return {
'switches': self.collect_switch_inventory(),
            'routers': self.collect_router_data(),      # analogous to the switch collector; not shown
            'firewalls': self.collect_firewall_rules(),
            'vlans': self.collect_vlan_config(),        # not shown here
'timestamp': datetime.now().isoformat()
}
```
### 2.3 VMware Collector
```python
#!/usr/bin/env python3
"""
collectors/virtualization.py - VMware/hypervisor data collection
"""
import ssl
from datetime import datetime
from typing import Dict, List

from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim
class VirtualizationCollector:
def __init__(self, config):
self.config = config
self.si = self.connect_vcenter()
def connect_vcenter(self):
"""Connect to vCenter"""
context = ssl.SSLContext(ssl.PROTOCOL_TLSv1_2)
context.verify_mode = ssl.CERT_NONE
return SmartConnect(
host=self.config['vmware']['vcenter_host'],
user=self.config['vmware']['username'],
pwd=self.config['vmware']['password'],
sslContext=context
)
def collect_vm_inventory(self) -> List[Dict]:
"""Collect all VMs"""
content = self.si.RetrieveContent()
container = content.rootFolder
viewType = [vim.VirtualMachine]
recursive = True
containerView = content.viewManager.CreateContainerView(
container, viewType, recursive
)
vms = []
for vm in containerView.view:
if vm.config:
vms.append({
'name': vm.name,
'power_state': vm.runtime.powerState,
'vcpu': vm.config.hardware.numCPU,
'memory_mb': vm.config.hardware.memoryMB,
'guest_os': vm.config.guestFullName,
'host': vm.runtime.host.name if vm.runtime.host else 'N/A',
'storage_gb': sum(d.capacityInBytes for d in vm.config.hardware.device
if isinstance(d, vim.vm.device.VirtualDisk)) / 1024**3
})
        containerView.Destroy()  # release the server-side view
        return vms
def collect_host_inventory(self) -> List[Dict]:
"""Collect ESXi hosts"""
content = self.si.RetrieveContent()
hosts = []
for datacenter in content.rootFolder.childEntity:
if hasattr(datacenter, 'hostFolder'):
for cluster in datacenter.hostFolder.childEntity:
for host in cluster.host:
hosts.append({
'name': host.name,
'cluster': cluster.name,
'cpu_cores': host.hardware.cpuInfo.numCpuCores,
'memory_gb': host.hardware.memorySize / 1024**3,
'cpu_usage': host.summary.quickStats.overallCpuUsage,
'memory_usage': host.summary.quickStats.overallMemoryUsage,
'vms_count': len(host.vm),
'uptime': host.summary.quickStats.uptime,
})
return hosts
def collect(self) -> Dict:
"""Main collection method"""
data = {
'vms': self.collect_vm_inventory(),
'hosts': self.collect_host_inventory(),
            'datastores': self.collect_datastore_info(),  # not shown here
            'clusters': self.collect_cluster_config(),    # not shown here
'timestamp': datetime.now().isoformat()
}
Disconnect(self.si)
return data
```
---
## 3. Helper Functions
### 3.1 SNMP Utilities
```python
"""
utils/snmp_helper.py
"""
from pysnmp.hlapi import *
def snmp_get(target, oid, community='public'):
"""Simple SNMP GET"""
iterator = getCmd(
SnmpEngine(),
CommunityData(community),
UdpTransportTarget((target, 161)),
ContextData(),
ObjectType(ObjectIdentity(oid))
)
errorIndication, errorStatus, errorIndex, varBinds = next(iterator)
if errorIndication:
raise Exception(f"SNMP Error: {errorIndication}")
return str(varBinds[0][1])
def snmp_walk(target, oid, community='public'):
"""Simple SNMP WALK"""
results = []
for (errorIndication, errorStatus, errorIndex, varBinds) in nextCmd(
SnmpEngine(),
CommunityData(community),
UdpTransportTarget((target, 161)),
ContextData(),
ObjectType(ObjectIdentity(oid)),
lexicographicMode=False
):
if errorIndication:
break
for varBind in varBinds:
results.append((str(varBind[0]), str(varBind[1])))
return results
```
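Example usage, combining the helpers with OIDs from the API reference document (targets are placeholders):
```python
# UPS battery status (RFC 1628) and switch interface names (IF-MIB)
battery_status = snmp_get("10.0.10.10", ".1.3.6.1.2.1.33.1.2.1.0")
for oid, descr in snmp_walk("10.0.10.20", ".1.3.6.1.2.1.2.2.1.2"):
    print(oid, descr)
```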
### 3.2 Token Counter
```python
"""
utils/token_counter.py
"""
def count_tokens(text):
"""
    Rough token estimate:
    1 token ≈ 4 characters of English text
"""
return len(text) // 4
def count_file_tokens(file_path):
"""Count tokens in a file"""
with open(file_path, 'r', encoding='utf-8') as f:
content = f.read()
return count_tokens(content)
```
---
## 4. Configuration File Example
### 4.1 config.yaml
```yaml
# Configuration file for datacenter documentation generator
# Database connections
databases:
asset_db:
host: db.company.local
port: 3306
user: readonly_user
password: ${VAULT:asset_db_password}
database: asset_management
# Infrastructure
infrastructure:
ups_devices:
- id: UPS-01
ip: 10.0.10.10
power_kva: 100
- id: UPS-02
ip: 10.0.10.11
power_kva: 100
sensors_api: http://sensors.company.local
# Network devices
network:
switches:
- hostname: core-sw-01
host: 10.0.10.20
type: cisco_ios
username: readonly
password: ${VAULT:network_password}
# VMware
vmware:
vcenter_host: vcenter.company.local
username: automation@vsphere.local
password: ${VAULT:vmware_password}
# SNMP
snmp:
community: ${VAULT:snmp_community}
version: 2c
# Output
output:
directory: /opt/datacenter-docs/output
format: markdown
# Thresholds
thresholds:
cpu_warning: 80
cpu_critical: 90
memory_warning: 85
memory_critical: 95
```
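The `${VAULT:key}` references are not standard YAML, so the loader must resolve them after parsing. A minimal sketch of one way to do that, assuming a `lookup_secret(key)` function backed by whatever vault is in use:
```python
import re
import yaml

VAULT_REF = re.compile(r"\$\{VAULT:([^}]+)\}")

def lookup_secret(key: str) -> str:
    """Placeholder: fetch the secret from keyring/HashiCorp Vault/etc."""
    raise NotImplementedError(key)

def resolve(node):
    """Recursively replace ${VAULT:key} strings in the parsed config."""
    if isinstance(node, dict):
        return {k: resolve(v) for k, v in node.items()}
    if isinstance(node, list):
        return [resolve(v) for v in node]
    if isinstance(node, str):
        return VAULT_REF.sub(lambda m: lookup_secret(m.group(1)), node)
    return node

with open("config.yaml") as f:
    config = resolve(yaml.safe_load(f))
```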
---
## 5. Deployment Script
### 5.1 deploy.sh
```bash
#!/bin/bash
# Deploy datacenter documentation generator
set -e
INSTALL_DIR="/opt/datacenter-docs"
VENV_DIR="$INSTALL_DIR/venv"
LOG_DIR="/var/log/datacenter-docs"
echo "Installing datacenter documentation generator..."
# Create directories
mkdir -p $INSTALL_DIR
mkdir -p $LOG_DIR
mkdir -p $INSTALL_DIR/output
# Create virtual environment
python3 -m venv $VENV_DIR
source $VENV_DIR/bin/activate
# Install dependencies
pip install --upgrade pip
pip install -r requirements.txt
# Copy files
cp -r collectors $INSTALL_DIR/
cp -r generators $INSTALL_DIR/
cp -r validators $INSTALL_DIR/
cp -r templates $INSTALL_DIR/
cp main.py $INSTALL_DIR/
cp config.yaml $INSTALL_DIR/
# Set permissions
chown -R automation:automation $INSTALL_DIR
chmod +x $INSTALL_DIR/main.py
# Install cron job
cat > /etc/cron.d/datacenter-docs << 'CRON'
# Datacenter documentation generation
0 */6 * * * automation /opt/datacenter-docs/venv/bin/python /opt/datacenter-docs/main.py >> /var/log/datacenter-docs/cron.log 2>&1
CRON
echo "✓ Installation complete!"
echo "Run: cd $INSTALL_DIR && source venv/bin/activate && python main.py --help"
```
---
## 6. Testing Framework
### 6.1 test_collectors.py
```python
#!/usr/bin/env python3
"""
tests/test_collectors.py
"""
import unittest
from unittest.mock import Mock, patch
from collectors.infrastructure import InfrastructureCollector
class TestInfrastructureCollector(unittest.TestCase):
    def setUp(self):
        self.config = {
            'databases': {'asset_db': {
                'host': 'localhost', 'user': 'test',
                'password': 'test', 'database': 'asset_test',
            }},
            'snmp': {'community': 'public'},
            'infrastructure': {'ups_devices': []},
        }
        # Patch the DB connection so the constructor does not hit a real MySQL
        with patch('mysql.connector.connect'):
            self.collector = InfrastructureCollector(self.config)
@patch('mysql.connector.connect')
def test_asset_db_connection(self, mock_connect):
"""Test database connection"""
mock_connect.return_value = Mock()
db = self.collector.connect_asset_db()
self.assertIsNotNone(db)
def test_ups_data_collection(self):
"""Test UPS data collection"""
# Mock SNMP responses
ups_data = self.collector.collect_ups_data()
self.assertIsInstance(ups_data, list)
if __name__ == '__main__':
unittest.main()
```
---
**Document Version**: 1.0
**For Support**: automation-team@company.com

# Technical Requirements for the LLM - Datacenter Documentation Generation
## 1. Required LLM Capabilities
### 1.1 Core Capabilities
- **Network Access**: SSH, HTTPS, SNMP connectivity
- **API Interaction**: REST, SOAP, GraphQL
- **Code Execution**: Python, Bash, PowerShell
- **File Operations**: reading/writing markdown files
- **Database Access**: MySQL, PostgreSQL, SQL Server
### 1.2 Required Python Libraries
```bash
# Networking and protocols
pip install paramiko              # SSH connections
pip install pysnmp                # SNMP queries
pip install requests              # HTTP/REST APIs
pip install netmiko               # Network device automation
# Virtualization
pip install pyvmomi               # VMware vSphere API
pip install proxmoxer             # Proxmox API
pip install libvirt-python        # KVM/QEMU
# Storage
pip install purestorage           # Pure Storage API
pip install netapp-ontap          # NetApp API
# Database
pip install mysql-connector-python
pip install psycopg2-binary       # PostgreSQL
pip install pymssql               # Microsoft SQL Server
# Monitoring
pip install zabbix-api            # Zabbix
pip install prometheus-client     # Prometheus
# Cloud providers
pip install boto3                 # AWS
pip install azure-identity azure-mgmt-compute azure-mgmt-storage  # Azure mgmt SDKs
pip install google-cloud          # GCP
# Utilities
pip install jinja2                # Template rendering
pip install pyyaml                # YAML parsing
pip install pandas                # Data analysis
pip install markdown              # Markdown generation
```
### 1.3 CLI Tools Required
```bash
# Network tools
apt-get install snmp snmp-mibs-downloader
apt-get install nmap
apt-get install netcat-openbsd
# Virtualization
apt-get install open-vm-tools # VMware
# Monitoring
apt-get install nagios-plugins
# Storage
apt-get install nfs-common
apt-get install cifs-utils
apt-get install multipath-tools
# Database clients
apt-get install mysql-client
apt-get install postgresql-client
```
---
## 2. Required Access and Credentials
### 2.1 Credential Format
Credentials must be supplied in a secure (vault/encrypted) file:
```yaml
# credentials.yaml (encrypted)
datacenter:
# Network devices
network:
cisco_switches:
username: admin
password: ${ENCRYPTED}
enable_password: ${ENCRYPTED}
firewalls:
api_key: ${ENCRYPTED}
# Virtualization
vmware:
vcenter_host: vcenter.domain.local
username: automation@vsphere.local
password: ${ENCRYPTED}
proxmox:
host: proxmox.domain.local
token_name: automation
token_value: ${ENCRYPTED}
# Storage
storage_arrays:
- name: SAN-01
type: pure_storage
api_token: ${ENCRYPTED}
# Databases
databases:
asset_management:
host: db.domain.local
port: 3306
username: readonly_user
password: ${ENCRYPTED}
database: asset_db
# Monitoring
monitoring:
zabbix:
url: https://zabbix.domain.local
api_token: ${ENCRYPTED}
# Backup
backup:
veeam:
server: veeam.domain.local
username: automation
password: ${ENCRYPTED}
```
### 2.2 Minimum Required Permissions
**IMPORTANT**: ALWAYS use least-privilege accounts (read-only wherever possible).
| System | Account Type | Required Permissions |
|---------|-------------|-------------------|
| Network Devices | Read-only | show commands, SNMP read |
| VMware vCenter | Read-only | Global > Read-only role |
| Storage Arrays | Read-only | Monitoring/reporting access |
| Databases | SELECT only | Read access to the asset schema |
| Monitoring | Read-only | View dashboards, metrics |
| Backup Software | Read-only | View jobs, reports |
---
## 3. Network Connectivity
### 3.1 Network Requirements
```
The LLM host must be able to reach:
Management networks:
- VLAN 10: 10.0.10.0/24 (Infrastructure Management)
- VLAN 20: 10.0.20.0/24 (Server Management)
- VLAN 30: 10.0.30.0/24 (Storage Management)
Required ports:
- TCP 22 (SSH)
- TCP 443 (HTTPS)
- TCP 3306 (MySQL)
- TCP 5432 (PostgreSQL)
- TCP 1433 (MS SQL Server)
- UDP 161 (SNMP)
- TCP 8006 (Proxmox)
```
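Reachability is easy to verify up front with a plain TCP connect against each required port (UDP 161 needs a real SNMP query instead); a sketch that a check-connectivity script could build on, with placeholder targets:
```python
import socket

# (host, port) pairs derived from the requirements above; targets are placeholders
TARGETS = [
    ("vcenter.domain.local", 443),
    ("db.domain.local", 3306),
    ("proxmox.domain.local", 8006),
    ("10.0.10.20", 22),  # example switch management IP
]

for host, port in TARGETS:
    try:
        with socket.create_connection((host, port), timeout=3):
            print(f"OK   {host}:{port}")
    except OSError as exc:
        print(f"FAIL {host}:{port} ({exc})")
```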
### 3.2 Firewall Rules
```
# Allow LLM host to management networks
Source: [LLM_HOST_IP]
Destination: Management Networks
Protocol: SSH, HTTPS, SNMP, Database ports
Action: ALLOW
# Deny all other traffic from LLM host
Source: [LLM_HOST_IP]
Destination: Production Networks
Action: DENY
```
---
## 4. Rate Limiting and Best Practices
### 4.1 API Call Limits
```python
# Respect vendor rate limits
RATE_LIMITS = {
'vmware_vcenter': {'calls_per_minute': 100},
'network_devices': {'calls_per_minute': 10},
'storage_api': {'calls_per_minute': 60},
'monitoring_api': {'calls_per_minute': 300}
}
# Implement retry logic with exponential backoff
import time
from functools import wraps
def retry_with_backoff(max_retries=3, base_delay=1):
def decorator(func):
@wraps(func)
def wrapper(*args, **kwargs):
for attempt in range(max_retries):
try:
return func(*args, **kwargs)
except Exception as e:
if attempt == max_retries - 1:
raise
delay = base_delay * (2 ** attempt)
time.sleep(delay)
return wrapper
return decorator
```
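The decorator above reacts to throttling after the fact; to stay under the published limits proactively, the `RATE_LIMITS` table can also drive a simple minimum-interval throttle. A sketch (not a full token bucket):
```python
import threading
import time

_last_call = {}
_lock = threading.Lock()

def throttle(api_name):
    """Sleep just long enough to respect RATE_LIMITS[api_name] between calls."""
    min_interval = 60.0 / RATE_LIMITS[api_name]['calls_per_minute']
    with _lock:
        elapsed = time.monotonic() - _last_call.get(api_name, 0.0)
        if elapsed < min_interval:
            time.sleep(min_interval - elapsed)
        _last_call[api_name] = time.monotonic()

# Usage: throttle('network_devices') before each device query
```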
### 4.2 Concurrent Operations
```python
# Limit concurrent operations
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 5  # do not saturate the target systems
with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
futures = [executor.submit(query_device, device) for device in devices]
results = [f.result() for f in futures]
```
---
## 5. Error Handling e Logging
### 5.1 Logging Configuration
```python
import logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
handlers=[
logging.FileHandler('/var/log/datacenter-docs/generation.log'),
logging.StreamHandler()
]
)
logger = logging.getLogger('datacenter-docs')
```
### 5.2 Error Handling Strategy
```python
class DataCollectionError(Exception):
    """Custom exception for data-collection errors"""
    pass

try:
    data = collect_vmware_data()
except ConnectionError as e:
    logger.error(f"Cannot connect to vCenter: {e}")
    # Fall back to cached data if available
    data = load_cached_data('vmware')
except AuthenticationError as e:  # vendor-SDK specific exception class
    logger.critical(f"Authentication failed: {e}")
    # Alert the team
    send_alert("VMware auth failed")
except Exception as e:
    logger.exception(f"Unexpected error: {e}")
    # Continue with partial data
    data = get_partial_data()
```
---
## 6. Caching e Performance
### 6.1 Cache Strategy
```python
import json
import logging

import redis

logger = logging.getLogger(__name__)

# Set up Redis for caching
cache = redis.Redis(host='localhost', port=6379, db=0)
def get_cached_or_fetch(key, fetch_function, ttl=3600):
"""Get from cache or fetch if not available"""
cached = cache.get(key)
if cached:
logger.info(f"Cache hit for {key}")
return json.loads(cached)
logger.info(f"Cache miss for {key}, fetching...")
data = fetch_function()
cache.setex(key, ttl, json.dumps(data))
return data
# Example usage
vmware_inventory = get_cached_or_fetch(
'vmware_inventory',
lambda: collect_vmware_inventory(),
ttl=3600 # 1 hour
)
```
### 6.2 Data to Cache
- **1 hour**: performance metrics, real-time status
- **6 hours**: inventory, configurations
- **24 hours**: asset database, ownership info
- **7 days**: historical trends, capacity planning
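Expressed as constants, these tiers plug straight into `get_cached_or_fetch` above (`fetch_rack_data` is a hypothetical fetch function):
```python
# TTL tiers from the list above, in seconds
TTL = {
    'metrics': 3600,           # 1 hour
    'inventory': 6 * 3600,     # 6 hours
    'asset_db': 24 * 3600,     # 24 hours
    'trends': 7 * 24 * 3600,   # 7 days
}

rack_data = get_cached_or_fetch('asset_racks', fetch_rack_data, ttl=TTL['asset_db'])
```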
---
## 7. Execution Schedule
### 7.1 Recommended Cron Schedule
```cron
# Full documentation refresh - every 6 hours
0 */6 * * * /usr/local/bin/generate-datacenter-docs.sh --full
# Quick update (metrics only) - hourly
0 * * * * /usr/local/bin/generate-datacenter-docs.sh --metrics-only
# Weekly comprehensive report - Sunday night
0 2 * * 0 /usr/local/bin/generate-datacenter-docs.sh --full --detailed
```
### 7.2 Example Wrapper Script
```bash
#!/bin/bash
# generate-datacenter-docs.sh
set -euo pipefail
LOGFILE="/var/log/datacenter-docs/$(date +%Y%m%d_%H%M%S).log"
LOCKFILE="/var/run/datacenter-docs.lock"
# Prevent concurrent executions
if [ -f "$LOCKFILE" ]; then
echo "Another instance is running. Exiting."
exit 1
fi
touch "$LOCKFILE"
trap "rm -f $LOCKFILE" EXIT
# Activate virtual environment
source /opt/datacenter-docs/venv/bin/activate
# Run Python script with parameters
python3 /opt/datacenter-docs/main.py "$@" 2>&1 | tee -a "$LOGFILE"
# Cleanup old logs (keep 30 days)
find /var/log/datacenter-docs/ -name "*.log" -mtime +30 -delete
```
---
## 8. Output e Validazione
### 8.1 Post-Generation Checks
```python
import os
import re

def validate_documentation(section_file):
    """Validate the generated document"""
checks = {
'file_exists': os.path.exists(section_file),
'not_empty': os.path.getsize(section_file) > 0,
'valid_markdown': validate_markdown_syntax(section_file),
'no_placeholders': not contains_placeholders(section_file),
        'token_limit': count_file_tokens(section_file) < 50000  # helper from utils/token_counter.py
}
if all(checks.values()):
logger.info(f"✓ {section_file} validation passed")
return True
else:
failed = [k for k, v in checks.items() if not v]
logger.error(f"✗ {section_file} validation failed: {failed}")
return False
def contains_placeholders(file_path):
    """Check for unsubstituted placeholders"""
    with open(file_path, 'r') as f:
        content = f.read()
    # Note: the bracket patterns also match legitimate markdown links; tune as needed
    patterns = [r'\[.*?\]', r'\{.*?\}', r'TODO', r'FIXME']
    return any(re.search(p, content) for p in patterns)
```
### 8.2 Notification System
```python
def send_completion_notification(success, sections_updated, errors):
    """Send a notification when generation finishes"""
    # Build the error block outside the f-string (backslashes inside
    # f-string expressions are a syntax error before Python 3.12)
    error_detail = 'Errors:\n' + '\n'.join(errors) if errors else ''
    message = f"""
Datacenter Documentation Update
Status: {'✓ SUCCESS' if success else '✗ FAILED'}
Sections Updated: {', '.join(sections_updated)}
Errors: {len(errors)}
{error_detail}
Timestamp: {datetime.now().isoformat()}
"""
    # Send via multiple channels
    send_email(recipients=['ops-team@company.com'], subject='Doc Update', body=message)
    send_slack(channel='#datacenter-ops', message=message)
    # send_teams / send_webhook as needed
```
---
## 9. Security Considerations
### 9.1 Secrets Management
```python
# NEVER store credentials in plain text
# Always use a vault or the OS keyring
import keyring

def get_credential(service, account):
    """Retrieve credential from OS keyring"""
    return keyring.get_password(service, account)

# Or HashiCorp Vault
import hvac
client = hvac.Client(url='https://vault.company.com')
client.auth.approle.login(role_id=ROLE_ID, secret_id=SECRET_ID)
credentials = client.secrets.kv.v2.read_secret_version(path='datacenter/creds')
```
### 9.2 Audit Trail
```python
# Log ALL operations for auditing
audit_log = {
'timestamp': datetime.now().isoformat(),
'user': 'automation-account',
'action': 'documentation_generation',
'sections': sections_updated,
'systems_accessed': list_of_systems,
'duration': elapsed_time,
    'success': success  # True or False for the run
}
write_audit_log(audit_log)
```
---
## 10. Troubleshooting
### 10.1 Common Issues
| Problem | Likely Cause | Solution |
|----------|----------------|-----------|
| Connection timeout | Firewall/network | Check connectivity and firewall rules |
| Authentication failed | Wrong or expired credentials | Rotate credentials, check the vault |
| API rate limit | Too many requests | Implement backoff, reduce request frequency |
| Incomplete data | Source temporarily down | Use cached data, generate a partial doc |
| Token limit exceeded | Too much data in a section | Trim historical data, optimize the format |
### 10.2 Debug Mode
```python
import json
import logging
import os

# Enable debug mode for troubleshooting
DEBUG = os.getenv('DEBUG', 'False').lower() == 'true'
if DEBUG:
    logging.getLogger().setLevel(logging.DEBUG)
    # Save raw responses for analysis (timestamp and raw_response come from the caller)
    with open(f'debug_{timestamp}.json', 'w') as f:
        json.dump(raw_response, f, indent=2)
```
---
## 11. Testing
### 11.1 Unit Tests
```python
import unittest
class TestDataCollection(unittest.TestCase):
def test_vmware_connection(self):
"""Test connessione a vCenter"""
result = test_vmware_connection()
self.assertTrue(result.success)
def test_data_validation(self):
"""Test validazione dati raccolti"""
sample_data = load_sample_data()
self.assertTrue(validate_data_structure(sample_data))
```
### 11.2 Integration Tests
```bash
# End-to-end test in the test environment
./run-tests.sh --integration --environment=test
# Verify that all systems are reachable
./check-connectivity.sh
# Dry run without saving output (main.py exposes --debug, not --verbose)
python3 main.py --dry-run --debug
```
---
## Pre-Deployment Checklist
Before putting the system into production:
- [ ] All libraries installed
- [ ] Credentials configured in a secure vault
- [ ] Connectivity verified to all systems
- [ ] Automation account permissions validated (read-only)
- [ ] Firewall rules approved and configured
- [ ] Logging configured and tested
- [ ] Notification system tested
- [ ] Cron jobs configured
- [ ] Existing documentation backed up
- [ ] Operational runbook completed
- [ ] Escalation path defined
- [ ] DR procedures documented
---
**Document Version**: 1.0
**Last Updated**: 2025-01-XX
**Owner**: Automation Team