Initial commit: LLM Automation Docs & Remediation Engine v2.0

Features:
- Automated datacenter documentation generation
- MCP integration for device connectivity
- Auto-remediation engine with safety checks
- Multi-factor reliability scoring (0-100%)
- Human feedback learning loop
- Pattern recognition and continuous improvement
- Agentic chat support with AI
- API for ticket resolution
- React frontend with Material-UI
- CI/CD pipelines (GitLab + Gitea)
- Docker & Kubernetes deployment
- Complete documentation and guides

v2.0 Highlights:
- Auto-remediation with write operations (disabled by default)
- Reliability calculator with 4-factor scoring
- Human feedback system for continuous learning
- Pattern-based progressive automation
- Approval workflow for critical actions
- Full audit trail and rollback capability
requirements/api_endpoints.md (687 lines, new file)
# API Endpoints and Commands for Data Collection

## 1. VMware vSphere API

### 1.1 REST API Endpoints

```bash
# Base URL
BASE_URL="https://vcenter.domain.local/rest"

# Authentication
curl -X POST $BASE_URL/com/vmware/cis/session \
  -u 'automation@vsphere.local:password'

# Get all VMs
curl -X GET $BASE_URL/vcenter/vm \
  -H "vmware-api-session-id: ${SESSION_ID}"

# Get VM details
curl -X GET $BASE_URL/vcenter/vm/${VM_ID} \
  -H "vmware-api-session-id: ${SESSION_ID}"

# Get hosts
curl -X GET $BASE_URL/vcenter/host \
  -H "vmware-api-session-id: ${SESSION_ID}"

# Get datastores
curl -X GET $BASE_URL/vcenter/datastore \
  -H "vmware-api-session-id: ${SESSION_ID}"

# Get clusters
curl -X GET $BASE_URL/vcenter/cluster \
  -H "vmware-api-session-id: ${SESSION_ID}"
```
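The session call returns the new session id in its JSON body; a minimal Python sketch, assuming the legacy `/rest` response shape `{"value": "<session-id>"}`, for turning that body into the header used by the subsequent calls:

```python
import json

def session_header(response_body: str) -> dict:
    # Legacy /rest session endpoint wraps the session id as {"value": "<id>"}.
    session_id = json.loads(response_body)["value"]
    return {"vmware-api-session-id": session_id}

# Example with a canned response body:
headers = session_header('{"value": "abc123"}')
print(headers)  # {'vmware-api-session-id': 'abc123'}
```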

### 1.2 PowerCLI Commands

```powershell
# Connect
Connect-VIServer -Server vcenter.domain.local -User automation@vsphere.local

# Get all VMs with details
Get-VM | Select-Object Name, PowerState, NumCpu, MemoryGB, @{N='UsedSpaceGB';E={[math]::Round($_.UsedSpaceGB,2)}}, VMHost, ResourcePool | Export-Csv -Path vms.csv

# Get hosts
Get-VMHost | Select-Object Name, ConnectionState, PowerState, Version, NumCpu, MemoryTotalGB, @{N='MemoryUsageGB';E={[math]::Round($_.MemoryUsageGB,2)}} | Export-Csv -Path hosts.csv

# Get datastores
Get-Datastore | Select-Object Name, Type, CapacityGB, FreeSpaceGB, @{N='PercentFree';E={[math]::Round(($_.FreeSpaceGB/$_.CapacityGB*100),2)}} | Export-Csv -Path datastores.csv

# Get performance stats
Get-Stat -Entity (Get-VM) -Stat cpu.usage.average,mem.usage.average -Start (Get-Date).AddDays(-7) -IntervalMins 5 | Export-Csv -Path performance.csv
```

---

## 2. Proxmox VE API

### 2.1 REST API

```bash
# Base URL
PROXMOX_URL="https://proxmox.domain.local:8006/api2/json"

# Get ticket (authentication)
curl -k -d "username=automation@pam&password=password" \
  $PROXMOX_URL/access/ticket

# Get nodes
curl -k -H "Cookie: PVEAuthCookie=${TICKET}" \
  $PROXMOX_URL/nodes

# Get VMs on node
curl -k -H "Cookie: PVEAuthCookie=${TICKET}" \
  $PROXMOX_URL/nodes/${NODE}/qemu

# Get containers
curl -k -H "Cookie: PVEAuthCookie=${TICKET}" \
  $PROXMOX_URL/nodes/${NODE}/lxc

# Get storage
curl -k -H "Cookie: PVEAuthCookie=${TICKET}" \
  $PROXMOX_URL/nodes/${NODE}/storage

# Get cluster status
curl -k -H "Cookie: PVEAuthCookie=${TICKET}" \
  $PROXMOX_URL/cluster/status
```
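The `/access/ticket` response carries both the ticket and a CSRF token; a small Python sketch for turning that JSON into request headers (the `CSRFPreventionToken` header is only needed for write operations, but it is harmless to send on reads):

```python
import json

def proxmox_headers(ticket_response: str) -> dict:
    # /access/ticket returns {"data": {"ticket": ..., "CSRFPreventionToken": ...}}
    data = json.loads(ticket_response)["data"]
    return {
        "Cookie": f"PVEAuthCookie={data['ticket']}",
        "CSRFPreventionToken": data["CSRFPreventionToken"],
    }

sample = '{"data": {"ticket": "PVE:TICKET", "CSRFPreventionToken": "TOKEN"}}'
print(proxmox_headers(sample))
```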

### 2.2 CLI Commands

```bash
# List VMs
pvesh get /cluster/resources --type vm

# VM status
qm status ${VMID}

# Container list
pct list

# Storage info
pvesm status

# Node info
pvesh get /nodes/${NODE}/status
```

---

## 3. Network Devices

### 3.1 Cisco IOS Commands

```bash
# Via SSH
ssh admin@switch.domain.local

# System information
show version
show inventory
show running-config

# Interfaces
show interfaces status
show interfaces description
show interfaces counters errors
show ip interface brief

# VLANs
show vlan brief
show vlan id ${VLAN_ID}

# Spanning Tree
show spanning-tree summary
show spanning-tree root

# Routing
show ip route
show ip protocols

# CDP/LLDP
show cdp neighbors detail
show lldp neighbors

# Performance
show processes cpu history
show memory statistics
show environment all
```

### 3.2 HP/Aruba Switch Commands

```bash
# System info
show system
show version
show running-config

# Interfaces
show interfaces brief
show interfaces status

# VLANs
show vlans

# Spanning tree
show spanning-tree

# Logging
show log
```

---

## 4. Firewall APIs

### 4.1 pfSense/OPNsense API

```bash
# Base URL
FW_URL="https://firewall.domain.local/api"

# Get system info
curl -X GET "${FW_URL}/core/system/status" \
  -H "Authorization: Bearer ${API_TOKEN}"

# Get interfaces
curl -X GET "${FW_URL}/interfaces/overview/export" \
  -H "Authorization: Bearer ${API_TOKEN}"

# Get firewall rules
curl -X GET "${FW_URL}/firewall/filter/searchRule" \
  -H "Authorization: Bearer ${API_TOKEN}"

# Get VPN status
curl -X GET "${FW_URL}/ipsec/sessions" \
  -H "Authorization: Bearer ${API_TOKEN}"
```

### 4.2 Fortinet FortiGate API

```bash
# Base URL
FORTI_URL="https://fortigate.domain.local/api/v2"

# System status
curl -X GET "${FORTI_URL}/monitor/system/status" \
  -H "Authorization: Bearer ${API_TOKEN}"

# Interface stats
curl -X GET "${FORTI_URL}/monitor/system/interface/select" \
  -H "Authorization: Bearer ${API_TOKEN}"

# Firewall policies
curl -X GET "${FORTI_URL}/cmdb/firewall/policy" \
  -H "Authorization: Bearer ${API_TOKEN}"

# VPN status
curl -X GET "${FORTI_URL}/monitor/vpn/ipsec" \
  -H "Authorization: Bearer ${API_TOKEN}"
```

---

## 5. Storage Arrays

### 5.1 Pure Storage API

```bash
# Base URL
PURE_URL="https://array.domain.local/api"

# Get array info
curl -X GET "${PURE_URL}/1.19/array" \
  -H "api-token: ${API_TOKEN}"

# Get volumes
curl -X GET "${PURE_URL}/1.19/volume" \
  -H "api-token: ${API_TOKEN}"

# Get hosts
curl -X GET "${PURE_URL}/1.19/host" \
  -H "api-token: ${API_TOKEN}"

# Get performance metrics
curl -X GET "${PURE_URL}/1.19/array/monitor?action=monitor" \
  -H "api-token: ${API_TOKEN}"
```

### 5.2 NetApp ONTAP API

```bash
# Base URL
NETAPP_URL="https://netapp.domain.local/api"

# Get cluster info
curl -X GET "${NETAPP_URL}/cluster" \
  -u "admin:password"

# Get volumes
curl -X GET "${NETAPP_URL}/storage/volumes" \
  -u "admin:password"

# Get aggregates
curl -X GET "${NETAPP_URL}/storage/aggregates" \
  -u "admin:password"

# Get performance
curl -X GET "${NETAPP_URL}/cluster/counter/tables/volume" \
  -u "admin:password"
```

### 5.3 Generic SAN Commands

```bash
# Via SSH to array management interface

# Show system info
show system
show controller
show disk

# Show volumes/LUNs
show volumes
show luns
show mappings

# Show performance
show statistics
show disk-statistics
```

---

## 6. Monitoring Systems

### 6.1 Zabbix API

```bash
# Base URL
ZABBIX_URL="https://zabbix.domain.local/api_jsonrpc.php"

# Authenticate
curl -X POST $ZABBIX_URL \
  -H "Content-Type: application/json-rpc" \
  -d '{
    "jsonrpc": "2.0",
    "method": "user.login",
    "params": {
      "user": "automation",
      "password": "password"
    },
    "id": 1
  }'

# Get hosts
curl -X POST $ZABBIX_URL \
  -H "Content-Type: application/json-rpc" \
  -d '{
    "jsonrpc": "2.0",
    "method": "host.get",
    "params": {
      "output": ["hostid", "host", "status"]
    },
    "auth": "'${AUTH_TOKEN}'",
    "id": 1
  }'

# Get problems
curl -X POST $ZABBIX_URL \
  -H "Content-Type: application/json-rpc" \
  -d '{
    "jsonrpc": "2.0",
    "method": "problem.get",
    "params": {
      "recent": true
    },
    "auth": "'${AUTH_TOKEN}'",
    "id": 1
  }'
```

### 6.2 Prometheus API

```bash
# Base URL
PROM_URL="http://prometheus.domain.local:9090"

# Query instant
curl -X GET "${PROM_URL}/api/v1/query?query=up"

# Query range
curl -X GET "${PROM_URL}/api/v1/query_range?query=node_cpu_seconds_total&start=2024-01-01T00:00:00Z&end=2024-01-02T00:00:00Z&step=15s"

# Get targets
curl -X GET "${PROM_URL}/api/v1/targets"

# Get alerts
curl -X GET "${PROM_URL}/api/v1/alerts"
```

### 6.3 Nagios/Icinga API

```bash
# Icinga2 API
ICINGA_URL="https://icinga.domain.local:5665"

# Get hosts
curl -k -u "automation:password" \
  "${ICINGA_URL}/v1/objects/hosts"

# Get services
curl -k -u "automation:password" \
  "${ICINGA_URL}/v1/objects/services"

# Get problems
curl -k -u "automation:password" \
  "${ICINGA_URL}/v1/objects/services?filter=service.state!=0"
```

---

## 7. Backup Systems

### 7.1 Veeam API

```powershell
# Connect to Veeam server
Connect-VBRServer -Server veeam.domain.local -User automation

# Get backup jobs
Get-VBRJob | Select-Object Name, JobType, IsScheduleEnabled, LastResult

# Get backup sessions
Get-VBRBackupSession | Where-Object {$_.CreationTime -gt (Get-Date).AddDays(-7)} | Select-Object Name, JobName, Result, CreationTime

# Get restore points
Get-VBRRestorePoint | Select-Object VMName, CreationTime, Type

# Get repositories
Get-VBRBackupRepository | Select-Object Name, Path, @{N='FreeGB';E={[math]::Round($_.GetContainer().CachedFreeSpace.InGigabytes,2)}}
```

### 7.2 CommVault API

```bash
# Base URL
CV_URL="https://commvault.domain.local/webconsole/api"

# Login
curl -X POST "${CV_URL}/Login" \
  -H "Content-Type: application/json" \
  -d '{"username":"automation","password":"password"}'

# Get jobs
curl -X GET "${CV_URL}/Job?clientName=${CLIENT}" \
  -H "Authtoken: ${TOKEN}"

# Get clients
curl -X GET "${CV_URL}/Client" \
  -H "Authtoken: ${TOKEN}"
```

---

## 8. Database Queries

### 8.1 Asset Management DB

```sql
-- MySQL/MariaDB queries for asset database

-- Get all racks
SELECT
    rack_id,
    location,
    total_units,
    occupied_units,
    (total_units - occupied_units) AS available_units,
    max_power_kw,
    ROUND(occupied_units * 100.0 / total_units, 2) AS utilization_percent
FROM racks
ORDER BY location, rack_id;

-- Get all servers
SELECT
    s.hostname,
    s.serial_number,
    s.model,
    s.cpu_model,
    s.cpu_cores,
    s.ram_gb,
    s.rack_id,
    s.rack_unit,
    s.status,
    s.environment
FROM servers s
ORDER BY s.rack_id, s.rack_unit;

-- Get network devices
SELECT
    n.hostname,
    n.device_type,
    n.vendor,
    n.model,
    n.management_ip,
    n.firmware_version,
    n.rack_id,
    n.status
FROM network_devices n
ORDER BY n.device_type, n.hostname;

-- Get contracts
SELECT
    c.vendor,
    c.service_type,
    c.contract_type,
    c.start_date,
    c.end_date,
    DATEDIFF(c.end_date, NOW()) AS days_to_expiry,
    c.annual_cost
FROM contracts c
WHERE c.end_date > NOW()
ORDER BY c.end_date;
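The rack query computes utilization in SQL; the same figure is easy to post-process in Python, for example to flag racks above a capacity threshold (a sketch over rows shaped like the query's output):

```python
def rack_utilization(total_units: int, occupied_units: int) -> float:
    # Same expression as the SQL: ROUND(occupied_units * 100.0 / total_units, 2)
    return round(occupied_units * 100.0 / total_units, 2)

def racks_over_threshold(rows: list[dict], threshold: float = 80.0) -> list[str]:
    return [r["rack_id"] for r in rows
            if rack_utilization(r["total_units"], r["occupied_units"]) >= threshold]

rows = [
    {"rack_id": "R01", "total_units": 42, "occupied_units": 40},
    {"rack_id": "R02", "total_units": 42, "occupied_units": 10},
]
print(racks_over_threshold(rows))  # ['R01']
```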

### 8.2 Database Server Queries

```sql
-- MySQL - Database sizes
SELECT
    table_schema AS 'Database',
    ROUND(SUM(data_length + index_length) / 1024 / 1024 / 1024, 2) AS 'Size_GB'
FROM information_schema.tables
GROUP BY table_schema
ORDER BY SUM(data_length + index_length) DESC;

-- PostgreSQL - Database sizes
SELECT
    datname AS database_name,
    pg_size_pretty(pg_database_size(datname)) AS size
FROM pg_database
ORDER BY pg_database_size(datname) DESC;

-- SQL Server - Database sizes
SELECT
    DB_NAME(database_id) AS DatabaseName,
    (size * 8.0 / 1024) AS SizeMB
FROM sys.master_files
WHERE type = 0
ORDER BY size DESC;
```

---

## 9. Cloud Provider APIs

### 9.1 AWS (Boto3)

```python
import boto3

# EC2 instances
ec2 = boto3.client('ec2')
instances = ec2.describe_instances()

# S3 buckets
s3 = boto3.client('s3')
buckets = s3.list_buckets()

# RDS databases
rds = boto3.client('rds')
databases = rds.describe_db_instances()

# Cost Explorer
ce = boto3.client('ce')
cost = ce.get_cost_and_usage(
    TimePeriod={'Start': '2024-01-01', 'End': '2024-01-31'},
    Granularity='MONTHLY',
    Metrics=['UnblendedCost']
)
```

### 9.2 Azure (SDK)

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient
from azure.mgmt.storage import StorageManagementClient

credential = DefaultAzureCredential()
subscription_id = "<subscription-id>"  # target subscription

# VMs
compute_client = ComputeManagementClient(credential, subscription_id)
vms = compute_client.virtual_machines.list_all()

# Storage accounts
storage_client = StorageManagementClient(credential, subscription_id)
storage_accounts = storage_client.storage_accounts.list()
```

---

## 10. SNMP OIDs Reference

### 10.1 Common System OIDs

```bash
# System description
.1.3.6.1.2.1.1.1.0    # sysDescr

# System uptime
.1.3.6.1.2.1.1.3.0    # sysUpTime

# System name
.1.3.6.1.2.1.1.5.0    # sysName

# System location
.1.3.6.1.2.1.1.6.0    # sysLocation
```
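`sysUpTime` is reported in SNMP TimeTicks, i.e. hundredths of a second since the agent restarted, so raw values need converting before they are human-readable:

```python
from datetime import timedelta

def uptime_from_timeticks(ticks: int) -> timedelta:
    # SNMP TimeTicks are hundredths of a second.
    return timedelta(seconds=ticks / 100)

print(uptime_from_timeticks(8640000))  # 1 day, 0:00:00
```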

### 10.2 UPS OIDs (RFC 1628)

```bash
# UPS identity
.1.3.6.1.2.1.33.1.1.1.0    # upsIdentManufacturer
.1.3.6.1.2.1.33.1.1.2.0    # upsIdentModel

# Battery status
.1.3.6.1.2.1.33.1.2.1.0    # upsBatteryStatus
.1.3.6.1.2.1.33.1.2.2.0    # upsSecondsOnBattery
.1.3.6.1.2.1.33.1.2.3.0    # upsEstimatedMinutesRemaining

# Input
.1.3.6.1.2.1.33.1.3.3.1.3  # upsInputVoltage
.1.3.6.1.2.1.33.1.3.3.1.4  # upsInputCurrent
.1.3.6.1.2.1.33.1.3.3.1.6  # upsInputTruePower

# Output
.1.3.6.1.2.1.33.1.4.4.1.2  # upsOutputVoltage
.1.3.6.1.2.1.33.1.4.4.1.3  # upsOutputCurrent
.1.3.6.1.2.1.33.1.4.4.1.4  # upsOutputPower
.1.3.6.1.2.1.33.1.4.4.1.5  # upsOutputPercentLoad
```

### 10.3 Network Interface OIDs

```bash
# Interface description
.1.3.6.1.2.1.2.2.1.2   # ifDescr

# Interface status
.1.3.6.1.2.1.2.2.1.8   # ifOperStatus

# Interface traffic
.1.3.6.1.2.1.2.2.1.10  # ifInOctets
.1.3.6.1.2.1.2.2.1.16  # ifOutOctets

# Interface errors
.1.3.6.1.2.1.2.2.1.14  # ifInErrors
.1.3.6.1.2.1.2.2.1.20  # ifOutErrors
```
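`ifInOctets`/`ifOutOctets` are cumulative Counter32 values, so link utilization comes from the delta between two polls; the modulo below handles a single wrap of the 32-bit counter (for high-speed links the 64-bit `ifHCInOctets`/`ifHCOutOctets` counters are preferable). A sketch:

```python
def octets_delta(prev: int, curr: int, counter_bits: int = 32) -> int:
    # Counter32 wraps at 2**32; modulo arithmetic absorbs one wrap per interval.
    mod = 1 << counter_bits
    return (curr - prev) % mod

def utilization_pct(prev: int, curr: int, interval_s: float, speed_bps: int) -> float:
    """Link utilization in percent over the sampling interval."""
    bits = octets_delta(prev, curr) * 8
    return 100.0 * bits / (interval_s * speed_bps)

# 125,000,000 octets in 10 s on a 1 Gb/s link -> 10 % utilization
print(utilization_pct(0, 125_000_000, 10, 1_000_000_000))  # 10.0
```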

---

## 11. Example Collection Script

### 11.1 Complete Data Collection

```bash
#!/bin/bash
# collect_all_data.sh - Orchestrate all data collection

OUTPUT_DIR="/tmp/datacenter-collection-$(date +%Y%m%d_%H%M%S)"
mkdir -p "$OUTPUT_DIR"

echo "Starting datacenter data collection..."

# VMware
echo "Collecting VMware data..."
python3 collect_vmware.py > "$OUTPUT_DIR/vmware.json"

# Network devices
echo "Collecting network configurations..."
./collect_network.sh > "$OUTPUT_DIR/network.json"

# Storage
echo "Collecting storage data..."
python3 collect_storage.py > "$OUTPUT_DIR/storage.json"

# Monitoring
echo "Collecting monitoring data..."
./collect_monitoring.sh > "$OUTPUT_DIR/monitoring.json"

# Databases
echo "Querying databases..."
mysql -h db.local -u reader -pPASS asset_db < queries.sql > "$OUTPUT_DIR/asset_db.csv"

# SNMP devices
echo "Polling SNMP devices..."
./poll_snmp.sh > "$OUTPUT_DIR/snmp.json"

echo "Collection complete. Data saved to: $OUTPUT_DIR"
tar -czf "$OUTPUT_DIR.tar.gz" "$OUTPUT_DIR"
```

---

## 12. Rate Limiting Reference

### 12.1 Vendor Rate Limits

| Vendor | Endpoint | Limit | Time Window |
|--------|----------|-------|-------------|
| VMware vCenter | REST API | 100 req | per minute |
| Zabbix | API | 300 req | per minute |
| Pure Storage | REST API | 60 req | per minute |
| Cisco DNA Center | API | 10 req | per second |
| AWS | API (varies) | 10-100 req | per second |
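On the client side, the simplest way to stay under these limits is to space requests out; a minimal throttle sketch (the limiter class and names are illustrative, not from any vendor SDK) that sleeps just enough between calls:

```python
import time

class Throttle:
    """Block so calls are spaced at no more than max_calls per period."""

    def __init__(self, max_calls: int, period_s: float):
        self.min_interval = period_s / max_calls
        self.last = 0.0

    def wait(self) -> None:
        now = time.monotonic()
        sleep_for = self.last + self.min_interval - now
        if sleep_for > 0:
            time.sleep(sleep_for)
        self.last = time.monotonic()

# e.g. stay under vCenter's ~100 requests/minute:
vcenter_throttle = Throttle(max_calls=100, period_s=60.0)
# call vcenter_throttle.wait() immediately before each API request
```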

### 12.2 Retry Strategy

```python
import time
import logging
from functools import wraps

logger = logging.getLogger(__name__)

class RateLimitException(Exception):
    """Raised by API clients when the vendor signals rate limiting (e.g. HTTP 429)."""

def rate_limited_retry(max_retries=3, backoff_factor=2):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except RateLimitException:
                    if attempt == max_retries - 1:
                        raise
                    wait_time = backoff_factor ** attempt
                    logger.warning(f"Rate limited. Waiting {wait_time}s before retry {attempt+1}/{max_retries}")
                    time.sleep(wait_time)
                except Exception as e:
                    logger.error(f"Error: {e}")
                    raise
        return wrapper
    return decorator
```

---

**Document Version**: 1.0
**Last Updated**: 2025-01-XX
**Maintainer**: Automation Team

requirements/data_collection_scripts.md (663 lines, new file)

# Data Collection Scripts for Datacenter Documentation

## 1. Main Python Scripts

### 1.1 Main Orchestrator
```python
#!/usr/bin/env python3
"""
main.py - Main orchestrator for documentation generation
"""

import sys
import argparse
import logging
from datetime import datetime
from pathlib import Path

# Import custom modules
from collectors import (
    InfrastructureCollector,
    NetworkCollector,
    VirtualizationCollector,
    StorageCollector,
    SecurityCollector,
    BackupCollector,
    MonitoringCollector,
    DatabaseCollector,
    ProcedureCollector,
    ImprovementAnalyzer
)

from generators import DocumentationGenerator
from validators import DocumentValidator

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class DatacenterDocGenerator:
    def __init__(self, config_file='config.yaml'):
        self.config = self.load_config(config_file)
        self.sections = []

    def load_config(self, config_file):
        """Load configuration from YAML file"""
        import yaml
        with open(config_file, 'r') as f:
            return yaml.safe_load(f)

    def collect_data(self, section=None):
        """Collect data from all sources"""
        collectors = {
            '01': InfrastructureCollector(self.config),
            '02': NetworkCollector(self.config),
            '03': VirtualizationCollector(self.config),
            '04': StorageCollector(self.config),
            '05': SecurityCollector(self.config),
            '06': BackupCollector(self.config),
            '07': MonitoringCollector(self.config),
            '08': DatabaseCollector(self.config),
            '09': ProcedureCollector(self.config),
        }

        data = {}
        sections_to_process = [section] if section else collectors.keys()

        for section_id in sections_to_process:
            try:
                logger.info(f"Collecting data for section {section_id}")
                collector = collectors.get(section_id)
                if collector:
                    data[section_id] = collector.collect()
                    logger.info(f"✓ Section {section_id} data collected")
            except Exception as e:
                logger.error(f"✗ Failed to collect section {section_id}: {e}")
                data[section_id] = None

        return data

    def generate_documentation(self, data):
        """Generate markdown documentation from collected data"""
        generator = DocumentationGenerator(self.config)

        for section_id, section_data in data.items():
            if section_data:
                try:
                    logger.info(f"Generating documentation for section {section_id}")
                    output_file = f"output/section_{section_id}.md"
                    generator.generate(section_id, section_data, output_file)

                    # Validate generated document
                    validator = DocumentValidator()
                    if validator.validate(output_file):
                        logger.info(f"✓ Section {section_id} generated and validated")
                        self.sections.append(section_id)
                    else:
                        logger.warning(f"⚠ Section {section_id} validation warnings")

                except Exception as e:
                    logger.error(f"✗ Failed to generate section {section_id}: {e}")

        # Generate improvement section based on all other sections
        if len(self.sections) > 0:
            logger.info("Analyzing for improvements...")
            analyzer = ImprovementAnalyzer(self.config)
            improvements = analyzer.analyze(data)
            generator.generate('10', improvements, "output/section_10.md")

    def run(self, section=None, dry_run=False):
        """Main execution flow"""
        logger.info("=" * 60)
        logger.info("Starting Datacenter Documentation Generation")
        logger.info(f"Timestamp: {datetime.now().isoformat()}")
        logger.info("=" * 60)

        try:
            # Collect data
            data = self.collect_data(section)

            if dry_run:
                logger.info("DRY RUN - Data collection complete, skipping generation")
                return True

            # Generate documentation
            self.generate_documentation(data)

            logger.info("=" * 60)
            logger.info("✓ Documentation generation completed successfully")
            logger.info(f"Sections updated: {', '.join(self.sections)}")
            logger.info("=" * 60)

            return True

        except Exception as e:
            logger.exception(f"Fatal error during documentation generation: {e}")
            return False

def main():
    parser = argparse.ArgumentParser(description='Generate Datacenter Documentation')
    parser.add_argument('--section', help='Generate specific section only (01-10)')
    parser.add_argument('--dry-run', action='store_true', help='Collect data without generating docs')
    parser.add_argument('--config', default='config.yaml', help='Configuration file path')
    parser.add_argument('--debug', action='store_true', help='Enable debug logging')

    args = parser.parse_args()

    if args.debug:
        logging.getLogger().setLevel(logging.DEBUG)

    generator = DatacenterDocGenerator(args.config)
    success = generator.run(section=args.section, dry_run=args.dry_run)

    sys.exit(0 if success else 1)

if __name__ == '__main__':
    main()
```

---

## 2. Collector Modules

### 2.1 Infrastructure Collector
```python
#!/usr/bin/env python3
"""
collectors/infrastructure.py - Physical infrastructure data collection
"""

import logging
from dataclasses import dataclass
from datetime import datetime
from typing import List, Dict
import requests
from pysnmp.hlapi import *

logger = logging.getLogger(__name__)

@dataclass
class UPSData:
    id: str
    model: str
    power_kva: float
    battery_capacity: float
    autonomy_minutes: int
    status: str
    last_test: str

class InfrastructureCollector:
    def __init__(self, config):
        self.config = config
        self.asset_db = self.connect_asset_db()

    def connect_asset_db(self):
        """Connect to asset management database"""
        import mysql.connector
        return mysql.connector.connect(
            host=self.config['databases']['asset_db']['host'],
            user=self.config['databases']['asset_db']['user'],
            password=self.config['databases']['asset_db']['password'],
            database=self.config['databases']['asset_db']['database']
        )

    def collect_ups_data(self) -> List[UPSData]:
        """Collect UPS data via SNMP"""
        ups_devices = self.config['infrastructure']['ups_devices']
        ups_data = []

        for ups in ups_devices:
            try:
                # Query UPS via SNMP
                iterator = getCmd(
                    SnmpEngine(),
                    CommunityData(self.config['snmp']['community']),
                    UdpTransportTarget((ups['ip'], 161)),
                    ContextData(),
                    ObjectType(ObjectIdentity('UPS-MIB', 'upsIdentModel', 0)),
                    ObjectType(ObjectIdentity('UPS-MIB', 'upsBatteryStatus', 0)),
                )

                errorIndication, errorStatus, errorIndex, varBinds = next(iterator)

                if errorIndication:
                    logger.error(f"SNMP error for {ups['id']}: {errorIndication}")
                    continue

                # Parse SNMP response
                model = str(varBinds[0][1])
                status = str(varBinds[1][1])

                ups_data.append(UPSData(
                    id=ups['id'],
                    model=model,
                    power_kva=ups.get('power_kva', 0),
                    battery_capacity=ups.get('battery_capacity', 0),
                    autonomy_minutes=ups.get('autonomy_minutes', 0),
                    status=status,
                    last_test=ups.get('last_test', 'N/A')
                ))

            except Exception as e:
                logger.error(f"Failed to collect UPS {ups['id']}: {e}")

        return ups_data

    def collect_rack_data(self) -> List[Dict]:
        """Collect rack inventory from asset database"""
        cursor = self.asset_db.cursor(dictionary=True)
        cursor.execute("""
            SELECT
                rack_id,
                location,
                total_units,
                occupied_units,
                max_power_kw
            FROM racks
            ORDER BY location, rack_id
        """)
        return cursor.fetchall()

    def collect_environmental_sensors(self) -> List[Dict]:
        """Collect temperature/humidity sensor data"""
        sensors_api = self.config['infrastructure']['sensors_api']
        response = requests.get(
            f"{sensors_api}/api/sensors/current",
            timeout=10
        )
        response.raise_for_status()
        return response.json()

    def collect(self) -> Dict:
        """Main collection method"""
        return {
            'ups_systems': self.collect_ups_data(),
            'racks': self.collect_rack_data(),
            'environmental': self.collect_environmental_sensors(),
            'cooling': self.collect_cooling_data(),
            'power_distribution': self.collect_pdu_data(),
            'timestamp': datetime.now().isoformat()
        }
```

### 2.2 Network Collector
```python
#!/usr/bin/env python3
"""
collectors/network.py - Network configuration collection
"""

import logging
from datetime import datetime
from typing import List, Dict

from netmiko import ConnectHandler

logger = logging.getLogger(__name__)

class NetworkCollector:
    def __init__(self, config):
        self.config = config

    def connect_device(self, device_config):
        """SSH connection to network device"""
        return ConnectHandler(
            device_type=device_config['type'],
            host=device_config['host'],
            username=device_config['username'],
            password=device_config['password'],
            secret=device_config.get('enable_password')
        )

    def collect_switch_inventory(self) -> List[Dict]:
        """Collect switch inventory and configuration"""
        switches = []

        for switch_config in self.config['network']['switches']:
            try:
                connection = self.connect_device(switch_config)

                # Collect basic info
                version = connection.send_command('show version')
                interfaces = connection.send_command('show interfaces status')
                vlan = connection.send_command('show vlan brief')

                switches.append({
                    'hostname': switch_config['hostname'],
                    'version': self.parse_version(version),
                    'interfaces': self.parse_interfaces(interfaces),
                    'vlans': self.parse_vlans(vlan),
                })

                connection.disconnect()

            except Exception as e:
                logger.error(f"Failed to collect {switch_config['hostname']}: {e}")

        return switches

    def collect_firewall_rules(self) -> Dict:
        """Collect firewall configuration"""
        # Implementation depends on firewall vendor
        pass

    def collect(self) -> Dict:
        """Main collection method"""
        return {
            'switches': self.collect_switch_inventory(),
            'routers': self.collect_router_data(),
            'firewalls': self.collect_firewall_rules(),
            'vlans': self.collect_vlan_config(),
            'timestamp': datetime.now().isoformat()
        }
```

### 2.3 VMware Collector
```python
#!/usr/bin/env python3
"""
collectors/virtualization.py - VMware/hypervisor data collection
"""

from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim
import ssl
from datetime import datetime
from typing import List, Dict

class VirtualizationCollector:
    def __init__(self, config):
        self.config = config
        self.si = self.connect_vcenter()

    def connect_vcenter(self):
        """Connect to vCenter"""
        context = ssl.SSLContext(ssl.PROTOCOL_TLSv1_2)
        context.verify_mode = ssl.CERT_NONE

        return SmartConnect(
            host=self.config['vmware']['vcenter_host'],
            user=self.config['vmware']['username'],
            pwd=self.config['vmware']['password'],
            sslContext=context
        )

    def collect_vm_inventory(self) -> List[Dict]:
        """Collect all VMs"""
        content = self.si.RetrieveContent()
        container = content.rootFolder
        viewType = [vim.VirtualMachine]
        recursive = True

        containerView = content.viewManager.CreateContainerView(
            container, viewType, recursive
        )

        vms = []
        for vm in containerView.view:
            if vm.config:
                vms.append({
                    'name': vm.name,
                    'power_state': vm.runtime.powerState,
                    'vcpu': vm.config.hardware.numCPU,
                    'memory_mb': vm.config.hardware.memoryMB,
                    'guest_os': vm.config.guestFullName,
                    'host': vm.runtime.host.name if vm.runtime.host else 'N/A',
                    'storage_gb': sum(d.capacityInBytes for d in vm.config.hardware.device
                                      if isinstance(d, vim.vm.device.VirtualDisk)) / 1024**3
                })

        return vms

    def collect_host_inventory(self) -> List[Dict]:
        """Collect ESXi hosts"""
        content = self.si.RetrieveContent()
        hosts = []

        for datacenter in content.rootFolder.childEntity:
            if hasattr(datacenter, 'hostFolder'):
                for cluster in datacenter.hostFolder.childEntity:
                    for host in cluster.host:
                        hosts.append({
                            'name': host.name,
                            'cluster': cluster.name,
                            'cpu_cores': host.hardware.cpuInfo.numCpuCores,
                            'memory_gb': host.hardware.memorySize / 1024**3,
                            'cpu_usage': host.summary.quickStats.overallCpuUsage,
                            'memory_usage': host.summary.quickStats.overallMemoryUsage,
                            'vms_count': len(host.vm),
                            'uptime': host.summary.quickStats.uptime,
                        })

        return hosts

    def collect(self) -> Dict:
        """Main collection method"""
        data = {
            'vms': self.collect_vm_inventory(),
            'hosts': self.collect_host_inventory(),
            'datastores': self.collect_datastore_info(),
            'clusters': self.collect_cluster_config(),
            'timestamp': datetime.now().isoformat()
|
||||
}
|
||||
|
||||
Disconnect(self.si)
|
||||
return data
|
||||
```

---

## 3. Helper Functions

### 3.1 SNMP Utilities
```python
"""
utils/snmp_helper.py
"""

from pysnmp.hlapi import *


def snmp_get(target, oid, community='public'):
    """Simple SNMP GET"""
    iterator = getCmd(
        SnmpEngine(),
        CommunityData(community),
        UdpTransportTarget((target, 161)),
        ContextData(),
        ObjectType(ObjectIdentity(oid))
    )

    errorIndication, errorStatus, errorIndex, varBinds = next(iterator)

    if errorIndication:
        raise Exception(f"SNMP Error: {errorIndication}")

    return str(varBinds[0][1])


def snmp_walk(target, oid, community='public'):
    """Simple SNMP WALK"""
    results = []

    for (errorIndication, errorStatus, errorIndex, varBinds) in nextCmd(
        SnmpEngine(),
        CommunityData(community),
        UdpTransportTarget((target, 161)),
        ContextData(),
        ObjectType(ObjectIdentity(oid)),
        lexicographicMode=False
    ):
        if errorIndication:
            break

        for varBind in varBinds:
            results.append((str(varBind[0]), str(varBind[1])))

    return results
```

### 3.2 Token Counter
```python
"""
utils/token_counter.py
"""

def count_tokens(text):
    """
    Rough token estimate:
    1 token ≈ 4 characters of English text
    """
    return len(text) // 4


def count_file_tokens(file_path):
    """Count tokens in a file"""
    with open(file_path, 'r', encoding='utf-8') as f:
        content = f.read()
    return count_tokens(content)
```

---

## 4. Configuration File Example

### 4.1 config.yaml
```yaml
# Configuration file for datacenter documentation generator

# Database connections
databases:
  asset_db:
    host: db.company.local
    port: 3306
    user: readonly_user
    password: ${VAULT:asset_db_password}
    database: asset_management

# Infrastructure
infrastructure:
  ups_devices:
    - id: UPS-01
      ip: 10.0.10.10
      power_kva: 100
    - id: UPS-02
      ip: 10.0.10.11
      power_kva: 100

  sensors_api: http://sensors.company.local

# Network devices
network:
  switches:
    - hostname: core-sw-01
      host: 10.0.10.20
      type: cisco_ios
      username: readonly
      password: ${VAULT:network_password}

# VMware
vmware:
  vcenter_host: vcenter.company.local
  username: automation@vsphere.local
  password: ${VAULT:vmware_password}

# SNMP
snmp:
  community: ${VAULT:snmp_community}
  version: 2c

# Output
output:
  directory: /opt/datacenter-docs/output
  format: markdown

# Thresholds
thresholds:
  cpu_warning: 80
  cpu_critical: 90
  memory_warning: 85
  memory_critical: 95
```
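
The `${VAULT:key}` placeholders above need to be resolved after the YAML is parsed. One possible sketch, assuming for illustration that secrets are looked up in a plain dict (in production the lookup would hit the actual vault backend):

```python
import re

# Matches the ${VAULT:key} placeholder syntax used in config.yaml
PLACEHOLDER = re.compile(r'\$\{VAULT:([A-Za-z0-9_]+)\}')

def resolve_placeholders(value, secrets):
    """Recursively replace ${VAULT:key} strings in a parsed config tree."""
    if isinstance(value, dict):
        return {k: resolve_placeholders(v, secrets) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve_placeholders(v, secrets) for v in value]
    if isinstance(value, str):
        return PLACEHOLDER.sub(lambda m: secrets[m.group(1)], value)
    return value

# Example usage with a fragment of the config above
config = {'vmware': {'password': '${VAULT:vmware_password}'}}
resolved = resolve_placeholders(config, {'vmware_password': 's3cret'})
```

A missing key raises `KeyError`, which is usually the right behavior: better to fail at startup than to connect with a literal placeholder as password.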

---

## 5. Deployment Script

### 5.1 deploy.sh
```bash
#!/bin/bash
# Deploy datacenter documentation generator

set -e

INSTALL_DIR="/opt/datacenter-docs"
VENV_DIR="$INSTALL_DIR/venv"
LOG_DIR="/var/log/datacenter-docs"

echo "Installing datacenter documentation generator..."

# Create directories
mkdir -p "$INSTALL_DIR" "$LOG_DIR" "$INSTALL_DIR/output"

# Create virtual environment
python3 -m venv "$VENV_DIR"
source "$VENV_DIR/bin/activate"

# Install dependencies
pip install --upgrade pip
pip install -r requirements.txt

# Copy files
cp -r collectors generators validators templates "$INSTALL_DIR/"
cp main.py config.yaml "$INSTALL_DIR/"

# Set permissions
chown -R automation:automation "$INSTALL_DIR"
chmod +x "$INSTALL_DIR/main.py"

# Install cron job
cat > /etc/cron.d/datacenter-docs << 'CRON'
# Datacenter documentation generation
0 */6 * * * automation /opt/datacenter-docs/venv/bin/python /opt/datacenter-docs/main.py >> /var/log/datacenter-docs/cron.log 2>&1
CRON

echo "✓ Installation complete!"
echo "Run: cd $INSTALL_DIR && source venv/bin/activate && python main.py --help"
```

---

## 6. Testing Framework

### 6.1 test_collectors.py
```python
#!/usr/bin/env python3
"""
tests/test_collectors.py
"""

import unittest
from unittest.mock import Mock, patch

from collectors.infrastructure import InfrastructureCollector


class TestInfrastructureCollector(unittest.TestCase):
    def setUp(self):
        self.config = {
            'databases': {'asset_db': {...}},  # placeholder, see config.yaml
            'snmp': {'community': 'public'}
        }
        self.collector = InfrastructureCollector(self.config)

    @patch('mysql.connector.connect')
    def test_asset_db_connection(self, mock_connect):
        """Test database connection"""
        mock_connect.return_value = Mock()
        db = self.collector.connect_asset_db()
        self.assertIsNotNone(db)

    def test_ups_data_collection(self):
        """Test UPS data collection"""
        # Mock SNMP responses
        ups_data = self.collector.collect_ups_data()
        self.assertIsInstance(ups_data, list)


if __name__ == '__main__':
    unittest.main()
```

---

**Document Version**: 1.0
**For Support**: automation-team@company.com
531
requirements/llm_requirements.md
Normal file

# LLM Technical Requirements - Datacenter Documentation Generation

## 1. Required LLM Capabilities

### 1.1 Core Capabilities
- **Network Access**: SSH, HTTPS, SNMP connections
- **API Interaction**: REST, SOAP, GraphQL
- **Code Execution**: Python, Bash, PowerShell
- **File Operations**: Read/write markdown files
- **Database Access**: MySQL, PostgreSQL, SQL Server

### 1.2 Required Python Libraries
```python
# Networking and protocols
pip install paramiko              # SSH connections
pip install pysnmp                # SNMP queries
pip install requests              # HTTP/REST APIs
pip install netmiko               # Network device automation

# Virtualization
pip install pyvmomi               # VMware vSphere API
pip install proxmoxer             # Proxmox API
pip install libvirt-python        # KVM/QEMU

# Storage
pip install purestorage           # Pure Storage API
pip install netapp-ontap          # NetApp API

# Databases
pip install mysql-connector-python
pip install psycopg2              # PostgreSQL
pip install pymssql               # Microsoft SQL Server

# Monitoring
pip install zabbix-api            # Zabbix
pip install prometheus-client     # Prometheus

# Cloud providers
pip install boto3                 # AWS
pip install azure-mgmt            # Azure
pip install google-cloud          # GCP

# Utilities
pip install jinja2                # Template rendering
pip install pyyaml                # YAML parsing
pip install pandas                # Data analysis
pip install markdown              # Markdown generation
```

### 1.3 CLI Tools Required
```bash
# Network tools
apt-get install snmp snmp-mibs-downloader
apt-get install nmap
apt-get install netcat-openbsd

# Virtualization
apt-get install open-vm-tools    # VMware

# Monitoring
apt-get install nagios-plugins

# Storage
apt-get install nfs-common
apt-get install cifs-utils
apt-get install multipath-tools

# Database clients
apt-get install mysql-client
apt-get install postgresql-client
```

---

## 2. Required Access and Credentials

### 2.1 Credentials Format
Credentials must be provided in a secure (vault/encrypted) file:

```yaml
# credentials.yaml (encrypted)
datacenter:

  # Network devices
  network:
    cisco_switches:
      username: admin
      password: ${ENCRYPTED}
      enable_password: ${ENCRYPTED}
    firewalls:
      api_key: ${ENCRYPTED}

  # Virtualization
  vmware:
    vcenter_host: vcenter.domain.local
    username: automation@vsphere.local
    password: ${ENCRYPTED}

  proxmox:
    host: proxmox.domain.local
    token_name: automation
    token_value: ${ENCRYPTED}

  # Storage
  storage_arrays:
    - name: SAN-01
      type: pure_storage
      api_token: ${ENCRYPTED}

  # Databases
  databases:
    asset_management:
      host: db.domain.local
      port: 3306
      username: readonly_user
      password: ${ENCRYPTED}
      database: asset_db

  # Monitoring
  monitoring:
    zabbix:
      url: https://zabbix.domain.local
      api_token: ${ENCRYPTED}

  # Backup
  backup:
    veeam:
      server: veeam.domain.local
      username: automation
      password: ${ENCRYPTED}
```

### 2.2 Minimum Required Permissions
**IMPORTANT**: ALWAYS use least-privilege accounts (read-only wherever possible).

| System | Account Type | Required Permissions |
|---------|-------------|-------------------|
| Network Devices | Read-only | show commands, SNMP read |
| VMware vCenter | Read-only | Global > Read-only role |
| Storage Arrays | Read-only | Monitoring/reporting access |
| Databases | SELECT only | Read access to the asset schema |
| Monitoring | Read-only | View dashboards, metrics |
| Backup Software | Read-only | View jobs, reports |

---

## 3. Network Connectivity

### 3.1 Network Requirements
```
The LLM host must be able to reach:

Management Network:
- VLAN 10: 10.0.10.0/24 (Infrastructure Management)
- VLAN 20: 10.0.20.0/24 (Server Management)
- VLAN 30: 10.0.30.0/24 (Storage Management)

Required ports:
- TCP 22 (SSH)
- TCP 443 (HTTPS)
- TCP 3306 (MySQL)
- TCP 5432 (PostgreSQL)
- TCP 1433 (MS SQL Server)
- UDP 161 (SNMP)
- TCP 8006 (Proxmox)
```
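
A quick pre-flight check of the TCP ports listed above can be sketched with the standard library (the target addresses below are illustrative; note this covers TCP only, so UDP 161/SNMP needs a separate probe):

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example targets drawn from the requirements above (illustrative addresses)
CHECKS = [
    ('10.0.10.20', 22),    # core switch SSH
    ('10.0.10.10', 443),   # UPS management HTTPS
]

if __name__ == '__main__':
    for host, port in CHECKS:
        status = 'open' if port_open(host, port, timeout=0.5) else 'unreachable'
        print(f'{host}:{port} -> {status}')
```

Running this from the LLM host before the first full collection catches firewall gaps early instead of mid-run.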

### 3.2 Firewall Rules
```
# Allow LLM host to management networks
Source: [LLM_HOST_IP]
Destination: Management Networks
Protocol: SSH, HTTPS, SNMP, Database ports
Action: ALLOW

# Deny all other traffic from LLM host
Source: [LLM_HOST_IP]
Destination: Production Networks
Action: DENY
```

---

## 4. Rate Limiting and Best Practices

### 4.1 API Call Limits
```python
# Respect vendor rate limits
RATE_LIMITS = {
    'vmware_vcenter': {'calls_per_minute': 100},
    'network_devices': {'calls_per_minute': 10},
    'storage_api': {'calls_per_minute': 60},
    'monitoring_api': {'calls_per_minute': 300}
}

# Implement retry logic with exponential backoff
import time
from functools import wraps

def retry_with_backoff(max_retries=3, base_delay=1):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_retries - 1:
                        raise
                    delay = base_delay * (2 ** attempt)
                    time.sleep(delay)
        return wrapper
    return decorator
```

### 4.2 Concurrent Operations
```python
# Limit concurrent operations
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 5  # Do not saturate target systems

with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
    futures = [executor.submit(query_device, device) for device in devices]
    results = [f.result() for f in futures]
```

---

## 5. Error Handling and Logging

### 5.1 Logging Configuration
```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('/var/log/datacenter-docs/generation.log'),
        logging.StreamHandler()
    ]
)

logger = logging.getLogger('datacenter-docs')
```

### 5.2 Error Handling Strategy
```python
class DataCollectionError(Exception):
    """Custom exception for data-collection errors"""
    pass


try:
    data = collect_vmware_data()
except ConnectionError as e:
    logger.error(f"Cannot connect to vCenter: {e}")
    # Use cached data if available
    data = load_cached_data('vmware')
except AuthenticationError as e:  # assumed to be provided by the client library
    logger.critical(f"Authentication failed: {e}")
    # Alert the team
    send_alert("VMware auth failed")
except Exception as e:
    logger.exception(f"Unexpected error: {e}")
    # Continue with partial data
    data = get_partial_data()
```
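
`load_cached_data` above is referenced but never defined; a minimal file-based sketch (the cache directory is an assumption for illustration) could look like:

```python
import json
import os

CACHE_DIR = '/tmp/datacenter-docs-cache'  # hypothetical location

def save_cached_data(source, data, cache_dir=CACHE_DIR):
    """Persist the last good collection result for a source."""
    os.makedirs(cache_dir, exist_ok=True)
    with open(os.path.join(cache_dir, f'{source}.json'), 'w') as f:
        json.dump(data, f)

def load_cached_data(source, cache_dir=CACHE_DIR):
    """Return the last cached result for a source, or None if absent."""
    path = os.path.join(cache_dir, f'{source}.json')
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)
```

Each collector would call `save_cached_data` after a successful run, so the fallback in the `except ConnectionError` branch always has the most recent good snapshot to fall back on.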

---

## 6. Caching and Performance

### 6.1 Cache Strategy
```python
import json

import redis

# Set up Redis for caching
cache = redis.Redis(host='localhost', port=6379, db=0)

def get_cached_or_fetch(key, fetch_function, ttl=3600):
    """Get from cache, or fetch and store if not available"""
    cached = cache.get(key)
    if cached:
        logger.info(f"Cache hit for {key}")
        return json.loads(cached)

    logger.info(f"Cache miss for {key}, fetching...")
    data = fetch_function()
    cache.setex(key, ttl, json.dumps(data))
    return data

# Example usage
vmware_inventory = get_cached_or_fetch(
    'vmware_inventory',
    collect_vmware_inventory,
    ttl=3600  # 1 hour
)
```

### 6.2 What to Cache
- **1 hour**: performance metrics, real-time status
- **6 hours**: inventory, configurations
- **24 hours**: asset database, ownership info
- **7 days**: historical trends, capacity planning
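
These tiers can be wired into `get_cached_or_fetch` from 6.1 through a small lookup table; the category names below are illustrative, not fixed by the system:

```python
# TTLs in seconds, mirroring the tiers above (category names are illustrative)
CACHE_TTLS = {
    'metrics': 1 * 3600,      # 1 hour: performance metrics, real-time status
    'inventory': 6 * 3600,    # 6 hours: inventory, configurations
    'assets': 24 * 3600,      # 24 hours: asset database, ownership info
    'trends': 7 * 24 * 3600,  # 7 days: historical trends, capacity planning
}

def ttl_for(category):
    """Return the TTL for a data category, defaulting to the shortest tier."""
    return CACHE_TTLS.get(category, CACHE_TTLS['metrics'])
```

A collector would then call `get_cached_or_fetch(key, fetch_fn, ttl=ttl_for('inventory'))`, keeping the tier policy in one place.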

---

## 7. Execution Schedule

### 7.1 Recommended Cron Schedule
```cron
# Full documentation refresh - every 6 hours
0 */6 * * * /usr/local/bin/generate-datacenter-docs.sh --full

# Quick update (metrics only) - every hour
0 * * * * /usr/local/bin/generate-datacenter-docs.sh --metrics-only

# Weekly comprehensive report - Sunday night
0 2 * * 0 /usr/local/bin/generate-datacenter-docs.sh --full --detailed
```

### 7.2 Example Wrapper Script
```bash
#!/bin/bash
# generate-datacenter-docs.sh

set -e

LOGFILE="/var/log/datacenter-docs/$(date +%Y%m%d_%H%M%S).log"
LOCKFILE="/var/run/datacenter-docs.lock"

# Prevent concurrent executions
if [ -f "$LOCKFILE" ]; then
    echo "Another instance is running. Exiting."
    exit 1
fi

touch "$LOCKFILE"
trap 'rm -f "$LOCKFILE"' EXIT

# Activate virtual environment
source /opt/datacenter-docs/venv/bin/activate

# Run Python script with parameters
python3 /opt/datacenter-docs/main.py "$@" 2>&1 | tee -a "$LOGFILE"

# Clean up old logs (keep 30 days)
find /var/log/datacenter-docs/ -name "*.log" -mtime +30 -delete
```

---

## 8. Output and Validation

### 8.1 Post-Generation Checks
```python
import os
import re

def validate_documentation(section_file):
    """Validate the generated document"""

    checks = {
        'file_exists': os.path.exists(section_file),
        'not_empty': os.path.getsize(section_file) > 0,
        'valid_markdown': validate_markdown_syntax(section_file),
        'no_placeholders': not contains_placeholders(section_file),
        'token_limit': count_file_tokens(section_file) < 50000
    }

    if all(checks.values()):
        logger.info(f"✓ {section_file} validation passed")
        return True
    else:
        failed = [k for k, v in checks.items() if not v]
        logger.error(f"✗ {section_file} validation failed: {failed}")
        return False


def contains_placeholders(file_path):
    """Check for unreplaced placeholders"""
    with open(file_path, 'r') as f:
        content = f.read()
    # Note: [..] and {..} also match legitimate markdown links and
    # formatting; tune the patterns to the template syntax in use.
    patterns = [r'\[.*?\]', r'\{.*?\}', r'TODO', r'FIXME']
    return any(re.search(p, content) for p in patterns)
```

### 8.2 Notification System
```python
def send_completion_notification(success, sections_updated, errors):
    """Send a notification when generation completes"""

    # Build the error block outside the f-string (backslashes are not
    # allowed inside f-string expressions before Python 3.12)
    error_block = 'Errors:\n' + '\n'.join(errors) if errors else ''

    message = f"""
Datacenter Documentation Update

Status: {'✓ SUCCESS' if success else '✗ FAILED'}
Sections Updated: {', '.join(sections_updated)}
Errors: {len(errors)}

{error_block}

Timestamp: {datetime.now().isoformat()}
"""

    # Send via multiple channels
    send_email(recipients=['ops-team@company.com'], subject='Doc Update', body=message)
    send_slack(channel='#datacenter-ops', message=message)
    # send_teams / send_webhook as needed
```

---

## 9. Security Considerations

### 9.1 Secrets Management
```python
# NEVER store credentials in plaintext
# Always use a vault

import keyring

def get_credential(service, account):
    """Retrieve a credential from the OS keyring"""
    return keyring.get_password(service, account)

# Or HashiCorp Vault
import hvac

client = hvac.Client(url='https://vault.company.com')
client.auth.approle.login(role_id=ROLE_ID, secret_id=SECRET_ID)
credentials = client.secrets.kv.v2.read_secret_version(path='datacenter/creds')
```

### 9.2 Audit Trail
```python
# Log ALL operations for auditing
audit_log = {
    'timestamp': datetime.now().isoformat(),
    'user': 'automation-account',
    'action': 'documentation_generation',
    'sections': sections_updated,
    'systems_accessed': list_of_systems,
    'duration': elapsed_time,
    'success': success  # bool: True or False
}

write_audit_log(audit_log)
```
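
`write_audit_log` is not defined above; a minimal append-only sketch in JSON Lines format (the log path is an assumption) could be:

```python
import json

AUDIT_LOG_PATH = '/var/log/datacenter-docs/audit.jsonl'  # hypothetical path

def write_audit_log(entry, path=AUDIT_LOG_PATH):
    """Append one audit record per line (JSON Lines format)."""
    with open(path, 'a', encoding='utf-8') as f:
        f.write(json.dumps(entry) + '\n')
```

One record per line keeps the file grep-friendly and lets each run append without rewriting history, which is what an audit trail requires.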

---

## 10. Troubleshooting

### 10.1 Common Issues

| Problem | Probable Cause | Solution |
|----------|----------------|-----------|
| Connection Timeout | Firewall/network | Verify connectivity and firewall rules |
| Authentication Failed | Wrong/expired credentials | Rotate credentials, check the vault |
| API Rate Limit | Too many requests | Implement backoff, reduce frequency |
| Incomplete Data | Source temporarily down | Use cached data, generate partial docs |
| Token Limit Exceeded | Too much data in a section | Drop historical data, optimize the format |

### 10.2 Debug Mode
```python
# Enable debug mode for troubleshooting
DEBUG = os.getenv('DEBUG', 'False').lower() == 'true'

if DEBUG:
    logging.getLogger().setLevel(logging.DEBUG)
    # Save raw responses for analysis
    with open(f'debug_{timestamp}.json', 'w') as f:
        json.dump(raw_response, f, indent=2)
```

---

## 11. Testing

### 11.1 Unit Tests
```python
import unittest

class TestDataCollection(unittest.TestCase):
    def test_vmware_connection(self):
        """Test the connection to vCenter"""
        result = test_vmware_connection()
        self.assertTrue(result.success)

    def test_data_validation(self):
        """Test validation of collected data"""
        sample_data = load_sample_data()
        self.assertTrue(validate_data_structure(sample_data))
```

### 11.2 Integration Tests
```bash
# End-to-end tests in the test environment
./run-tests.sh --integration --environment=test

# Verify that all systems are reachable
./check-connectivity.sh

# Dry run without saving
python3 main.py --dry-run --verbose
```

---

## Pre-Deployment Checklist

Before putting the system into production:

- [ ] All required libraries installed
- [ ] Credentials configured in a secure vault
- [ ] Connectivity verified to all systems
- [ ] Automation account permissions validated (read-only)
- [ ] Firewall rules approved and configured
- [ ] Logging configured and tested
- [ ] Notification system tested
- [ ] Existing documentation backed up
- [ ] Cron jobs configured
- [ ] Operational runbook completed
- [ ] Escalation path defined
- [ ] DR procedures documented

---

**Document Version**: 1.0
**Last Updated**: 2025-01-XX
**Owner**: Automation Team