# 🚀 Deployment Guide - Datacenter Documentation System
## Quick Deploy Options
### Option 1: Docker Compose (Recommended for Development/Small Scale)
```bash
# 1. Clone repository
git clone https://git.company.local/infrastructure/datacenter-docs.git
cd datacenter-docs
# 2. Configure environment
cp .env.example .env
nano .env # Edit with your credentials
# 3. Start all services
docker-compose up -d
# 4. Check health
curl http://localhost:8000/health
# 5. Access services
# API: http://localhost:8000/api/docs
# Chat: http://localhost:8001
# Frontend: http://localhost
# Flower: http://localhost:5555
```
### Option 2: Kubernetes (Production)
```bash
# 1. Create namespace
kubectl apply -f deploy/kubernetes/namespace.yaml
# 2. Create secrets
kubectl create secret generic datacenter-secrets \
  --from-literal=database-url='postgresql://user:pass@host:5432/db' \
  --from-literal=redis-url='redis://:pass@host:6379/0' \
  --from-literal=mcp-api-key='your-mcp-key' \
  --from-literal=anthropic-api-key='your-claude-key' \
  -n datacenter-docs
# 3. Create configmap
kubectl create configmap datacenter-config \
  --from-literal=mcp-server-url='https://mcp.company.local' \
  -n datacenter-docs
# 4. Deploy services
kubectl apply -f deploy/kubernetes/deployment.yaml
kubectl apply -f deploy/kubernetes/service.yaml
kubectl apply -f deploy/kubernetes/ingress.yaml
# 5. Check deployment
kubectl get pods -n datacenter-docs
kubectl logs -n datacenter-docs deployment/api
```
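Once the resources are applied, a quick rollout check and port-forward smoke test confirm the API is serving. A minimal sketch, assuming the Service is named `api` and exposes port 8000:
```bash
# Wait for the rollout, then probe /health through a temporary port-forward
kubectl -n datacenter-docs rollout status deployment/api
kubectl -n datacenter-docs port-forward svc/api 8000:8000 &
PF_PID=$!
sleep 2
curl http://localhost:8000/health
kill $PF_PID
```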
### Option 3: GitLab CI/CD (Automated)
```bash
# 1. Push to GitLab
git push origin main
# 2. Pipeline runs automatically:
# - Lint & Test
# - Build Docker images
# - Deploy to staging (manual approval)
# - Deploy to production (manual, on tags)
# 3. Monitor pipeline
# Visit: https://gitlab.company.local/infrastructure/datacenter-docs/-/pipelines
```
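Since production deploys run on tags, cutting a release is just a matter of pushing one. A minimal example, assuming tag-based rules in `.gitlab-ci.yml`:
```bash
# Tag and push to trigger the production deploy stage
git tag -a v2.0.1 -m "Release v2.0.1"
git push origin v2.0.1
```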
### Option 4: Gitea Actions (Automated)
```bash
# 1. Push to Gitea
git push origin main
# 2. Workflow triggers:
# - On push: Build & deploy to staging
# - On tag: Deploy to production
# - On schedule: Generate docs every 6h
# 3. Monitor workflow
# Visit: https://gitea.company.local/infrastructure/datacenter-docs/actions
```
---
## Configuration Details
### Environment Variables (.env)
```bash
# Database
DATABASE_URL=postgresql://docs_user:CHANGE_ME@postgres:5432/datacenter_docs
# Redis
REDIS_URL=redis://:CHANGE_ME@redis:6379/0
# MCP Server (CRITICAL - Required for device connectivity)
MCP_SERVER_URL=https://mcp.company.local
MCP_API_KEY=your_mcp_api_key_here
# Anthropic Claude API (CRITICAL - Required for AI)
ANTHROPIC_API_KEY=sk-ant-api03-xxxxx
# CORS (Adjust for your domain)
CORS_ORIGINS=http://localhost:3000,https://docs.company.local
# Optional
LOG_LEVEL=INFO
DEBUG=false
WORKERS=4
MAX_TOKENS=4096
```
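Before starting services, it helps to verify that the critical variables are actually set. A small bash sketch (assumes the `.env` layout above):
```bash
# Load .env and flag any missing critical variables
set -a; source .env; set +a
for v in DATABASE_URL REDIS_URL MCP_SERVER_URL MCP_API_KEY ANTHROPIC_API_KEY; do
  [ -n "${!v}" ] || echo "MISSING: $v"
done
```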
### Kubernetes Secrets (secrets.yaml)
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: datacenter-secrets
  namespace: datacenter-docs
type: Opaque
stringData:
  database-url: "postgresql://user:pass@postgresql.default:5432/datacenter_docs"
  redis-url: "redis://:pass@redis.default:6379/0"
  mcp-api-key: "your-mcp-key"
  anthropic-api-key: "sk-ant-api03-xxxxx"
```
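To apply the manifest and confirm the keys landed without printing their values:
```bash
kubectl apply -f secrets.yaml
# describe shows key names and byte counts, never the values themselves
kubectl -n datacenter-docs describe secret datacenter-secrets
```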
---
## Post-Deployment Steps
### 1. Database Migrations
```bash
# Docker Compose
docker-compose exec api poetry run alembic upgrade head
# Kubernetes
kubectl exec -n datacenter-docs deployment/api -- \
  poetry run alembic upgrade head
```
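To confirm the schema is at the latest revision afterwards:
```bash
# Should print the current revision tagged '(head)'
docker-compose exec api poetry run alembic current
```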
### 2. Index Initial Documentation
```bash
# Docker Compose
docker-compose exec api poetry run datacenter-docs index-docs \
  --path /app/output
# Kubernetes
kubectl exec -n datacenter-docs deployment/api -- \
  poetry run datacenter-docs index-docs --path /app/output
```
### 3. Generate Documentation
```bash
# Manual trigger
curl -X POST http://localhost:8000/api/v1/documentation/generate/infrastructure
# Or run full generation
docker-compose exec worker poetry run datacenter-docs generate-all
```
### 4. Test API
```bash
# Health check
curl http://localhost:8000/health
# Create test ticket
curl -X POST http://localhost:8000/api/v1/tickets \
  -H "Content-Type: application/json" \
  -d '{
    "ticket_id": "TEST-001",
    "title": "Test ticket",
    "description": "Testing auto-resolution",
    "category": "network"
  }'
# Get ticket status
curl http://localhost:8000/api/v1/tickets/TEST-001
# Search documentation
curl -X POST http://localhost:8000/api/v1/documentation/search \
  -H "Content-Type: application/json" \
  -d '{"query": "UPS battery status", "limit": 5}'
```
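If `jq` is installed, piping responses through it makes the JSON easier to read:
```bash
# Pretty-print the ticket status response
curl -s http://localhost:8000/api/v1/tickets/TEST-001 | jq .
```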
---
## Monitoring
### Prometheus Metrics
```bash
# Metrics endpoint
curl http://localhost:8000/metrics
# Example metrics:
# datacenter_docs_tickets_total
# datacenter_docs_tickets_resolved_total
# datacenter_docs_resolution_confidence_score
# datacenter_docs_processing_time_seconds
```
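These counters can be queried through the standard Prometheus HTTP API once scraping is configured. An example, assuming Prometheus runs at `prometheus.company.local:9090`:
```bash
# Hourly resolution rate derived from the counter above
curl -sG 'http://prometheus.company.local:9090/api/v1/query' \
  --data-urlencode 'query=rate(datacenter_docs_tickets_resolved_total[1h])'
```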
### Grafana Dashboards
Import dashboard from: `deploy/grafana/dashboard.json`
### Logs
```bash
# Docker Compose
docker-compose logs -f api chat worker
# Kubernetes
kubectl logs -n datacenter-docs deployment/api -f
kubectl logs -n datacenter-docs deployment/chat -f
kubectl logs -n datacenter-docs deployment/worker -f
```
### Celery Flower (Task Monitoring)
Access: http://localhost:5555 (Docker Compose) or https://docs.company.local/flower (K8s)
---
## Scaling
### Horizontal Scaling
```bash
# Docker Compose (or pin replica counts in docker-compose.yml)
docker-compose up -d --scale worker=5
# Kubernetes
kubectl scale deployment api --replicas=5 -n datacenter-docs
kubectl scale deployment worker --replicas=10 -n datacenter-docs
```
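Instead of fixed replica counts, Kubernetes can scale on CPU utilization. A sketch, assuming metrics-server is installed in the cluster:
```bash
# Scale workers between 3 and 15 replicas, targeting 70% CPU
kubectl autoscale deployment worker --min=3 --max=15 --cpu-percent=70 -n datacenter-docs
```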
### Vertical Scaling
Edit resource limits in `deploy/kubernetes/deployment.yaml`:
```yaml
resources:
  requests:
    memory: "1Gi"
    cpu: "500m"
  limits:
    memory: "2Gi"
    cpu: "2000m"
```
---
## Troubleshooting
### API not starting
```bash
# Check logs
docker-compose logs api
# Common issues:
# - Database not accessible
# - Missing environment variables
# - MCP server not reachable
# Test database connection
docker-compose exec api python -c "
from datacenter_docs.utils.database import get_db
next(get_db())
print('DB OK')
"
```
### Chat not connecting
```bash
# Check WebSocket connection
# Browser console should show: WebSocket connection established
# Test from curl
curl -N -H "Connection: Upgrade" -H "Upgrade: websocket" \
  http://localhost:8001/socket.io/
```
### Worker not processing jobs
```bash
# Check Celery status
docker-compose exec worker celery -A datacenter_docs.workers.celery_app status
# Check Redis connection
docker-compose exec worker python -c "
import redis
r = redis.from_url('redis://:pass@redis:6379/0')
print(r.ping())
"
```
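If `status` responds but jobs still queue up, inspecting active and reserved tasks narrows the problem down:
```bash
# Show tasks currently executing and tasks prefetched but waiting
docker-compose exec worker celery -A datacenter_docs.workers.celery_app inspect active
docker-compose exec worker celery -A datacenter_docs.workers.celery_app inspect reserved
```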
### MCP Connection Issues
```bash
# Test MCP connectivity
docker-compose exec api python -c "
import asyncio
from datacenter_docs.mcp.client import MCPClient

async def test():
    async with MCPClient(
        server_url='https://mcp.company.local',
        api_key='your-key'
    ) as client:
        resources = await client.list_resources()
        print(f'Found {len(resources)} resources')

asyncio.run(test())
"
```
---
## Backup & Recovery
### Database Backup
```bash
# Docker Compose
docker-compose exec postgres pg_dump -U docs_user datacenter_docs > backup.sql
# Kubernetes
kubectl exec -n datacenter-docs postgresql-0 -- \
  pg_dump -U docs_user datacenter_docs > backup.sql
```
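For unattended backups, a timestamped, compressed variant is cron-friendly (a sketch; adjust paths and retention to taste):
```bash
# Compressed, dated dump suitable for a cron job
docker-compose exec -T postgres pg_dump -U docs_user datacenter_docs \
  | gzip > "backup-$(date +%Y%m%d).sql.gz"
```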
### Documentation Backup
```bash
# Backup generated docs
tar -czf docs-backup-$(date +%Y%m%d).tar.gz output/
# Backup vector store
tar -czf vectordb-backup-$(date +%Y%m%d).tar.gz data/chroma_db/
```
### Restore
```bash
# Database
docker-compose exec -T postgres psql -U docs_user datacenter_docs < backup.sql
# Documentation
tar -xzf docs-backup-20250115.tar.gz
tar -xzf vectordb-backup-20250115.tar.gz
```
---
## Security Checklist
- [ ] All secrets stored in vault/secrets manager
- [ ] TLS enabled for all services
- [ ] API rate limiting configured
- [ ] CORS properly configured
- [ ] Network policies applied (K8s; see the sketch below)
- [ ] Regular security scans scheduled
- [ ] Audit logging enabled
- [ ] Backup encryption enabled
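As a starting point for the network-policy item, a minimal default-deny ingress policy for the namespace might look like this (a sketch only; allow-rules for the ingress controller and inter-service traffic still need to be added):
```bash
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: datacenter-docs
spec:
  podSelector: {}
  policyTypes:
    - Ingress
EOF
```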
---
## Performance Tuning
### API Optimization
```bash
# Increase workers (in .env)
WORKERS=8  # 2x CPU cores
# Adjust max tokens
MAX_TOKENS=8192  # Higher for complex queries
```
### Database Optimization
```sql
-- Add indexes
CREATE INDEX idx_tickets_status ON tickets(status);
CREATE INDEX idx_tickets_created_at ON tickets(created_at);
```
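A quick sanity check via psql confirms the planner actually uses the new indexes (on very small tables it may still prefer a sequential scan):
```bash
# An 'Index Scan using idx_tickets_status' line in the plan confirms the index is used
docker-compose exec postgres psql -U docs_user -d datacenter_docs \
  -c "EXPLAIN SELECT * FROM tickets WHERE status = 'open';"
```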
### Redis Caching
```python
# Adjust cache TTL (in code)
CACHE_TTL = {
    'documentation': 3600,  # 1 hour
    'metrics': 300,         # 5 minutes
    'tickets': 60           # 1 minute
}
```
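To spot-check that entries are actually expiring as configured, `redis-cli TTL` reports the remaining lifetime of a key (the key name below is hypothetical; substitute one your cache actually writes):
```bash
# Remaining TTL in seconds; -1 means no expiry, -2 means the key is gone
docker-compose exec redis redis-cli TTL documentation:some-key
```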
---
## Maintenance
### Regular Tasks
**Weekly:**
- Review and clean old logs
- Check disk usage
- Review failed tickets
- Update dependencies

**Monthly:**
- Database vacuum/optimize
- Security patches
- Performance review
- Backup verification
### Scheduled Maintenance
```bash
# Schedule in crontab
# Weekly maintenance: Sundays at 02:00
0 2 * * 0 /opt/scripts/weekly-maintenance.sh
# Monthly maintenance: 1st of each month at 03:00
0 3 1 * * /opt/scripts/monthly-maintenance.sh
```
---
**For support**: automation-team@company.local