🚀 Deployment Guide - Datacenter Documentation System

Quick Deploy Options

Option 1: Docker Compose (Development)

# 1. Clone repository
git clone https://git.company.local/infrastructure/datacenter-docs.git
cd datacenter-docs

# 2. Configure environment
cp .env.example .env
nano .env  # Edit with your credentials

# 3. Start all services
docker-compose up -d

# 4. Check health
curl http://localhost:8000/health

# 5. Access services
# API: http://localhost:8000/api/docs
# Chat: http://localhost:8001
# Frontend: http://localhost
# Flower: http://localhost:5555

Option 2: Kubernetes (Production)

# 1. Create namespace
kubectl apply -f deploy/kubernetes/namespace.yaml

# 2. Create secrets
kubectl create secret generic datacenter-secrets \
  --from-literal=database-url='postgresql://user:pass@host:5432/db' \
  --from-literal=redis-url='redis://:pass@host:6379/0' \
  --from-literal=mcp-api-key='your-mcp-key' \
  --from-literal=anthropic-api-key='your-claude-key' \
  -n datacenter-docs

# 3. Create configmap
kubectl create configmap datacenter-config \
  --from-literal=mcp-server-url='https://mcp.company.local' \
  -n datacenter-docs

# 4. Deploy services
kubectl apply -f deploy/kubernetes/deployment.yaml
kubectl apply -f deploy/kubernetes/service.yaml
kubectl apply -f deploy/kubernetes/ingress.yaml

# 5. Check deployment
kubectl get pods -n datacenter-docs
kubectl logs -n datacenter-docs deployment/api

Option 3: GitLab CI/CD (Automated)

# 1. Push to GitLab
git push origin main

# 2. Pipeline runs automatically:
#    - Lint & Test
#    - Build Docker images
#    - Deploy to staging (manual approval)
#    - Deploy to production (manual, on tags)

# 3. Monitor pipeline
# Visit: https://gitlab.company.local/infrastructure/datacenter-docs/-/pipelines

Option 4: Gitea Actions (Automated)

# 1. Push to Gitea
git push origin main

# 2. Workflow triggers:
#    - On push: Build & deploy to staging
#    - On tag: Deploy to production
#    - On schedule: Generate docs every 6h

# 3. Monitor workflow
# Visit: https://gitea.company.local/infrastructure/datacenter-docs/actions

Configuration Details

Environment Variables (.env)

# Database
DATABASE_URL=postgresql://docs_user:CHANGE_ME@postgres:5432/datacenter_docs

# Redis
REDIS_URL=redis://:CHANGE_ME@redis:6379/0

# MCP Server (CRITICAL - Required for device connectivity)
MCP_SERVER_URL=https://mcp.company.local
MCP_API_KEY=your_mcp_api_key_here

# Anthropic Claude API (CRITICAL - Required for AI)
ANTHROPIC_API_KEY=sk-ant-api03-xxxxx

# CORS (Adjust for your domain)
CORS_ORIGINS=http://localhost:3000,https://docs.company.local

# Optional
LOG_LEVEL=INFO
DEBUG=false
WORKERS=4
MAX_TOKENS=4096
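
If the API fails to start due to configuration problems, a small startup check can surface missing settings early. This is a sketch (not part of the shipped codebase) that validates the required names from the list above:

```python
import os

# Variables the guide marks as required (the MCP and Anthropic keys are CRITICAL)
REQUIRED_VARS = [
    "DATABASE_URL",
    "REDIS_URL",
    "MCP_SERVER_URL",
    "MCP_API_KEY",
    "ANTHROPIC_API_KEY",
]

def missing_vars(env=None):
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_vars()` at startup and aborting when the list is non-empty turns a cryptic runtime failure into an explicit configuration error.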

Kubernetes Secrets (secrets.yaml)

apiVersion: v1
kind: Secret
metadata:
  name: datacenter-secrets
  namespace: datacenter-docs
type: Opaque
stringData:
  database-url: "postgresql://user:pass@postgresql.default:5432/datacenter_docs"
  redis-url: "redis://:pass@redis.default:6379/0"
  mcp-api-key: "your-mcp-key"
  anthropic-api-key: "sk-ant-api03-xxxxx"

Post-Deployment Steps

1. Database Migrations

# Docker Compose
docker-compose exec api poetry run alembic upgrade head

# Kubernetes
kubectl exec -n datacenter-docs deployment/api -- \
  poetry run alembic upgrade head

2. Index Initial Documentation

# Docker Compose
docker-compose exec api poetry run datacenter-docs index-docs \
  --path /app/output

# Kubernetes
kubectl exec -n datacenter-docs deployment/api -- \
  poetry run datacenter-docs index-docs --path /app/output

3. Generate Documentation

# Manual trigger
curl -X POST http://localhost:8000/api/v1/documentation/generate/infrastructure

# Or run full generation
docker-compose exec worker poetry run datacenter-docs generate-all

4. Test API

# Health check
curl http://localhost:8000/health

# Create test ticket
curl -X POST http://localhost:8000/api/v1/tickets \
  -H "Content-Type: application/json" \
  -d '{
    "ticket_id": "TEST-001",
    "title": "Test ticket",
    "description": "Testing auto-resolution",
    "category": "network"
  }'

# Get ticket status
curl http://localhost:8000/api/v1/tickets/TEST-001

# Search documentation
curl -X POST http://localhost:8000/api/v1/documentation/search \
  -H "Content-Type: application/json" \
  -d '{"query": "UPS battery status", "limit": 5}'

Monitoring

Prometheus Metrics

# Metrics endpoint
curl http://localhost:8000/metrics

# Example metrics:
# datacenter_docs_tickets_total
# datacenter_docs_tickets_resolved_total
# datacenter_docs_resolution_confidence_score
# datacenter_docs_processing_time_seconds
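
To collect these metrics, a scrape job along these lines can be added to prometheus.yml (the `api:8000` target assumes the Docker Compose network; adjust for your environment):

```yaml
scrape_configs:
  - job_name: 'datacenter-docs-api'
    scrape_interval: 30s
    metrics_path: /metrics
    static_configs:
      - targets: ['api:8000']
```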

Grafana Dashboards

Import dashboard from: deploy/grafana/dashboard.json

Logs

# Docker Compose
docker-compose logs -f api chat worker

# Kubernetes
kubectl logs -n datacenter-docs deployment/api -f
kubectl logs -n datacenter-docs deployment/chat -f
kubectl logs -n datacenter-docs deployment/worker -f

Celery Flower (Task Monitoring)

Access: http://localhost:5555 (Docker Compose) or https://docs.company.local/flower (K8s)


Scaling

Horizontal Scaling

# Docker Compose (or pin per-service replicas in docker-compose.yml)
docker-compose up -d --scale worker=5

# Kubernetes
kubectl scale deployment api --replicas=5 -n datacenter-docs
kubectl scale deployment worker --replicas=10 -n datacenter-docs
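
On Kubernetes, manual kubectl scale can also be replaced with a HorizontalPodAutoscaler. A sketch for the worker deployment, with illustrative thresholds:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: worker
  namespace: datacenter-docs
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```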

Vertical Scaling

Edit resource limits in deploy/kubernetes/deployment.yaml:

resources:
  requests:
    memory: "1Gi"
    cpu: "500m"
  limits:
    memory: "2Gi"
    cpu: "2000m"

Troubleshooting

API not starting

# Check logs
docker-compose logs api

# Common issues:
# - Database not accessible
# - Missing environment variables
# - MCP server not reachable

# Test database connection
docker-compose exec api python -c "
from datacenter_docs.utils.database import get_db
next(get_db())
print('DB OK')
"

Chat not connecting

# Check WebSocket connection
# Browser console should show: WebSocket connection established

# Rough check from curl (sends Upgrade headers only, not a full WebSocket handshake)
curl -N -H "Connection: Upgrade" -H "Upgrade: websocket" \
  http://localhost:8001/socket.io/

Worker not processing jobs

# Check Celery status
docker-compose exec worker celery -A datacenter_docs.workers.celery_app status

# Check Redis connection
docker-compose exec worker python -c "
import redis
r = redis.from_url('redis://:pass@redis:6379/0')
print(r.ping())
"

MCP Connection Issues

# Test MCP connectivity
docker-compose exec api python -c "
import asyncio
from datacenter_docs.mcp.client import MCPClient

async def test():
    async with MCPClient(
        server_url='https://mcp.company.local',
        api_key='your-key'
    ) as client:
        resources = await client.list_resources()
        print(f'Found {len(resources)} resources')

asyncio.run(test())
"

Backup & Recovery

Database Backup

# Docker Compose
docker-compose exec postgres pg_dump -U docs_user datacenter_docs > backup.sql

# Kubernetes
kubectl exec -n datacenter-docs postgresql-0 -- \
  pg_dump -U docs_user datacenter_docs > backup.sql

Documentation Backup

# Backup generated docs
tar -czf docs-backup-$(date +%Y%m%d).tar.gz output/

# Backup vector store
tar -czf vectordb-backup-$(date +%Y%m%d).tar.gz data/chroma_db/
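
The backup commands above can be wrapped in a helper that timestamps archives and prunes old copies. Destination paths and the retention count below are assumptions:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Archive a directory as <name>-backup-YYYYMMDD.tar.gz and keep the newest $keep copies.
backup_dir() {
  local src="$1" dest="$2" name="$3" keep="${4:-7}"
  tar -czf "$dest/${name}-backup-$(date +%Y%m%d).tar.gz" \
      -C "$(dirname "$src")" "$(basename "$src")"
  # Prune everything older than the newest $keep archives
  ls -1t "$dest/${name}"-backup-*.tar.gz | tail -n +"$((keep + 1))" | xargs -r rm -f
}

# Usage:
# backup_dir output /backups docs 7
# backup_dir data/chroma_db /backups vectordb 7
```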

Restore

# Database
docker-compose exec -T postgres psql -U docs_user datacenter_docs < backup.sql

# Documentation
tar -xzf docs-backup-20250115.tar.gz
tar -xzf vectordb-backup-20250115.tar.gz

Security Checklist

  • All secrets stored in vault/secrets manager
  • TLS enabled for all services
  • API rate limiting configured
  • CORS properly configured
  • Network policies applied (K8s)
  • Regular security scans scheduled
  • Audit logging enabled
  • Backup encryption enabled

Performance Tuning

API Optimization

# Increase workers (in .env)
WORKERS=8  # 2x CPU cores

# Adjust max tokens
MAX_TOKENS=8192  # Higher for complex queries

Database Optimization

-- Add indexes
CREATE INDEX idx_tickets_status ON tickets(status);
CREATE INDEX idx_tickets_created_at ON tickets(created_at);

Redis Caching

# Adjust cache TTL (in code)
CACHE_TTL = {
    'documentation': 3600,  # 1 hour
    'metrics': 300,  # 5 minutes
    'tickets': 60  # 1 minute
}
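
One way to apply these TTLs is a caching decorator. The sketch below keeps entries in process memory as a stand-in for the Redis-backed cache; `ttl_cached` and the store layout are illustrative, not the project's actual implementation:

```python
import functools
import time

CACHE_TTL = {
    "documentation": 3600,  # 1 hour
    "metrics": 300,         # 5 minutes
    "tickets": 60,          # 1 minute
}

def ttl_cached(kind):
    """Cache a function's results for CACHE_TTL[kind] seconds."""
    def decorator(fn):
        store = {}  # args -> (value, stored_at)

        @functools.wraps(fn)
        def wrapper(*args):
            now = time.monotonic()
            hit = store.get(args)
            if hit is not None and now - hit[1] < CACHE_TTL[kind]:
                return hit[0]  # still fresh: serve from cache
            value = fn(*args)
            store[args] = (value, now)
            return value
        return wrapper
    return decorator
```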

Maintenance

Regular Tasks

# Weekly
- Review and clean old logs
- Check disk usage
- Review failed tickets
- Update dependencies

# Monthly
- Database vacuum/optimize
- Security patches
- Performance review
- Backup verification

Scheduled Maintenance

# Schedule in crontab
0 2 * * 0 /opt/scripts/weekly-maintenance.sh
0 3 1 * * /opt/scripts/monthly-maintenance.sh
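
The crontab entries above assume the maintenance scripts exist. A minimal weekly-maintenance.sh covering the log-cleanup and disk-usage items from the list might look like this (paths and retention are assumptions):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Delete rotated logs older than $retention_days from $log_dir.
clean_old_logs() {
  local log_dir="$1" retention_days="${2:-14}"
  find "$log_dir" -name '*.log' -type f -mtime +"$retention_days" -delete
}

# Report disk usage so full volumes are caught early.
report_disk_usage() {
  df -h "${1:-/}"
}

# As invoked from the crontab entry above:
# clean_old_logs /var/log/datacenter-docs 14
# report_disk_usage /var/lib/docker
```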

For support: automation-team@company.local