Initial commit: LLM Automation Docs & Remediation Engine v2.0

Features:
- Automated datacenter documentation generation
- MCP integration for device connectivity
- Auto-remediation engine with safety checks
- Multi-factor reliability scoring (0-100%)
- Human feedback learning loop
- Pattern recognition and continuous improvement
- Agentic chat support with AI
- API for ticket resolution
- React frontend with Material-UI
- CI/CD pipelines (GitLab + Gitea)
- Docker & Kubernetes deployment
- Complete documentation and guides

v2.0 Highlights:
- Auto-remediation with write operations (disabled by default)
- Reliability calculator with 4-factor scoring
- Human feedback system for continuous learning
- Pattern-based progressive automation
- Approval workflow for critical actions
- Full audit trail and rollback capability

# 🚀 Deployment Guide - Datacenter Documentation System

## Quick Deploy Options

### Option 1: Docker Compose (Recommended for Development/Small Scale)

```bash
# 1. Clone repository
git clone https://git.company.local/infrastructure/datacenter-docs.git
cd datacenter-docs

# 2. Configure environment
cp .env.example .env
nano .env  # Edit with your credentials

# 3. Start all services
docker-compose up -d

# 4. Check health
curl http://localhost:8000/health

# 5. Access services
# API: http://localhost:8000/api/docs
# Chat: http://localhost:8001
# Frontend: http://localhost
# Flower: http://localhost:5555
```
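
Step 4 can fail if it runs before the containers have finished starting. The sketch below polls the endpoint instead. It assumes only that `GET /health` answers HTTP 200 once the API is ready and makes no assumption about the response body. `wait_for_health` is a hypothetical helper, not part of the project.

```python
import http.client
import time
import urllib.request


def wait_for_health(url: str, timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll `url` until it returns HTTP 200 or `timeout` seconds have elapsed.

    Hypothetical helper: assumes the endpoint signals readiness with status 200.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # URLError, HTTPError (non-2xx status), refused connections and
            # socket timeouts are OSError subclasses: the service is not ready yet
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_health("http://localhost:8000/health", timeout=120)` returns `True` as soon as the API responds. It returns `False` if the API is still down after two minutes, so scripted checks can wait for startup before they run.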

### Option 2: Kubernetes (Production)

```bash
# 1. Create namespace
kubectl apply -f deploy/kubernetes/namespace.yaml

# 2. Create secrets
kubectl create secret generic datacenter-secrets \
  --from-literal=database-url='postgresql://user:pass@host:5432/db' \
  --from-literal=redis-url='redis://:pass@host:6379/0' \
  --from-literal=mcp-api-key='your-mcp-key' \
  --from-literal=anthropic-api-key='your-claude-key' \
  -n datacenter-docs

# 3. Create configmap
kubectl create configmap datacenter-config \
  --from-literal=mcp-server-url='https://mcp.company.local' \
  -n datacenter-docs

# 4. Deploy services
kubectl apply -f deploy/kubernetes/deployment.yaml
kubectl apply -f deploy/kubernetes/service.yaml
kubectl apply -f deploy/kubernetes/ingress.yaml

# 5. Check deployment
kubectl get pods -n datacenter-docs
kubectl logs -n datacenter-docs deployment/api
```
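
`kubectl get pods` only lists pods and does not fail when some are not ready. For the blocking case, `kubectl wait --for=condition=Ready pod --all -n datacenter-docs --timeout=120s` does the job. For a scripted report, the sketch below reads the standard `Ready` condition from the output of `kubectl get pods -o json`. The helper names are hypothetical.

```python
import json
import subprocess


def fetch_pod_list(namespace: str = "datacenter-docs") -> dict:
    """Hypothetical helper: run `kubectl get pods -o json` and parse the PodList."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def unready_pods(pod_list: dict) -> list[str]:
    """Hypothetical helper: names of pods whose `Ready` condition is not "True"."""
    not_ready = []
    for pod in pod_list.get("items", []):
        conditions = pod.get("status", {}).get("conditions", [])
        is_ready = any(
            c.get("type") == "Ready" and c.get("status") == "True" for c in conditions
        )
        if not is_ready:
            not_ready.append(pod["metadata"]["name"])
    return not_ready
```

`unready_pods(fetch_pod_list())` returns an empty list when every pod reports Ready. Pods left behind by completed Jobs report `Ready=False`, so they are listed too.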

### Option 3: GitLab CI/CD (Automated)

```bash
# 1. Push to GitLab
git push origin main

# 2. Pipeline runs automatically:
#    - Lint & Test
#    - Build Docker images
#    - Deploy to staging (manual approval)
#    - Deploy to production (manual, on tags)

# 3. Monitor pipeline
# Visit: https://gitlab.company.local/infrastructure/datacenter-docs/-/pipelines
```
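
Pipeline status can also be polled through the GitLab REST API (`GET /api/v4/projects/:id/pipelines`), not just the web UI. That endpoint accepts the URL-encoded `namespace/project` path as the project ID. The sketch below uses only the standard library. It assumes a personal or project access token with the `read_api` scope, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def pipelines_url(base_url: str, project_path: str, ref: str = "main") -> str:
    """Hypothetical helper: GitLab API URL listing pipelines for `ref`, newest first.

    GitLab accepts the URL-encoded "namespace/project" path as the project ID.
    """
    project_id = urllib.parse.quote(project_path, safe="")
    query = urllib.parse.urlencode({"ref": ref, "per_page": 1})
    return f"{base_url.rstrip('/')}/api/v4/projects/{project_id}/pipelines?{query}"


def latest_pipeline_status(base_url: str, project_path: str, token: str, ref: str = "main") -> str:
    """Hypothetical helper: status of the newest pipeline on `ref` (e.g. "running", "success")."""
    request = urllib.request.Request(
        pipelines_url(base_url, project_path, ref),
        headers={"PRIVATE-TOKEN": token},  # access token with read_api scope
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        pipelines = json.load(response)
    # The endpoint sorts by ID descending by default, so the first entry is the newest
    return pipelines[0]["status"] if pipelines else "none"
```

For example, `latest_pipeline_status("https://gitlab.company.local", "infrastructure/datacenter-docs", token)` returns a status string such as `"running"` or `"success"`. It returns `"none"` if the branch has no pipelines yet.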

### Option 4: Gitea Actions (Automated)

```bash
# 1. Push to Gitea
git push origin main

# 2. Workflow triggers:
#    - On push: Build & deploy to staging
#    - On tag: Deploy to production
#    - On schedule: Generate docs every 6h

# 3. Monitor workflow
# Visit: https://gitea.company.local/infrastructure/datacenter-docs/actions
```
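
Gitea Actions schedules use five-field cron expressions. This sketch assumes the 6-hour documentation job is defined as `0 */6 * * *`. The actual expression lives in the workflow file and may differ. Under that assumption the job fires at 00:00, 06:00, 12:00 and 18:00 in the scheduler's time zone. The sketch computes the next run time (e.g. 07:30 → 12:00). `next_six_hour_run` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def next_six_hour_run(now: datetime) -> datetime:
    """Hypothetical helper: next trigger time for the cron expression `0 */6 * * *`.

    That expression fires at minute 0 of hours 0, 6, 12 and 18.
    A time exactly on a slot is treated as already triggered.
    """
    # Start of the current 6-hour slot, then step forward one slot
    slot_start = now.replace(hour=(now.hour // 6) * 6, minute=0, second=0, microsecond=0)
    return slot_start + timedelta(hours=6)
```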

---

## Configuration Details

### Environment Variables (.env)

```bash
# Database
DATABASE_URL=postgresql://docs_user:CHANGE_ME@postgres:5432/datacenter_docs

# Redis
REDIS_URL=redis://:CHANGE_ME@redis:6379/0

# MCP Server (CRITICAL - Required for device connectivity)
MCP_SERVER_URL=https://mcp.company.local
MCP_API_KEY=your_mcp_api_key_here

# Anthropic Claude API (CRITICAL - Required for AI)
ANTHROPIC_API_KEY=sk-ant-api03-xxxxx

# CORS: comma-separated list of allowed origins (adjust for your domain)
CORS_ORIGINS=http://localhost:3000,https://docs.company.local

# Optional
LOG_LEVEL=INFO
DEBUG=false
WORKERS=4
MAX_TOKENS=4096
```
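
Missing or placeholder values are a common reason the API fails to start (see Troubleshooting). The sketch below is a pre-flight check. It checks every variable above that is not listed under `# Optional`, and it flags the template placeholders (`CHANGE_ME`, `your_mcp_api_key_here`, `sk-ant-api03-xxxxx`). It also flags a `*` wildcard in `CORS_ORIGINS`, matching the CORS item in the Security Checklist. The variable list and markers come from the example file above, and `check_env` is a hypothetical helper.

```python
from typing import Mapping, Optional

# Variables not marked "Optional" in the .env example above
REQUIRED = ("DATABASE_URL", "REDIS_URL", "MCP_SERVER_URL", "MCP_API_KEY", "ANTHROPIC_API_KEY")

# Fragments of the template values that must be replaced before deploying
PLACEHOLDERS = ("CHANGE_ME", "your_mcp_api_key_here", "xxxxx")


def check_env(env: Mapping[str, Optional[str]]) -> list[str]:
    """Hypothetical helper: human-readable problems found in `env` (empty list if none)."""
    problems = []
    for name in REQUIRED:
        value = (env.get(name) or "").strip()
        if not value:
            problems.append(f"{name} is missing or empty")
        elif any(marker in value for marker in PLACEHOLDERS):
            problems.append(f"{name} still contains a placeholder value")
    origins = [o.strip() for o in (env.get("CORS_ORIGINS") or "").split(",") if o.strip()]
    if "*" in origins:
        problems.append("CORS_ORIGINS allows any origin ('*')")
    return problems
```

You can run it inside a container as `check_env(os.environ)`. Alternatively, run it on the host against `dotenv_values(".env")` from the python-dotenv package, which maps keys without a value to `None`. The `or ""` guards handle that case.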

### Kubernetes Secrets (secrets.yaml)

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: datacenter-secrets
  namespace: datacenter-docs
type: Opaque
stringData:
  database-url: "postgresql://user:pass@postgresql.default:5432/datacenter_docs"
  redis-url: "redis://:pass@redis.default:6379/0"
  mcp-api-key: "your-mcp-key"
  anthropic-api-key: "sk-ant-api03-xxxxx"
```
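
`stringData` takes plain strings. Kubernetes stores them base64-encoded under `data`, which is what `kubectl get secret -o yaml` shows. You may prefer to generate the manifest from values held in a secrets manager rather than commit plaintext. The sketch below builds the equivalent Secret with pre-encoded `data`. Print it with `json.dumps(...)` and pipe the output to `kubectl apply -f -`; kubectl accepts JSON manifests as well as YAML. `secret_manifest` is a hypothetical helper.

```python
import base64


def secret_manifest(name: str, namespace: str, values: dict[str, str]) -> dict:
    """Hypothetical helper: Opaque v1 Secret with base64-encoded `data` (equivalent to `stringData`)."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": {
            key: base64.b64encode(value.encode("utf-8")).decode("ascii")
            for key, value in values.items()
        },
    }
```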

---

## Post-Deployment Steps

### 1. Database Migrations

```bash
# Docker Compose
docker-compose exec api poetry run alembic upgrade head

# Kubernetes
kubectl exec -n datacenter-docs deployment/api -- \
  poetry run alembic upgrade head
```

### 2. Index Initial Documentation

```bash
# Docker Compose
docker-compose exec api poetry run datacenter-docs index-docs \
  --path /app/output

# Kubernetes
kubectl exec -n datacenter-docs deployment/api -- \
  poetry run datacenter-docs index-docs --path /app/output
```

### 3. Generate Documentation

```bash
# Manual trigger via the API
curl -X POST http://localhost:8000/api/v1/documentation/generate/infrastructure

# Or run full generation (Docker Compose)
docker-compose exec worker poetry run datacenter-docs generate-all
```

### 4. Test API

```bash
# Health check
curl http://localhost:8000/health

# Create test ticket
curl -X POST http://localhost:8000/api/v1/tickets \
  -H "Content-Type: application/json" \
  -d '{
    "ticket_id": "TEST-001",
    "title": "Test ticket",
    "description": "Testing auto-resolution",
    "category": "network"
  }'

# Get ticket status
curl http://localhost:8000/api/v1/tickets/TEST-001

# Search documentation
curl -X POST http://localhost:8000/api/v1/documentation/search \
  -H "Content-Type: application/json" \
  -d '{"query": "UPS battery status", "limit": 5}'
```
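
For scripted smoke tests (for example in CI), the same calls can be made from Python with only the standard library. The sketch below reuses the endpoints and payload fields shown above. The response schemas are not documented here, so it simply returns the decoded JSON. `ApiClient` is a hypothetical helper, not the project's client; a typical call would be `ApiClient().search_docs("UPS battery status")`.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Optional


class ApiClient:
    """Hypothetical JSON client for the smoke-test endpoints listed above."""

    def __init__(self, base_url: str = "http://localhost:8000", timeout: float = 30.0):
        self.base_url = base_url.rstrip("/")
        self.timeout = timeout

    def build_request(self, method: str, path: str, payload: Optional[dict] = None) -> urllib.request.Request:
        """Create a request; a JSON body and Content-Type are added only when `payload` is given."""
        headers = {"Accept": "application/json"}
        data = None
        if payload is not None:
            data = json.dumps(payload).encode("utf-8")
            headers["Content-Type"] = "application/json"
        return urllib.request.Request(self.base_url + path, data=data, headers=headers, method=method)

    def call(self, method: str, path: str, payload: Optional[dict] = None) -> Any:
        """Send the request and return the decoded JSON response body."""
        request = self.build_request(method, path, payload)
        with urllib.request.urlopen(request, timeout=self.timeout) as response:
            return json.load(response)

    def create_ticket(self, ticket: dict) -> Any:
        return self.call("POST", "/api/v1/tickets", ticket)

    def get_ticket(self, ticket_id: str) -> Any:
        # Quote the ID so characters like "/" cannot change the path
        return self.call("GET", "/api/v1/tickets/" + urllib.parse.quote(ticket_id, safe=""))

    def search_docs(self, query: str, limit: int = 5) -> Any:
        return self.call("POST", "/api/v1/documentation/search", {"query": query, "limit": limit})
```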

---

## Monitoring

### Prometheus Metrics

```bash
# Metrics endpoint
curl http://localhost:8000/metrics

# Example metrics:
# datacenter_docs_tickets_total
# datacenter_docs_tickets_resolved_total
# datacenter_docs_resolution_confidence_score
# datacenter_docs_processing_time_seconds
```
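
`/metrics` returns the Prometheus text exposition format: one `name{labels} value` sample per line, plus `# HELP` / `# TYPE` comment lines. The sketch below sums the two ticket counters across label sets and derives an auto-resolution rate. It assumes the metric names listed above; any labels are assumed, not documented. It is not a complete parser; for example, it does not handle escaped quotes in label values. The helper names are hypothetical.

```python
import re
from typing import Optional

# metric_name, optional {labels}, whitespace, value (a trailing timestamp is ignored)
SAMPLE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)")


def sum_metric(exposition: str, name: str) -> float:
    """Hypothetical helper: sum every sample of metric `name`, across all label sets."""
    total = 0.0
    for line in exposition.splitlines():
        match = SAMPLE.match(line.strip())  # comment lines start with '#' and never match
        if match and match.group(1) == name:
            total += float(match.group(3))
    return total


def resolution_rate(exposition: str) -> Optional[float]:
    """Hypothetical helper: resolved / total tickets, or None when no tickets were recorded."""
    total = sum_metric(exposition, "datacenter_docs_tickets_total")
    resolved = sum_metric(exposition, "datacenter_docs_tickets_resolved_total")
    return resolved / total if total else None
```

To read live values, pass in `urllib.request.urlopen("http://localhost:8000/metrics").read().decode()`. In Grafana, the equivalent PromQL would be along the lines of `sum(datacenter_docs_tickets_resolved_total) / sum(datacenter_docs_tickets_total)`.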

### Grafana Dashboards

Import the dashboard from `deploy/grafana/dashboard.json`.

### Logs

```bash
# Docker Compose
docker-compose logs -f api chat worker

# Kubernetes
kubectl logs -n datacenter-docs deployment/api -f
kubectl logs -n datacenter-docs deployment/chat -f
kubectl logs -n datacenter-docs deployment/worker -f
```

### Celery Flower (Task Monitoring)

Access Flower at http://localhost:5555 (Docker Compose) or https://docs.company.local/flower (K8s).

---

## Scaling

### Horizontal Scaling

```bash
# Docker Compose (or set the replica count in docker-compose.yml)
docker-compose up -d --scale worker=5

# Kubernetes
kubectl scale deployment api --replicas=5 -n datacenter-docs
kubectl scale deployment worker --replicas=10 -n datacenter-docs
```

### Vertical Scaling

Edit the resource requests and limits in `deploy/kubernetes/deployment.yaml`:

```yaml
resources:
  requests:
    memory: "1Gi"
    cpu: "500m"
  limits:
    memory: "2Gi"
    cpu: "2000m"
```
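
Requests drive scheduling: the cluster needs enough allocatable capacity for `replicas × requests`. For example, 10 workers at the values above request 5 CPU cores (10 × 500m) and 10 GiB of memory (10 × 1Gi). The sketch below does that arithmetic. It handles only millicores (`m`), plain numbers and the binary memory suffixes (`Ki`, `Mi`, `Gi`, `Ti`), not the full Kubernetes quantity grammar. The helper names are hypothetical.

```python
# Binary suffixes used by Kubernetes memory quantities (subset)
MEMORY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def cpu_cores(quantity: str) -> float:
    """Hypothetical helper: convert a CPU quantity ("500m", "2", "0.5") to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def memory_bytes(quantity: str) -> int:
    """Hypothetical helper: convert a memory quantity ("1Gi", "512Mi", "1048576") to bytes."""
    for suffix, factor in MEMORY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)


def total_requests(replicas: int, cpu: str, memory: str) -> tuple[float, int]:
    """Hypothetical helper: total (cores, bytes) requested by `replicas` identical pods."""
    return replicas * cpu_cores(cpu), replicas * memory_bytes(memory)
```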

---

## Troubleshooting

### API not starting

```bash
# Check logs
docker-compose logs api

# Common issues:
#   - Database not accessible
#   - Missing environment variables
#   - MCP server not reachable

# Check that PostgreSQL accepts connections
docker-compose exec postgres pg_isready -U docs_user -d datacenter_docs

# Test database connection from the API container
docker-compose exec api python -c "
from datacenter_docs.utils.database import get_db
next(get_db())
print('DB OK')
"
```

### Chat not connecting

```bash
# Check WebSocket connection
# Browser console should show: WebSocket connection established

# Test the Socket.IO endpoint with curl (Engine.IO v4 polling handshake).
# Assumes a Socket.IO v3+ server; use EIO=3 for older servers.
# A healthy server replies with a packet starting with 0{"sid":...
curl "http://localhost:8001/socket.io/?EIO=4&transport=polling"
```

### Worker not processing jobs

```bash
# Check Celery status
docker-compose exec worker celery -A datacenter_docs.workers.celery_app status

# Check Redis connection (assumes REDIS_URL from .env is set in the worker container)
docker-compose exec worker python -c "
import os
import redis
r = redis.from_url(os.environ['REDIS_URL'])
print(r.ping())
"
```

### MCP Connection Issues

```bash
# Test MCP connectivity (reads MCP_SERVER_URL and MCP_API_KEY from the container environment)
docker-compose exec api python -c "
import asyncio
import os
from datacenter_docs.mcp.client import MCPClient

async def test():
    async with MCPClient(
        server_url=os.environ['MCP_SERVER_URL'],
        api_key=os.environ['MCP_API_KEY'],
    ) as client:
        resources = await client.list_resources()
        print(f'Found {len(resources)} resources')

asyncio.run(test())
"
```

---

## Backup & Recovery

### Database Backup

```bash
# Docker Compose (-T disables pseudo-TTY allocation, which can otherwise
# insert carriage returns into the dump)
docker-compose exec -T postgres pg_dump -U docs_user datacenter_docs > backup.sql

# Kubernetes
kubectl exec -n datacenter-docs postgresql-0 -- \
  pg_dump -U docs_user datacenter_docs > backup.sql
```

### Documentation Backup

```bash
# Backup generated docs
tar -czf docs-backup-$(date +%Y%m%d).tar.gz output/

# Backup vector store
tar -czf vectordb-backup-$(date +%Y%m%d).tar.gz data/chroma_db/
```
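
The monthly "Backup verification" task under Maintenance can be partly automated. The sketch below uses Python's standard `tarfile` module to read the data of every file in a `.tar.gz`. A truncated or corrupt archive therefore raises an error, and an archive with no files is rejected. It does not check that the contents are usable (for example, that the vector store loads). `verify_archive` is a hypothetical helper.

```python
import tarfile


def verify_archive(path: str) -> int:
    """Hypothetical helper: fully read a .tar.gz archive and return its number of regular files.

    Raises tarfile.TarError, EOFError or OSError if the archive is corrupt or truncated,
    and ValueError if it contains no files.
    """
    files = 0
    with tarfile.open(path, "r:gz") as tar:
        for member in tar:
            if member.isfile():
                fileobj = tar.extractfile(member)
                # Read in 1 MiB chunks so every byte is decompressed
                while fileobj.read(1 << 20):
                    pass
                files += 1
    if files == 0:
        raise ValueError(f"{path} contains no files")
    return files
```

For example, `verify_archive("docs-backup-20250115.tar.gz")` returns the number of files in that backup. Running it before a restore catches unreadable archives early.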

### Restore

```bash
# Database
docker-compose exec -T postgres psql -U docs_user datacenter_docs < backup.sql

# Documentation (replace the date with that of the backup to restore)
tar -xzf docs-backup-20250115.tar.gz
tar -xzf vectordb-backup-20250115.tar.gz
```

---

## Security Checklist

- [ ] All secrets stored in vault/secrets manager
- [ ] TLS enabled for all services
- [ ] API rate limiting configured
- [ ] CORS restricted to trusted origins (`CORS_ORIGINS`)
- [ ] Network policies applied (K8s)
- [ ] Regular security scans scheduled
- [ ] Audit logging enabled
- [ ] Backup encryption enabled

---

## Performance Tuning

### API Optimization

```bash
# Increase workers (in .env), e.g. 2x CPU cores
WORKERS=8

# Adjust max tokens (higher for complex queries)
MAX_TOKENS=8192
```

### Database Optimization

```sql
-- Add indexes. CONCURRENTLY avoids blocking writes on a live table,
-- but it cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_tickets_status ON tickets(status);
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_tickets_created_at ON tickets(created_at);
```

### Redis Caching

```python
# Adjust cache TTL in seconds (in code)
CACHE_TTL = {
    'documentation': 3600,  # 1 hour
    'metrics': 300,         # 5 minutes
    'tickets': 60,          # 1 minute
}
```
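
The sketch below shows one way to apply a per-category TTL table like this with redis-py. `setex(name, time, value)` stores a value that Redis expires after `time` seconds. The `category:key` naming scheme, JSON serialization and the `cache_put` / `cache_get` helpers are hypothetical, not taken from the project's code. Any client exposing `setex` and `get` (such as `redis.Redis.from_url(REDIS_URL)`) can be passed in.

```python
import json
from typing import Any, Optional

CACHE_TTL = {
    "documentation": 3600,  # 1 hour
    "metrics": 300,  # 5 minutes
    "tickets": 60,  # 1 minute
}


def cache_put(client: Any, category: str, key: str, value: Any) -> None:
    """Hypothetical helper: store `value` as JSON under "category:key" with the category's TTL."""
    ttl = CACHE_TTL[category]  # unknown categories raise KeyError instead of caching forever
    client.setex(f"{category}:{key}", ttl, json.dumps(value))


def cache_get(client: Any, category: str, key: str) -> Optional[Any]:
    """Hypothetical helper: cached value, or None if it is missing or has expired."""
    raw = client.get(f"{category}:{key}")
    return None if raw is None else json.loads(raw)
```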

---

## Maintenance

### Regular Tasks

**Weekly**

- Review and clean old logs
- Check disk usage
- Review failed tickets
- Update dependencies

**Monthly**

- Database vacuum/optimize
- Security patches
- Performance review
- Backup verification

### Scheduled Maintenance

```bash
# Schedule in crontab
# Every Sunday at 02:00
0 2 * * 0 /opt/scripts/weekly-maintenance.sh
# 1st day of each month at 03:00
0 3 1 * * /opt/scripts/monthly-maintenance.sh
```

---

**For support**: automation-team@company.local