# Helm Deployment
This directory contains Helm charts for deploying the Datacenter Docs & Remediation Engine on Kubernetes.
## Contents

- `datacenter-docs/` - Main Helm chart for the application
- `test-chart.sh` - Automated testing script for chart validation
## Quick Start

### Prerequisites
- Kubernetes cluster (1.19+)
- Helm 3.0+
- kubectl configured to access your cluster
### Development/Testing Installation

```bash
# Install with development settings (minimal resources, local testing)
helm install dev ./datacenter-docs -f ./datacenter-docs/values-development.yaml

# Access the application
kubectl port-forward svc/dev-datacenter-docs-api 8000:8000
kubectl port-forward svc/dev-datacenter-docs-frontend 8080:80

# View API docs: http://localhost:8000/api/docs
# View frontend: http://localhost:8080
```
### Production Installation

```bash
# Copy and customize production values
cp datacenter-docs/values-production.yaml my-production-values.yaml

# Edit my-production-values.yaml:
# - Change all secrets (llmApiKey, apiSecretKey, mongodbPassword)
# - Update ingress hosts
# - Adjust resource limits
# - Configure LLM provider
# - Review auto-remediation settings

# Install
helm install prod ./datacenter-docs -f my-production-values.yaml

# Verify deployment
helm list
kubectl get pods
kubectl get ingress
```
## Chart Structure

```
datacenter-docs/
├── Chart.yaml                      # Chart metadata
├── values.yaml                     # Default configuration
├── values-development.yaml         # Development settings
├── values-production.yaml          # Production example
├── README.md                       # Detailed chart documentation
├── .helmignore                     # Files to exclude from package
└── templates/
    ├── NOTES.txt                   # Post-install instructions
    ├── _helpers.tpl                # Template helpers
    ├── configmap.yaml              # Application configuration
    ├── secrets.yaml                # Sensitive data
    ├── serviceaccount.yaml         # Service account
    ├── mongodb-statefulset.yaml    # MongoDB StatefulSet
    ├── mongodb-service.yaml        # MongoDB Service
    ├── redis-deployment.yaml       # Redis Deployment
    ├── redis-service.yaml          # Redis Service
    ├── api-deployment.yaml         # API Deployment
    ├── api-service.yaml            # API Service
    ├── api-hpa.yaml                # API autoscaling
    ├── chat-deployment.yaml        # Chat Deployment
    ├── chat-service.yaml           # Chat Service
    ├── worker-deployment.yaml      # Worker Deployment
    ├── worker-hpa.yaml             # Worker autoscaling
    ├── frontend-deployment.yaml    # Frontend Deployment
    ├── frontend-service.yaml       # Frontend Service
    └── ingress.yaml                # Ingress configuration
```
## Testing the Chart

Run the automated test script:

```bash
cd deploy/helm
./test-chart.sh
```
This will:
- Lint the chart
- Render templates with different value files
- Perform dry-run installation
- Validate Kubernetes manifests
- Package the chart
## Common Operations

### Upgrade Release

```bash
# Upgrade with new values
helm upgrade prod ./datacenter-docs -f my-production-values.yaml

# Upgrade with specific parameter changes
helm upgrade prod ./datacenter-docs --set api.replicaCount=10 --reuse-values
```

### Check Status

```bash
# List releases
helm list

# Get release status
helm status prod

# Get current values
helm get values prod

# Get all manifests
helm get manifest prod
```

### Rollback

```bash
# View revision history
helm history prod

# Rollback to previous version
helm rollback prod

# Rollback to specific revision
helm rollback prod 2
```

### Uninstall

```bash
# Uninstall release
helm uninstall prod

# Also delete PVCs (if using persistent storage)
kubectl delete pvc -l app.kubernetes.io/instance=prod
```
## Configuration Files

### values.yaml

Default configuration with reasonable settings for development/testing.

### values-development.yaml
Optimized for local development:
- Minimal resource requests/limits
- Single replicas
- Persistence disabled
- Dry-run mode for auto-remediation
- Debug logging
- Ingress disabled (use port-forward)
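
Concretely, those development settings correspond to a values file roughly like the following. This is an illustrative sketch, not the actual file; the key names are assumptions, so check `values-development.yaml` for the real structure:

```yaml
# Illustrative only -- key names assumed; see values-development.yaml
api:
  replicaCount: 1
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
mongodb:
  persistence:
    enabled: false
config:
  logLevel: debug
  autoRemediation:
    dryRun: true
ingress:
  enabled: false
```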
### values-production.yaml
Example production configuration:
- Higher resource limits
- Multiple replicas
- Autoscaling enabled
- Persistence enabled with larger volumes
- TLS/SSL enabled
- Production-grade security settings
- All components enabled
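
For contrast, a production override might look roughly like this. Again a sketch with assumed key names; use `values-production.yaml` as the authoritative template:

```yaml
# Illustrative only -- key names assumed; see values-production.yaml
api:
  replicaCount: 3
  autoscaling:
    enabled: true
    minReplicas: 3
    maxReplicas: 10
mongodb:
  persistence:
    enabled: true
    size: 50Gi
ingress:
  enabled: true
  tls:
    - secretName: datacenter-docs-tls
      hosts:
        - datacenter-docs.yourdomain.com
```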
**Important:** Copy and customize this file for your environment. Never use default secrets!
## Available Components
| Component | Purpose | Default Enabled |
|---|---|---|
| MongoDB | Document database | Yes |
| Redis | Cache & task queue | Yes |
| API | REST API service | Yes |
| Chat | WebSocket server | No (not implemented) |
| Worker | Celery background tasks | No (not implemented) |
| Frontend | Web UI | Yes |
Enable/disable components in your values file:

```yaml
mongodb:
  enabled: true
redis:
  enabled: true
api:
  enabled: true
chat:
  enabled: false  # Set to true when implemented
worker:
  enabled: false  # Set to true when implemented
frontend:
  enabled: true
```
## Architecture

The chart deploys a complete microservices architecture:

```
            ┌─────────────┐
            │   Ingress   │
            └──────┬──────┘
                   │
     ┌─────────────┼─────────────┐
     │             │             │
┌────▼────┐   ┌────▼────┐   ┌────▼────┐
│Frontend │   │   API   │   │  Chat   │
└─────────┘   └────┬────┘   └────┬────┘
                   │             │
     ┌─────────────┼─────────────┘
     │             │
┌────▼────┐   ┌────▼────┐
│  Redis  │   │ MongoDB │
└─────────┘   └─────────┘
     ▲
     │
┌────┴────┐
│ Worker  │
└─────────┘
```
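
Inside the cluster, components reach each other through the generated Services. Assuming the common `<release>-datacenter-docs-<component>` naming pattern (an assumption here; verify the real names with `kubectl get svc` or `helm get manifest`), the in-cluster connection strings would look roughly like:

```yaml
# Hypothetical endpoints for a release named "prod" -- names assumed
MONGODB_URL: "mongodb://prod-datacenter-docs-mongodb:27017"
REDIS_URL: "redis://prod-datacenter-docs-redis:6379/0"
```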
## LLM Provider Configuration

The chart supports multiple LLM providers. Configure in your values file:

### OpenAI

```yaml
config:
  llm:
    baseUrl: "https://api.openai.com/v1"
    model: "gpt-4-turbo-preview"
secrets:
  llmApiKey: "sk-your-openai-key"
```

### Anthropic Claude

```yaml
config:
  llm:
    baseUrl: "https://api.anthropic.com/v1"
    model: "claude-3-opus-20240229"
secrets:
  llmApiKey: "sk-ant-your-anthropic-key"
```

### Local (Ollama)

```yaml
config:
  llm:
    baseUrl: "http://ollama-service:11434/v1"
    model: "llama2"
secrets:
  llmApiKey: "not-needed"
```

### Azure OpenAI

```yaml
config:
  llm:
    baseUrl: "https://your-resource.openai.azure.com"
    model: "gpt-4"
secrets:
  llmApiKey: "your-azure-key"
```
## Security Best Practices

For production deployments:

1. **Change all default secrets**

   ```bash
   helm install prod ./datacenter-docs \
     --set secrets.llmApiKey="your-actual-key" \
     --set secrets.apiSecretKey="$(openssl rand -base64 32)" \
     --set secrets.mongodbPassword="$(openssl rand -base64 32)"
   ```

2. **Use external secret management**

   - HashiCorp Vault
   - AWS Secrets Manager
   - Azure Key Vault
   - Kubernetes External Secrets Operator

3. **Enable TLS/SSL**

   ```yaml
   ingress:
     annotations:
       cert-manager.io/cluster-issuer: "letsencrypt-prod"
     tls:
       - secretName: datacenter-docs-tls
         hosts:
           - datacenter-docs.yourdomain.com
   ```

4. **Review auto-remediation settings**

   ```yaml
   config:
     autoRemediation:
       enabled: true
       minReliabilityScore: 95.0  # High threshold for production
       dryRun: true               # Test first, then set to false
   ```

5. **Implement network policies**

6. **Enable resource quotas**

7. **Regular security scanning**
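
One way to keep secrets off the command line (where they end up in shell history) is to pre-generate them into a local override file. A minimal sketch, assuming the `secrets.*` key names shown in the install example above; `my-secrets.yaml` is a hypothetical filename:

```shell
# Sketch: generate random secrets into a local override file.
# Key names are taken from the chart's install example above.
API_SECRET="$(openssl rand -base64 32)"
MONGO_PW="$(openssl rand -base64 32)"
cat > my-secrets.yaml <<EOF
secrets:
  apiSecretKey: "${API_SECRET}"
  mongodbPassword: "${MONGO_PW}"
EOF
echo "wrote my-secrets.yaml"
```

Then install with `helm install prod ./datacenter-docs -f my-production-values.yaml -f my-secrets.yaml`, and keep `my-secrets.yaml` out of version control.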
## Monitoring and Observability

The chart is designed to integrate with:

- **Prometheus**: Metrics collection
- **Grafana**: Visualization
- **Jaeger**: Distributed tracing
- **ELK/Loki**: Log aggregation
Add annotations to enable monitoring:

```yaml
podAnnotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "8000"
  prometheus.io/path: "/metrics"
```
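
If the cluster uses the Prometheus Operator rather than annotation-based discovery, a `ServiceMonitor` can target the API Service instead. A sketch; the label selector and port name are assumptions and must match the rendered Service:

```yaml
# Hypothetical ServiceMonitor (requires the Prometheus Operator CRDs)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: datacenter-docs-api
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: datacenter-docs  # assumed label
  endpoints:
    - port: http        # assumed port name; check the API Service
      path: /metrics
```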
## Troubleshooting

### Pods not starting

```bash
# Check pod status
kubectl get pods -l app.kubernetes.io/instance=prod

# Describe pod for events
kubectl describe pod <pod-name>

# View logs
kubectl logs <pod-name> -f
```
### Storage issues

```bash
# Check PVC status
kubectl get pvc

# Check storage class
kubectl get storageclass

# Manually create PVC if needed
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongodb-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
EOF
```
### Ingress not working

```bash
# Check ingress status
kubectl get ingress
kubectl describe ingress prod-datacenter-docs

# Check ingress controller logs
kubectl logs -n ingress-nginx -l app.kubernetes.io/component=controller -f
```
## Support

For detailed documentation, see:

- Chart README: `datacenter-docs/README.md`
- Main project: `../../README.md`
- Issues: https://git.commandware.com/it-ops/llm-automation-docs-and-remediation-engine/issues
## License
See the main repository for license information.