# Datacenter Docs & Remediation Engine - Helm Chart

Helm chart for deploying the LLM Automation - Docs & Remediation Engine on Kubernetes.

## Overview

This chart deploys a complete stack including:

- **MongoDB**: Document database for storing tickets, documentation, and metadata
- **Redis**: Cache and task queue backend
- **API Service**: FastAPI REST API with auto-remediation capabilities
- **Chat Service**: WebSocket server for real-time documentation queries (optional, not yet implemented)
- **Worker Service**: Celery workers for background tasks (optional, not yet implemented)
- **Frontend**: React-based web interface

## Prerequisites

- Kubernetes 1.19+
- Helm 3.0+
- PersistentVolume provisioner support in the underlying infrastructure (for MongoDB persistence)
- Ingress controller (optional, for external access)

## Installation

### Quick Start

```bash
# Add the chart repository (if published)
helm repo add datacenter-docs https://your-repo-url
helm repo update

# Install with default values
helm install my-datacenter-docs datacenter-docs/datacenter-docs

# Or install from local directory
helm install my-datacenter-docs ./datacenter-docs
```

### Production Installation

For production, create a custom `values.yaml`:

```bash
# Copy and edit the values file
cp values.yaml my-values.yaml

# Edit my-values.yaml with your configuration
# At minimum, change:
# - secrets.llmApiKey
# - secrets.apiSecretKey
# - ingress.hosts

# Install with custom values
helm install my-datacenter-docs ./datacenter-docs -f my-values.yaml
```

### Install with Specific Configuration

```bash
# Quote keys that contain [ ] so the shell does not treat them as glob patterns
helm install my-datacenter-docs ./datacenter-docs \
  --set secrets.llmApiKey="sk-your-openai-api-key" \
  --set secrets.apiSecretKey="your-strong-secret-key" \
  --set "ingress.hosts[0].host=datacenter-docs.yourdomain.com" \
  --set mongodb.persistence.size="50Gi"
```

## Configuration

### Key Configuration Parameters

#### Global Settings

| Parameter | Description | Default |
|-----------|-------------|---------|
| `global.imagePullPolicy` | Image pull policy | `IfNotPresent` |
| `global.storageClass` | Storage class for PVCs | `""` |

#### MongoDB

| Parameter | Description | Default |
|-----------|-------------|---------|
| `mongodb.enabled` | Enable MongoDB | `true` |
| `mongodb.image.repository` | MongoDB image | `mongo` |
| `mongodb.image.tag` | MongoDB version | `7` |
| `mongodb.auth.rootUsername` | Root username | `admin` |
| `mongodb.auth.rootPassword` | Root password | `admin123` |
| `mongodb.persistence.enabled` | Enable persistence | `true` |
| `mongodb.persistence.size` | Volume size | `10Gi` |
| `mongodb.resources.requests.memory` | Memory request | `512Mi` |
| `mongodb.resources.limits.memory` | Memory limit | `2Gi` |

#### Redis

| Parameter | Description | Default |
|-----------|-------------|---------|
| `redis.enabled` | Enable Redis | `true` |
| `redis.image.repository` | Redis image | `redis` |
| `redis.image.tag` | Redis version | `7-alpine` |
| `redis.resources.requests.memory` | Memory request | `128Mi` |
| `redis.resources.limits.memory` | Memory limit | `512Mi` |

#### API Service

| Parameter | Description | Default |
|-----------|-------------|---------|
| `api.enabled` | Enable API service | `true` |
| `api.replicaCount` | Number of replicas | `2` |
| `api.image.repository` | API image repository | `datacenter-docs-api` |
| `api.image.tag` | API image tag | `latest` |
| `api.service.port` | Service port | `8000` |
| `api.autoscaling.enabled` | Enable HPA | `true` |
| `api.autoscaling.minReplicas` | Min replicas | `2` |
| `api.autoscaling.maxReplicas` | Max replicas | `10` |
| `api.resources.requests.memory` | Memory request | `512Mi` |
| `api.resources.limits.memory` | Memory limit | `2Gi` |

#### Worker Service

| Parameter | Description | Default |
|-----------|-------------|---------|
| `worker.enabled` | Enable worker service | `false` |
| `worker.replicaCount` | Number of replicas | `3` |
| `worker.autoscaling.enabled` | Enable HPA | `true` |
| `worker.autoscaling.minReplicas` | Min replicas | `1` |
| `worker.autoscaling.maxReplicas` | Max replicas | `10` |

#### Chat Service

| Parameter | Description | Default |
|-----------|-------------|---------|
| `chat.enabled` | Enable chat service | `false` |
| `chat.replicaCount` | Number of replicas | `1` |
| `chat.service.port` | Service port | `8001` |

#### Frontend

| Parameter | Description | Default |
|-----------|-------------|---------|
| `frontend.enabled` | Enable frontend | `true` |
| `frontend.replicaCount` | Number of replicas | `2` |
| `frontend.service.port` | Service port | `80` |

#### Ingress

| Parameter | Description | Default |
|-----------|-------------|---------|
| `ingress.enabled` | Enable ingress | `true` |
| `ingress.className` | Ingress class | `nginx` |
| `ingress.hosts[0].host` | Hostname | `datacenter-docs.example.com` |
| `ingress.tls[0].secretName` | TLS secret name | `datacenter-docs-tls` |

#### Application Configuration

| Parameter | Description | Default |
|-----------|-------------|---------|
| `config.llm.baseUrl` | LLM provider URL | `https://api.openai.com/v1` |
| `config.llm.model` | LLM model | `gpt-4-turbo-preview` |
| `config.autoRemediation.enabled` | Enable auto-remediation | `true` |
| `config.autoRemediation.minReliabilityScore` | Min reliability score | `85.0` |
| `config.autoRemediation.dryRun` | Dry run mode | `false` |
| `config.logLevel` | Log level | `INFO` |

#### Secrets

| Parameter | Description | Default |
|-----------|-------------|---------|
| `secrets.llmApiKey` | LLM API key | `sk-your-openai-api-key-here` |
| `secrets.apiSecretKey` | API secret key | `your-secret-key-here-change-in-production` |

**IMPORTANT**: Change these secrets in production!

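
One way to replace the placeholder `secrets.apiSecretKey` is to generate a random value at install time. The following is a minimal sketch, not part of the chart: the function names `generate_api_secret` and `install_with_generated_secret` are hypothetical, and the sketch assumes a 64-character hex string is an acceptable key format for the API. `openssl rand -hex 32` would produce an equivalent value where OpenSSL is available.

```bash
#!/usr/bin/env bash
# Sketch only: these helper names are illustrative and not shipped with the chart.

# Emit 32 random bytes from /dev/urandom as 64 lowercase hex characters.
generate_api_secret() {
  od -An -N32 -tx1 /dev/urandom | tr -d ' \n'
}

# Install the chart with a freshly generated API secret.
# Expects the LLM API key as the first argument.
install_with_generated_secret() {
  local llm_api_key="$1"
  helm install my-datacenter-docs ./datacenter-docs \
    --set secrets.llmApiKey="${llm_api_key}" \
    --set secrets.apiSecretKey="$(generate_api_secret)"
}
```

After sourcing the script, `install_with_generated_secret "$LLM_API_KEY"` runs the install. Values passed with `--set` can end up in shell history, so a values file with restricted permissions may be preferable on shared machines.
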
## Usage Examples

### Enable All Services (including chat and worker)

```bash
helm install my-datacenter-docs ./datacenter-docs \
  --set chat.enabled=true \
  --set worker.enabled=true
```

### Disable Auto-Remediation

```bash
helm install my-datacenter-docs ./datacenter-docs \
  --set config.autoRemediation.enabled=false
```

### Use Different LLM Provider (e.g., Anthropic Claude)

```bash
helm install my-datacenter-docs ./datacenter-docs \
  --set config.llm.baseUrl="https://api.anthropic.com/v1" \
  --set config.llm.model="claude-3-opus-20240229" \
  --set secrets.llmApiKey="sk-ant-your-anthropic-key"
```

### Use Local LLM (e.g., Ollama)

```bash
helm install my-datacenter-docs ./datacenter-docs \
  --set config.llm.baseUrl="http://ollama-service:11434/v1" \
  --set config.llm.model="llama2" \
  --set secrets.llmApiKey="not-needed"
```

### Scale MongoDB Storage

```bash
helm install my-datacenter-docs ./datacenter-docs \
  --set mongodb.persistence.size="100Gi"
```

### Disable Ingress (use port-forward instead)

```bash
helm install my-datacenter-docs ./datacenter-docs \
  --set ingress.enabled=false
```

### Production Configuration with External MongoDB

```yaml
# production-values.yaml
mongodb:
  enabled: false

config:
  mongodbUrl: "mongodb://user:pass@external-mongodb:27017/datacenter_docs?authSource=admin"

api:
  replicaCount: 5
  autoscaling:
    maxReplicas: 20

secrets:
  llmApiKey: "sk-your-production-api-key"
  apiSecretKey: "your-production-secret-key"

ingress:
  hosts:
    - host: "datacenter-docs.prod.yourdomain.com"
      paths:
        - path: /
          pathType: Prefix
          service: frontend
        - path: /api
          pathType: Prefix
          service: api
```

```bash
helm install prod-datacenter-docs ./datacenter-docs -f production-values.yaml
```

## Upgrading

```bash
# Upgrade with new values
helm upgrade my-datacenter-docs ./datacenter-docs -f my-values.yaml

# Upgrade specific parameters
helm upgrade my-datacenter-docs ./datacenter-docs \
  --set api.image.tag="v1.2.0" \
  --reuse-values
```

## Uninstallation

```bash
helm uninstall my-datacenter-docs
```

**Note**: This deletes all resources except the MongoDB PersistentVolumeClaims (PVCs). To delete the PVCs as well:

```bash
kubectl delete pvc -l app.kubernetes.io/instance=my-datacenter-docs
```

## Monitoring and Troubleshooting

### Check Pod Status

```bash
kubectl get pods -l app.kubernetes.io/instance=my-datacenter-docs
```

### View Logs

```bash
# API logs
kubectl logs -l app.kubernetes.io/component=api -f

# Worker logs
kubectl logs -l app.kubernetes.io/component=worker -f

# MongoDB logs
kubectl logs -l app.kubernetes.io/component=database -f
```

### Access Services Locally

```bash
# API
kubectl port-forward svc/my-datacenter-docs-api 8000:8000

# Frontend
kubectl port-forward svc/my-datacenter-docs-frontend 8080:80

# MongoDB (for debugging)
kubectl port-forward svc/my-datacenter-docs-mongodb 27017:27017
```

### Common Issues

#### Pods Stuck in Pending

Check if PVCs are bound:

```bash
kubectl get pvc
```

If storage class is missing, set it:

```bash
helm upgrade my-datacenter-docs ./datacenter-docs \
  --set mongodb.persistence.storageClass="standard" \
  --reuse-values
```

#### API Pods Crash Loop

Check logs:

```bash
kubectl logs -l app.kubernetes.io/component=api --tail=100
```

Common causes:

- MongoDB not ready (wait for init containers)
- Invalid LLM API key
- Missing environment variables

#### Cannot Access via Ingress

Check ingress status:

```bash
kubectl get ingress
kubectl describe ingress my-datacenter-docs
```

Ensure:

- Ingress controller is installed
- DNS points to ingress IP
- TLS certificate is valid (if using HTTPS)

## Security Considerations

### Production Checklist

- [ ] Change `secrets.llmApiKey` to a valid API key
- [ ] Change `secrets.apiSecretKey` to a strong random key
- [ ] Change MongoDB credentials (`mongodb.auth.rootPassword`)
- [ ] Enable TLS/SSL on ingress
- [ ] Review RBAC policies
- [ ] Use external secret management (e.g., HashiCorp Vault, AWS Secrets Manager)
- [ ] Enable network policies
- [ ] Set resource limits on all pods
- [ ] Enforce Pod Security Standards (PodSecurityPolicy was removed in Kubernetes 1.25)
- [ ] Review auto-remediation settings

### Using External Secrets

Instead of storing secrets in `values.yaml`, use Kubernetes Secrets:

```bash
# Create secret
kubectl create secret generic datacenter-docs-secrets \
  --from-literal=llm-api-key="sk-your-key" \
  --from-literal=api-secret-key="your-secret"

# Modify templates to use existing secret
# (requires chart customization)
```

## Development

### Validating the Chart

```bash
# Lint the chart
helm lint ./datacenter-docs

# Dry run
helm install my-test ./datacenter-docs --dry-run --debug

# Template rendering
helm template my-test ./datacenter-docs > rendered.yaml
```

### Testing Locally

```bash
# Create kind cluster
kind create cluster

# Install chart
helm install test ./datacenter-docs \
  --set ingress.enabled=false \
  --set api.autoscaling.enabled=false \
  --set mongodb.persistence.enabled=false

# Test (port-forward blocks, so run curl from a second terminal)
kubectl port-forward svc/test-datacenter-docs-api 8000:8000
curl http://localhost:8000/health
```

## Support

For issues and questions:

- Issues: https://git.commandware.com/it-ops/llm-automation-docs-and-remediation-engine/issues
- Documentation: https://git.commandware.com/it-ops/llm-automation-docs-and-remediation-engine

## License

See the main repository for license information.