Add Helm chart, Docs, and Config conversion script

2025-10-22 14:35:21 +02:00
parent ba9900bd57
commit 2719cfff59
31 changed files with 4436 additions and 0 deletions
--- a/deploy/helm/README.md
+++ b/deploy/helm/README.md
@@ -0,0 +1,400 @@
+# Helm Deployment
+
+This directory contains the Helm chart and supporting files for deploying the Datacenter Docs & Remediation Engine on Kubernetes.
+
+## Contents
+
+- `datacenter-docs/` - Main Helm chart for the application
+- `test-chart.sh` - Automated testing script for chart validation
+
+## Quick Start
+
+### Prerequisites
+
+- Kubernetes cluster (1.19+)
+- Helm 3.0+
+- kubectl configured to access your cluster
+
+### Development/Testing Installation
+
+```bash
+# Install with development settings (minimal resources, local testing)
+helm install dev ./datacenter-docs -f ./datacenter-docs/values-development.yaml
+
+# Access the application (each port-forward blocks, so run them in separate terminals)
+kubectl port-forward svc/dev-datacenter-docs-api 8000:8000
+kubectl port-forward svc/dev-datacenter-docs-frontend 8080:80
+
+# View API docs: http://localhost:8000/api/docs
+# View frontend: http://localhost:8080
+```
+
+### Production Installation
+
+```bash
+# Copy and customize production values
+cp datacenter-docs/values-production.yaml my-production-values.yaml
+
+# Edit my-production-values.yaml:
+# - Change all secrets (llmApiKey, apiSecretKey, mongodbPassword)
+# - Update ingress hosts
+# - Adjust resource limits
+# - Configure LLM provider
+# - Review auto-remediation settings
+
+# Install
+helm install prod ./datacenter-docs -f my-production-values.yaml
+
+# Verify deployment
+helm list
+kubectl get pods
+kubectl get ingress
+```
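+
+The edits listed in the comments above might translate into an override fragment like the one below. This is only a sketch: it reuses keys that appear elsewhere in this README (`secrets.*`, `config.llm`, `config.autoRemediation`, `ingress.tls`), and every value is a placeholder. Check `datacenter-docs/values.yaml` for the authoritative key names, including those for ingress hosts and resource limits, which are not shown here.
+
+```yaml
+# my-production-values.yaml (illustrative; all values are placeholders)
+secrets:
+  llmApiKey: "replace-with-real-key"
+  apiSecretKey: "replace-with-random-string"
+  mongodbPassword: "replace-with-random-string"
+config:
+  llm:
+    baseUrl: "https://api.openai.com/v1"
+    model: "gpt-4-turbo-preview"
+  autoRemediation:
+    enabled: true
+    minReliabilityScore: 95.0
+    dryRun: true  # keep dry-run on until the proposed actions have been reviewed
+ingress:
+  tls:
+    - secretName: datacenter-docs-tls
+      hosts:
+        - datacenter-docs.yourdomain.com
+```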
+
+## Chart Structure
+
+```
+datacenter-docs/
+├── Chart.yaml                      # Chart metadata
+├── values.yaml                     # Default configuration
+├── values-development.yaml         # Development settings
+├── values-production.yaml          # Production example
+├── README.md                       # Detailed chart documentation
+├── .helmignore                     # Files to exclude from package
+└── templates/
+    ├── NOTES.txt                   # Post-install instructions
+    ├── _helpers.tpl                # Template helpers
+    ├── configmap.yaml              # Application configuration
+    ├── secrets.yaml                # Sensitive data
+    ├── serviceaccount.yaml         # Service account
+    ├── mongodb-statefulset.yaml    # MongoDB StatefulSet
+    ├── mongodb-service.yaml        # MongoDB Service
+    ├── redis-deployment.yaml       # Redis Deployment
+    ├── redis-service.yaml          # Redis Service
+    ├── api-deployment.yaml         # API Deployment
+    ├── api-service.yaml            # API Service
+    ├── api-hpa.yaml                # API autoscaling
+    ├── chat-deployment.yaml        # Chat Deployment
+    ├── chat-service.yaml           # Chat Service
+    ├── worker-deployment.yaml      # Worker Deployment
+    ├── worker-hpa.yaml             # Worker autoscaling
+    ├── frontend-deployment.yaml    # Frontend Deployment
+    ├── frontend-service.yaml       # Frontend Service
+    └── ingress.yaml                # Ingress configuration
+```
+
+## Testing the Chart
+
+Run the automated test script:
+
+```bash
+cd deploy/helm
+./test-chart.sh
+```
+
+The script will:
+1. Lint the chart
+2. Render templates with different value files
+3. Perform dry-run installation
+4. Validate Kubernetes manifests
+5. Package the chart
+
+## Common Operations
+
+### Upgrade Release
+
+```bash
+# Upgrade with new values
+helm upgrade prod ./datacenter-docs -f my-production-values.yaml
+
+# Upgrade with specific parameter changes
+helm upgrade prod ./datacenter-docs --set api.replicaCount=10 --reuse-values
+```
+
+### Check Status
+
+```bash
+# List releases
+helm list
+
+# Get release status
+helm status prod
+
+# Get current values
+helm get values prod
+
+# Get all manifests
+helm get manifest prod
+```
+
+### Rollback
+
+```bash
+# View revision history
+helm history prod
+
+# Roll back to the previous revision
+helm rollback prod
+
+# Rollback to specific revision
+helm rollback prod 2
+```
+
+### Uninstall
+
+```bash
+# Uninstall release
+helm uninstall prod
+
+# Also delete PVCs (if using persistent storage); this can permanently delete the stored data
+kubectl delete pvc -l app.kubernetes.io/instance=prod
+```
+
+## Configuration Files
+
+### values.yaml
+Default configuration with reasonable settings for development/testing.
+
+### values-development.yaml
+Optimized for local development:
+- Minimal resource requests/limits
+- Single replica per component
+- Persistence disabled
+- Dry-run mode for auto-remediation
+- Debug logging
+- Ingress disabled (use port-forward)
+
+### values-production.yaml
+Example production configuration:
+- Higher resource limits
+- Multiple replicas
+- Autoscaling enabled
+- Persistence enabled with larger volumes
+- TLS/SSL enabled
+- Production-grade security settings
+- All components enabled
+
+**Important**: Copy and customize this file for your environment. Never use default secrets!
+
+## Available Components
+
+| Component | Purpose | Default Enabled |
+|-----------|---------|-----------------|
+| MongoDB | Document database | Yes |
+| Redis | Cache & task queue | Yes |
+| API | REST API service | Yes |
+| Chat | WebSocket server | No (not implemented) |
+| Worker | Celery background tasks | No (not implemented) |
+| Frontend | Web UI | Yes |
+
+Enable/disable components in your values file:
+
+```yaml
+mongodb:
+  enabled: true
+redis:
+  enabled: true
+api:
+  enabled: true
+chat:
+  enabled: false  # Set to true when implemented
+worker:
+  enabled: false  # Set to true when implemented
+frontend:
+  enabled: true
+```
+
+## Architecture
+
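+With every component enabled, the chart deploys the microservices architecture shown below. Chat and Worker are disabled by default (see Available Components):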
+
+```
+                     ┌─────────────┐
+                     │   Ingress   │
+                     └──────┬──────┘
+                            │
+              ┌─────────────┼─────────────┐
+              │             │             │
+         ┌────▼────┐   ┌────▼────┐  ┌────▼────┐
+         │Frontend │   │   API   │  │  Chat   │
+         └─────────┘   └────┬────┘  └────┬────┘
+                            │            │
+              ┌─────────────┼────────────┘
+              │             │
+         ┌────▼────┐   ┌────▼────┐
+         │  Redis  │   │ MongoDB │
+         └─────────┘   └─────────┘
+              ▲
+              │
+         ┌────┴────┐
+         │ Worker  │
+         └─────────┘
+```
+
+## LLM Provider Configuration
+
+The chart supports multiple LLM providers. Configure in your values file:
+
+### OpenAI
+
+```yaml
+config:
+  llm:
+    baseUrl: "https://api.openai.com/v1"
+    model: "gpt-4-turbo-preview"
+secrets:
+  llmApiKey: "sk-your-openai-key"
+```
+
+### Anthropic Claude
+
+```yaml
+config:
+  llm:
+    baseUrl: "https://api.anthropic.com/v1"
+    model: "claude-3-opus-20240229"
+secrets:
+  llmApiKey: "sk-ant-your-anthropic-key"
+```
+
+### Local (Ollama)
+
+```yaml
+config:
+  llm:
+    baseUrl: "http://ollama-service:11434/v1"
+    model: "llama2"
+secrets:
+  llmApiKey: "not-needed"
+```
+
+### Azure OpenAI
+
+```yaml
+config:
+  llm:
+    baseUrl: "https://your-resource.openai.azure.com"
+    model: "gpt-4"
+secrets:
+  llmApiKey: "your-azure-key"
+```
+
+## Security Best Practices
+
+For production deployments:
+
+1. **Change all default secrets**
+   ```bash
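+   # Note: literal values passed with --set can end up in shell history;
+   # a values file kept out of version control avoids that.
+   helm install prod ./datacenter-docs \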
+     --set secrets.llmApiKey="your-actual-key" \
+     --set secrets.apiSecretKey="$(openssl rand -base64 32)" \
+     --set secrets.mongodbPassword="$(openssl rand -base64 32)"
+   ```
+
+2. **Use external secret management**
+   - HashiCorp Vault
+   - AWS Secrets Manager
+   - Azure Key Vault
+   - Kubernetes External Secrets Operator
+
+3. **Enable TLS/SSL**
+   ```yaml
+   ingress:
+     annotations:
+       cert-manager.io/cluster-issuer: "letsencrypt-prod"
+     tls:
+       - secretName: datacenter-docs-tls
+         hosts:
+           - datacenter-docs.yourdomain.com
+   ```
+
+4. **Review auto-remediation settings**
+   ```yaml
+   config:
+     autoRemediation:
+       enabled: true
+       minReliabilityScore: 95.0  # High threshold for production
+       dryRun: true  # Test first, then set to false
+   ```
+
+5. **Implement network policies**
+6. **Enable resource quotas**
+7. **Run regular security scans**
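+
+Items 5 and 6 are not spelled out above. As a rough starting point, the manifests below sketch a NetworkPolicy and a ResourceQuota. They assume a release named `prod`, a namespace called `datacenter-docs` (hypothetical), pods that carry the `app.kubernetes.io/instance` label used elsewhere in this README, and a CNI plugin that enforces NetworkPolicy. The object names and quota numbers are arbitrary placeholders, not values taken from the chart.
+
+```yaml
+# Allow ingress to the release's pods only from pods of the same release.
+# This also blocks an ingress controller running in another namespace;
+# add a matching rule for it if the Ingress is enabled.
+apiVersion: networking.k8s.io/v1
+kind: NetworkPolicy
+metadata:
+  name: prod-same-release-only   # hypothetical name
+  namespace: datacenter-docs     # hypothetical namespace
+spec:
+  podSelector:
+    matchLabels:
+      app.kubernetes.io/instance: prod
+  policyTypes:
+    - Ingress
+  ingress:
+    - from:
+        - podSelector:
+            matchLabels:
+              app.kubernetes.io/instance: prod
+---
+# Cap the total resources the namespace may request (placeholder numbers).
+apiVersion: v1
+kind: ResourceQuota
+metadata:
+  name: prod-quota               # hypothetical name
+  namespace: datacenter-docs     # hypothetical namespace
+spec:
+  hard:
+    requests.cpu: "8"
+    requests.memory: 16Gi
+    limits.cpu: "16"
+    limits.memory: 32Gi
+    persistentvolumeclaims: "10"
+```
+
+Once a quota on compute resources exists in a namespace, pods that omit the corresponding requests or limits are rejected, so every enabled component needs resource settings before the quota is applied.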
+
+## Monitoring and Observability
+
+The chart is designed to integrate with:
+- **Prometheus**: Metrics collection
+- **Grafana**: Visualization
+- **Jaeger**: Distributed tracing
+- **ELK/Loki**: Log aggregation
+
+Add annotations to enable monitoring:
+
+```yaml
+podAnnotations:
+  prometheus.io/scrape: "true"
+  prometheus.io/port: "8000"
+  prometheus.io/path: "/metrics"
+```
+
+## Troubleshooting
+
+### Pods not starting
+
+```bash
+# Check pod status
+kubectl get pods -l app.kubernetes.io/instance=prod
+
+# Describe pod for events
+kubectl describe pod <pod-name>
+
+# View logs
+kubectl logs <pod-name> -f
+```
+
+### Storage issues
+
+```bash
+# Check PVC status
+kubectl get pvc
+
+# Check storage class
+kubectl get storageclass
+
+# Manually create PVC if needed
+kubectl apply -f - <<EOF
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: mongodb-data
+spec:
+  accessModes:
+    - ReadWriteOnce
+  resources:
+    requests:
+      storage: 10Gi
+EOF
+```
+
+### Ingress not working
+
+```bash
+# Check ingress status
+kubectl get ingress
+kubectl describe ingress prod-datacenter-docs
+
+# Check ingress controller logs
+kubectl logs -n ingress-nginx -l app.kubernetes.io/component=controller -f
+```
+
+## Support
+
+For detailed documentation, see:
+- Chart README: `datacenter-docs/README.md`
+- Main project: `../../README.md`
+- Issues: https://git.commandware.com/it-ops/llm-automation-docs-and-remediation-engine/issues
+
+## License
+
+See the main repository for license information.