Add Helm chart, Docs, and Config conversion script

2025-10-22 14:35:21 +02:00
parent ba9900bd57
commit 2719cfff59
31 changed files with 4436 additions and 0 deletions

deploy/helm/README.md

# Helm Deployment
This directory contains Helm charts for deploying the Datacenter Docs & Remediation Engine on Kubernetes.
## Contents
- `datacenter-docs/` - Main Helm chart for the application
- `test-chart.sh` - Automated testing script for chart validation
## Quick Start
### Prerequisites
- Kubernetes cluster (1.19+)
- Helm 3.0+
- kubectl configured to access your cluster
### Development/Testing Installation
```bash
# Install with development settings (minimal resources, local testing)
helm install dev ./datacenter-docs -f ./datacenter-docs/values-development.yaml
# Access the application (each port-forward stays in the foreground; run each in its own terminal)
kubectl port-forward svc/dev-datacenter-docs-api 8000:8000
kubectl port-forward svc/dev-datacenter-docs-frontend 8080:80
# View API docs: http://localhost:8000/api/docs
# View frontend: http://localhost:8080
```
### Production Installation
```bash
# Copy and customize production values
cp datacenter-docs/values-production.yaml my-production-values.yaml
# Edit my-production-values.yaml:
# - Change all secrets (llmApiKey, apiSecretKey, mongodbPassword)
# - Update ingress hosts
# - Adjust resource limits
# - Configure LLM provider
# - Review auto-remediation settings
# Install
helm install prod ./datacenter-docs -f my-production-values.yaml
# Verify deployment
helm list
kubectl get pods
kubectl get ingress
```
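Before running `helm install`, it can help to fail fast if the values file still contains example keys. The following is a minimal sketch rather than part of the chart: the function name `check_no_default_secrets` is hypothetical, and the placeholder list only covers the example keys shown in this README, so extend it with any defaults your `values.yaml` ships.

```bash
# Hypothetical pre-flight check: refuse a values file that still contains
# example secrets. The placeholder list is an assumption based on this README.
check_no_default_secrets() {
  local values_file="$1"
  local placeholders='sk-your-openai-key|sk-ant-your-anthropic-key|your-azure-key'
  if [ ! -f "$values_file" ]; then
    echo "error: $values_file not found" >&2
    return 2
  fi
  # -E: extended regex; -n: show the line number of each offending match
  if grep -En "$placeholders" "$values_file"; then
    echo "error: placeholder secrets found in $values_file" >&2
    return 1
  fi
  return 0
}
```

For example: `check_no_default_secrets my-production-values.yaml && helm install prod ./datacenter-docs -f my-production-values.yaml`.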
## Chart Structure
```
datacenter-docs/
├── Chart.yaml                    # Chart metadata
├── values.yaml                   # Default configuration
├── values-development.yaml       # Development settings
├── values-production.yaml        # Production example
├── README.md                     # Detailed chart documentation
├── .helmignore                   # Files to exclude from package
└── templates/
    ├── NOTES.txt                 # Post-install instructions
    ├── _helpers.tpl              # Template helpers
    ├── configmap.yaml            # Application configuration
    ├── secrets.yaml              # Sensitive data
    ├── serviceaccount.yaml       # Service account
    ├── mongodb-statefulset.yaml  # MongoDB StatefulSet
    ├── mongodb-service.yaml      # MongoDB Service
    ├── redis-deployment.yaml     # Redis Deployment
    ├── redis-service.yaml        # Redis Service
    ├── api-deployment.yaml       # API Deployment
    ├── api-service.yaml          # API Service
    ├── api-hpa.yaml              # API autoscaling
    ├── chat-deployment.yaml      # Chat Deployment
    ├── chat-service.yaml         # Chat Service
    ├── worker-deployment.yaml    # Worker Deployment
    ├── worker-hpa.yaml           # Worker autoscaling
    ├── frontend-deployment.yaml  # Frontend Deployment
    ├── frontend-service.yaml     # Frontend Service
    └── ingress.yaml              # Ingress configuration
```
## Testing the Chart
Run the automated test script:
```bash
cd deploy/helm
./test-chart.sh
```
This will:
1. Lint the chart
2. Render templates with different value files
3. Perform dry-run installation
4. Validate Kubernetes manifests
5. Package the chart
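
The lint, render, and package steps can be approximated with a short loop. This is a hypothetical sketch, not the contents of `test-chart.sh`: the function name `lint_and_render` is invented here, it assumes it runs from `deploy/helm` with the values files listed under Chart Structure, and it skips the dry-run and manifest-validation steps, which depend on a cluster or on tooling not described here.

```bash
# Hypothetical approximation of the lint / render / package steps.
# Assumes the values files from "Chart Structure" exist inside the chart.
lint_and_render() {
  local chart="${1:-./datacenter-docs}"
  local f
  for f in values.yaml values-development.yaml values-production.yaml; do
    echo "== $f"
    # helm lint checks chart structure and template syntax with these values
    helm lint "$chart" -f "$chart/$f" || return 1
    # helm template renders the manifests locally, without contacting a cluster
    helm template test "$chart" -f "$chart/$f" > /dev/null || return 1
  done
  # Build a versioned .tgz archive using the name/version from Chart.yaml
  helm package "$chart"
}
```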
## Common Operations
### Upgrade Release
```bash
# Upgrade with new values
helm upgrade prod ./datacenter-docs -f my-production-values.yaml
# Upgrade with specific parameter changes
helm upgrade prod ./datacenter-docs --set api.replicaCount=10 --reuse-values
```
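Note that `--reuse-values` matters here: in Helm 3, an upgrade that passes `--set` without it starts from the chart defaults plus the new flags, dropping the settings from your original values file. For routine upgrades, a wrapper like the following hypothetical `safe_upgrade` sketch combines `--install` (create the release if it is missing) with Helm 3's `--atomic`, which waits for the rollout and rolls back automatically if it fails; the 10-minute timeout is an assumption to tune.

```bash
# Hypothetical wrapper: upgrade (or install) a release from a values file and
# let Helm roll back if the new pods do not become ready in time.
# --atomic and --timeout are Helm 3 flags; --atomic implies --wait.
# Assumes it is run from deploy/helm so that ./datacenter-docs resolves.
safe_upgrade() {
  local release="$1" values_file="$2"
  helm upgrade --install "$release" ./datacenter-docs \
    -f "$values_file" \
    --atomic \
    --timeout 10m
}
```

Usage: `safe_upgrade prod my-production-values.yaml`.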
### Check Status
```bash
# List releases
helm list
# Get release status
helm status prod
# Get current values
helm get values prod
# Get all manifests
helm get manifest prod
```
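Saving the values and manifests before an upgrade gives you something to compare against if the new revision misbehaves. A minimal sketch; the function name `snapshot_release` and the file-naming scheme are hypothetical.

```bash
# Hypothetical helper: save a release's user-supplied values and its rendered
# manifests to timestamped files, then print the two file paths.
snapshot_release() {
  local release="$1" dir="${2:-.}" stamp
  stamp=$(date +%Y%m%d-%H%M%S)
  mkdir -p "$dir" || return 1
  local values_file="$dir/$release-values-$stamp.yaml"
  local manifest_file="$dir/$release-manifest-$stamp.yaml"
  # -o yaml prints the user-supplied values; add --all to include chart defaults
  helm get values "$release" -o yaml > "$values_file" || return 1
  helm get manifest "$release" > "$manifest_file" || return 1
  printf '%s\n' "$values_file" "$manifest_file"
}
```

For example, run `snapshot_release prod ./backups` before upgrading, then `diff` the saved manifest against the output of `helm get manifest prod` afterwards.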
### Rollback
```bash
# View revision history
helm history prod
# Rollback to previous version
helm rollback prod
# Rollback to specific revision
helm rollback prod 2
```
### Uninstall
```bash
# Uninstall release
helm uninstall prod
# Also delete PVCs (if using persistent storage)
kubectl delete pvc -l app.kubernetes.io/instance=prod
```
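The second command is needed because PVCs created from a StatefulSet's `volumeClaimTemplates` are created by the StatefulSet controller rather than by Helm, so `helm uninstall` leaves them in place. Since deleting a PVC can permanently destroy its data, a hypothetical wrapper such as `uninstall_with_data` can list the claims and ask before removing them; it reuses the label selector from the command above.

```bash
# Hypothetical helper: uninstall a release, then delete its PVCs only after
# an explicit confirmation, since deleting a PVC can destroy its data.
uninstall_with_data() {
  local release="$1" answer
  helm uninstall "$release" || return 1
  # Show which claims match before asking
  kubectl get pvc -l "app.kubernetes.io/instance=$release"
  printf 'Delete the PVCs listed above? [y/N] '
  read -r answer
  case "$answer" in
    y|Y) kubectl delete pvc -l "app.kubernetes.io/instance=$release" ;;
    *)   echo "Keeping PVCs." ;;
  esac
}
```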
## Configuration Files
### values.yaml
Default configuration with reasonable settings for development/testing.
### values-development.yaml
Optimized for local development:
- Minimal resource requests/limits
- Single replicas
- Persistence disabled
- Dry-run mode for auto-remediation
- Debug logging
- Ingress disabled (use port-forward)
### values-production.yaml
Example production configuration:
- Higher resource limits
- Multiple replicas
- Autoscaling enabled
- Persistence enabled with larger volumes
- TLS/SSL enabled
- Production-grade security settings
- All components enabled
**Important**: Copy and customize this file for your environment. Never use default secrets!
## Available Components
| Component | Purpose | Default Enabled |
|-----------|---------|-----------------|
| MongoDB | Document database | Yes |
| Redis | Cache & task queue | Yes |
| API | REST API service | Yes |
| Chat | WebSocket server | No (not implemented) |
| Worker | Celery background tasks | No (not implemented) |
| Frontend | Web UI | Yes |
Enable/disable components in your values file:
```yaml
mongodb:
  enabled: true
redis:
  enabled: true
api:
  enabled: true
chat:
  enabled: false  # Set to true when implemented
worker:
  enabled: false  # Set to true when implemented
frontend:
  enabled: true
```
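To confirm which components a values file actually turns on, you can render the chart locally and list the workloads it would create. A minimal sketch: `rendered_workloads` is a hypothetical name, and the `awk` filter assumes each rendered document puts `kind:` before a top-level `metadata.name` indented by two spaces, as `helm template` output typically does; a YAML-aware tool such as `yq` would be more robust.

```bash
# Hypothetical helper: render the chart with a values file and print the
# Deployments and StatefulSets it would create, one "Kind/name" per line.
# Assumes it is run from deploy/helm so that ./datacenter-docs resolves.
rendered_workloads() {
  local values_file="$1"
  helm template check ./datacenter-docs -f "$values_file" |
    awk '
      /^---/ { kind = "" }                                        # new YAML document
      /^kind: (Deployment|StatefulSet)$/ { kind = $2 }            # remember workload kind
      kind != "" && /^  name: / { print kind "/" $2; kind = "" }  # top-level metadata.name
    '
}
```

For example, `rendered_workloads datacenter-docs/values.yaml`; with the defaults in the table above, the output should not include Chat or Worker workloads.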
## Architecture
The chart deploys a complete microservices architecture:
```
            ┌─────────────┐
            │   Ingress   │
            └──────┬──────┘
     ┌─────────────┼─────────────┐
     │             │             │
┌────▼────┐   ┌────▼────┐   ┌────▼────┐
│Frontend │   │   API   │   │  Chat   │
└─────────┘   └────┬────┘   └────┬────┘
                   │             │
     ┌─────────────┼─────────────┘
     │             │
┌────▼────┐   ┌────▼────┐
│  Redis  │   │ MongoDB │
└─────────┘   └─────────┘
┌────┴────┐
│ Worker  │
└─────────┘
```
## LLM Provider Configuration
The chart supports multiple LLM providers. Configure in your values file:
### OpenAI
```yaml
config:
  llm:
    baseUrl: "https://api.openai.com/v1"
    model: "gpt-4-turbo-preview"
secrets:
  llmApiKey: "sk-your-openai-key"
```
### Anthropic Claude
```yaml
config:
  llm:
    baseUrl: "https://api.anthropic.com/v1"
    model: "claude-3-opus-20240229"
secrets:
  llmApiKey: "sk-ant-your-anthropic-key"
```
### Local (Ollama)
```yaml
config:
  llm:
    baseUrl: "http://ollama-service:11434/v1"
    model: "llama2"
secrets:
  llmApiKey: "not-needed"
```
### Azure OpenAI
```yaml
config:
  llm:
    baseUrl: "https://your-resource.openai.azure.com"
    model: "gpt-4"
secrets:
  llmApiKey: "your-azure-key"
```
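If you switch between these providers, the URL and model pairs can be kept in one helper that prints `--set` flags. This is a hypothetical sketch: `llm_set_flags` is an invented name, the values are copied from the examples above, and the API key is deliberately not printed so that it stays in a values file instead of on the command line.

```bash
# Hypothetical helper: print --set flags for one of the providers shown above.
# URLs and models mirror this README's examples; adjust them for your account.
llm_set_flags() {
  local base_url model
  case "$1" in
    openai)    base_url="https://api.openai.com/v1";              model="gpt-4-turbo-preview" ;;
    anthropic) base_url="https://api.anthropic.com/v1";           model="claude-3-opus-20240229" ;;
    ollama)    base_url="http://ollama-service:11434/v1";         model="llama2" ;;
    azure)     base_url="https://your-resource.openai.azure.com"; model="gpt-4" ;;  # replace your-resource
    *) echo "unknown provider: $1" >&2; return 1 ;;
  esac
  # The API key is intentionally not emitted; keep it in a values file instead.
  printf '%s\n' "--set config.llm.baseUrl=$base_url --set config.llm.model=$model"
}
```

Usage: `helm upgrade prod ./datacenter-docs -f my-production-values.yaml $(llm_set_flags anthropic)`. The command substitution is left unquoted on purpose so it splits into separate arguments, and `-f` is passed again so the rest of your configuration is not reset.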
## Security Best Practices
For production deployments:
1. **Change all default secrets**

   ```bash
   helm install prod ./datacenter-docs \
     --set secrets.llmApiKey="your-actual-key" \
     --set secrets.apiSecretKey="$(openssl rand -base64 32)" \
     --set secrets.mongodbPassword="$(openssl rand -base64 32)"
   ```

2. **Use external secret management**
   - HashiCorp Vault
   - AWS Secrets Manager
   - Azure Key Vault
   - Kubernetes External Secrets Operator

3. **Enable TLS/SSL**

   ```yaml
   ingress:
     annotations:
       cert-manager.io/cluster-issuer: "letsencrypt-prod"
     tls:
       - secretName: datacenter-docs-tls
         hosts:
           - datacenter-docs.yourdomain.com
   ```

4. **Review auto-remediation settings**

   ```yaml
   config:
     autoRemediation:
       enabled: true
       minReliabilityScore: 95.0  # High threshold for production
       dryRun: true               # Test first, then set to false
   ```

5. **Implement network policies**
6. **Enable resource quotas**
7. **Regular security scanning**
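
Passing secrets with `--set`, as in step 1, exposes them in the process list while Helm runs and leaves a literal API key in your shell history. One alternative is to write them to a separate values file readable only by you and add it with a second `-f`. A minimal sketch, assuming the `secrets.*` keys from step 1; the helper names `rand_secret` and `write_secrets_values` are hypothetical, and the generated file should stay out of version control.

```bash
# Hypothetical generator: same command as step 1, with a /dev/urandom
# fallback in case openssl is not installed.
rand_secret() {
  openssl rand -base64 32 2>/dev/null || head -c 32 /dev/urandom | base64
}

# Hypothetical helper: write the secrets from step 1 to a values file that
# only the current user can read. The LLM key comes from $LLM_API_KEY and is
# assumed to contain no double quotes or backslashes.
write_secrets_values() {
  local out="$1"
  if [ -z "${LLM_API_KEY:-}" ]; then
    echo "error: set LLM_API_KEY first" >&2
    return 1
  fi
  (
    umask 077  # a newly created file gets mode 600
    cat > "$out" <<EOF
secrets:
  llmApiKey: "$LLM_API_KEY"
  apiSecretKey: "$(rand_secret)"
  mongodbPassword: "$(rand_secret)"
EOF
  ) || return 1
  chmod 600 "$out"  # also tighten a file that already existed
}
```

For example, `read -rs LLM_API_KEY`, then `write_secrets_values secrets-prod.yaml` and `helm install prod ./datacenter-docs -f my-production-values.yaml -f secrets-prod.yaml`.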
## Monitoring and Observability
The chart is designed to integrate with:
- **Prometheus**: Metrics collection
- **Grafana**: Visualization
- **Jaeger**: Distributed tracing
- **ELK/Loki**: Log aggregation
Add pod annotations so that a Prometheus server configured for annotation-based pod discovery scrapes the application:
```yaml
podAnnotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "8000"
  prometheus.io/path: "/metrics"
```
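After an install or upgrade, you can check that the scrape annotation actually reached the running pods. A minimal sketch; `pods_without_scrape` is a hypothetical name, and it checks only `prometheus.io/scrape`, escaping the dots in the key as kubectl's JSONPath syntax requires.

```bash
# Hypothetical check: list pods of a release whose prometheus.io/scrape
# annotation is missing or not "true".
pods_without_scrape() {
  local release="$1"
  kubectl get pods -l "app.kubernetes.io/instance=$release" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.prometheus\.io/scrape}{"\n"}{end}' |
    awk -F '\t' '$2 != "true" { print $1 }'
}
```

`pods_without_scrape prod` prints nothing when every pod in the release carries `prometheus.io/scrape: "true"`.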
## Troubleshooting
### Pods not starting
```bash
# Check pod status
kubectl get pods -l app.kubernetes.io/instance=prod
# Describe pod for events
kubectl describe pod <pod-name>
# View logs
kubectl logs <pod-name> -f
```
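When several pods are failing, the same commands can be run for each of them. A minimal sketch; `triage_pods` is a hypothetical name. It selects pods with at least one container that is not ready, rather than filtering on the pod phase, because a container in `CrashLoopBackOff` still leaves its pod in the `Running` phase.

```bash
# Hypothetical triage helper: for every pod of a release with no container
# status yet or at least one unready container, print its description
# (including events) and the last 50 log lines of each container.
triage_pods() {
  local release="$1" pod
  for pod in $(kubectl get pods -l "app.kubernetes.io/instance=$release" \
      -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].ready}{"\n"}{end}' |
      awk -F '\t' '$2 == "" || $2 ~ /false/ { print $1 }'); do
    echo "===== $pod"
    kubectl describe pod "$pod"
    kubectl logs "$pod" --all-containers --tail=50
  done
}
```

For a container that keeps restarting, `kubectl logs <pod-name> --previous` shows the output of the last terminated run.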
### Storage issues
```bash
# Check PVC status
kubectl get pvc
# Check storage class
kubectl get storageclass
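# Manually create a PVC if needed. If MongoDB uses a StatefulSet volumeClaimTemplate,
# the claim must be named <template-name>-<statefulset-name>-<ordinal> to be picked up.
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongodb-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
EOF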
```
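A PVC that stays `Pending` often means no StorageClass can provision it, for example when the claim names no class and the cluster has no default. (With `volumeBindingMode: WaitForFirstConsumer`, `Pending` is expected until a pod using the claim is scheduled.) A minimal sketch that prints the default class; `default_storage_classes` is a hypothetical name, and it relies on the standard `storageclass.kubernetes.io/is-default-class` annotation.

```bash
# Hypothetical check: print the StorageClass(es) marked as the cluster default;
# return non-zero with a message on stderr if there is none.
default_storage_classes() {
  kubectl get storageclass \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.storageclass\.kubernetes\.io/is-default-class}{"\n"}{end}' |
    awk -F '\t' '
      $2 == "true" { print $1; found = 1 }
      END { if (!found) { print "no default StorageClass" > "/dev/stderr"; exit 1 } }
    '
}
```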
### Ingress not working
```bash
# Check ingress status
kubectl get ingress
kubectl describe ingress prod-datacenter-docs
# Check ingress controller logs
kubectl logs -n ingress-nginx -l app.kubernetes.io/component=controller -f
```
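An Ingress whose `ADDRESS` column stays empty has often not been admitted by any controller, for example because its `ingressClassName` does not match an installed controller. A minimal sketch that shows the class and published address of each Ingress in the current namespace; `ingress_status` is a hypothetical name.

```bash
# Hypothetical check: print each Ingress with its class and the IP or hostname
# published by the controller; <pending> means no address has been assigned.
ingress_status() {
  kubectl get ingress \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.ingressClassName}{"\t"}{.status.loadBalancer.ingress[*].ip}{.status.loadBalancer.ingress[*].hostname}{"\n"}{end}' |
    awk -F '\t' '{ printf "%s class=%s address=%s\n", $1, ($2 == "" ? "<none>" : $2), ($3 == "" ? "<pending>" : $3) }'
}
```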
## Support
For detailed documentation, see:
- Chart README: `datacenter-docs/README.md`
- Main project: `../../README.md`
- Issues: https://git.commandware.com/it-ops/llm-automation-docs-and-remediation-engine/issues
## License
See the main repository for license information.