# Helm Deployment

This directory contains Helm charts for deploying the Datacenter Docs & Remediation Engine on Kubernetes.

## Contents

- `datacenter-docs/` - Main Helm chart for the application
- `test-chart.sh` - Automated testing script for chart validation

## Quick Start

### Prerequisites

- Kubernetes cluster (1.19+)
- Helm 3.0+
- kubectl configured to access your cluster

### Development/Testing Installation

```bash
# Install with development settings (minimal resources, local testing)
helm install dev ./datacenter-docs -f ./datacenter-docs/values-development.yaml

# Access the application
kubectl port-forward svc/dev-datacenter-docs-api 8000:8000
kubectl port-forward svc/dev-datacenter-docs-frontend 8080:80

# View API docs: http://localhost:8000/api/docs
# View frontend: http://localhost:8080
```

### Production Installation

```bash
# Copy and customize production values
cp datacenter-docs/values-production.yaml my-production-values.yaml

# Edit my-production-values.yaml:
# - Change all secrets (llmApiKey, apiSecretKey, mongodbPassword)
# - Update ingress hosts
# - Adjust resource limits
# - Configure LLM provider
# - Review auto-remediation settings

# Install
helm install prod ./datacenter-docs -f my-production-values.yaml

# Verify deployment
helm list
kubectl get pods
kubectl get ingress
```

## Chart Structure

```
datacenter-docs/
├── Chart.yaml                      # Chart metadata
├── values.yaml                     # Default configuration
├── values-development.yaml         # Development settings
├── values-production.yaml          # Production example
├── README.md                       # Detailed chart documentation
├── .helmignore                     # Files to exclude from package
└── templates/
    ├── NOTES.txt                   # Post-install instructions
    ├── _helpers.tpl                # Template helpers
    ├── configmap.yaml              # Application configuration
    ├── secrets.yaml                # Sensitive data
    ├── serviceaccount.yaml         # Service account
    ├── mongodb-statefulset.yaml    # MongoDB StatefulSet
    ├── mongodb-service.yaml        # MongoDB Service
    ├── redis-deployment.yaml       # Redis Deployment
    ├── redis-service.yaml          # Redis Service
    ├── api-deployment.yaml         # API Deployment
    ├── api-service.yaml            # API Service
    ├── api-hpa.yaml                # API autoscaling
    ├── chat-deployment.yaml        # Chat Deployment
    ├── chat-service.yaml           # Chat Service
    ├── worker-deployment.yaml      # Worker Deployment
    ├── worker-hpa.yaml             # Worker autoscaling
    ├── frontend-deployment.yaml    # Frontend Deployment
    ├── frontend-service.yaml       # Frontend Service
    └── ingress.yaml                # Ingress configuration
```

## Testing the Chart

Run the automated test script:

```bash
cd deploy/helm
./test-chart.sh
```

This will:
1. Lint the chart
2. Render templates with different value files
3. Perform dry-run installation
4. Validate Kubernetes manifests
5. Package the chart
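
For readers who want to run the same kind of checks by hand, the five steps above could be sketched roughly as follows. This is an illustrative outline, not the actual contents of `test-chart.sh`: the function name `validate_chart`, the release name `test`, and the `HELM`/`KUBECTL` override variables are assumptions made for this example.

```bash
HELM="${HELM:-helm}"          # helm binary (assumed override hook)
KUBECTL="${KUBECTL:-kubectl}" # kubectl binary (assumed override hook)

# Hypothetical helper: run the five validation steps against one chart directory.
validate_chart() {
  local chart="$1" values
  # 1. Lint the chart
  "$HELM" lint "$chart" || return 1
  # 2. Render templates with each values file shipped in the chart
  for values in values.yaml values-development.yaml values-production.yaml; do
    "$HELM" template test "$chart" -f "$chart/$values" > /dev/null || return 1
  done
  # 3. Dry-run installation (nothing is created in the cluster)
  "$HELM" install test "$chart" --dry-run > /dev/null || return 1
  # 4. Validate the rendered manifests with a client-side dry run
  "$HELM" template test "$chart" \
    | "$KUBECTL" apply --dry-run=client -f - > /dev/null || return 1
  # 5. Package the chart into a .tgz in the current directory
  "$HELM" package "$chart"
}
```

Calling `validate_chart ./datacenter-docs` from `deploy/helm` stops at the first failing step. Steps 3 and 4 may still need access to a cluster, depending on the Helm and kubectl versions in use.
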

## Common Operations

### Upgrade Release

```bash
# Upgrade with new values
helm upgrade prod ./datacenter-docs -f my-production-values.yaml

# Upgrade with specific parameter changes
helm upgrade prod ./datacenter-docs --set api.replicaCount=10 --reuse-values
```

### Check Status

```bash
# List releases
helm list

# Get release status
helm status prod

# Get current values
helm get values prod

# Get all manifests
helm get manifest prod
```

### Rollback

```bash
# View revision history
helm history prod

# Rollback to previous version
helm rollback prod

# Rollback to specific revision
helm rollback prod 2
```

### Uninstall

```bash
# Uninstall release
helm uninstall prod

# Also delete PVCs (if using persistent storage)
kubectl delete pvc -l app.kubernetes.io/instance=prod
```

## Configuration Files

### values.yaml
Default configuration with reasonable settings for development/testing.

### values-development.yaml
Optimized for local development:
- Minimal resource requests/limits
- Single replicas
- Persistence disabled
- Dry-run mode for auto-remediation
- Debug logging
- Ingress disabled (use port-forward)

### values-production.yaml
Example production configuration:
- Higher resource limits
- Multiple replicas
- Autoscaling enabled
- Persistence enabled with larger volumes
- TLS/SSL enabled
- Production-grade security settings
- All components enabled

**Important**: Copy and customize this file for your environment. Never use default secrets!

## Available Components

| Component | Purpose | Default Enabled |
|-----------|---------|-----------------|
| MongoDB | Document database | Yes |
| Redis | Cache & task queue | Yes |
| API | REST API service | Yes |
| Chat | WebSocket server | No (not implemented) |
| Worker | Celery background tasks | No (not implemented) |
| Frontend | Web UI | Yes |

Enable/disable components in your values file:

```yaml
mongodb:
  enabled: true
redis:
  enabled: true
api:
  enabled: true
chat:
  enabled: false  # Set to true when implemented
worker:
  enabled: false  # Set to true when implemented
frontend:
  enabled: true
```
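
Before changing a toggle on a live release, it can help to preview which resources the chart would render. The following is a minimal sketch of that idea, assuming the `<component>.enabled` keys shown above; the helper names `count_kinds` and `preview_toggle`, the release name `dev`, and the `HELM` override variable are made up for this example.

```bash
HELM="${HELM:-helm}"   # helm binary (assumed override hook)

# Count rendered resources per kind from manifests read on stdin.
count_kinds() {
  awk '/^kind:/ { n[$2]++ } END { for (k in n) print k, n[k] }' | sort
}

# Render the chart locally with one component switched on or off
# and summarize the resulting resource kinds. Nothing is installed.
preview_toggle() {
  local component="$1" enabled="$2"
  "$HELM" template dev ./datacenter-docs \
    --set "${component}.enabled=${enabled}" | count_kinds
}
```

For example, `preview_toggle frontend false` prints one line per resource kind with its count, which can be compared against the output of `preview_toggle frontend true`.
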

## Architecture

The chart deploys a complete microservices architecture:

```
                ┌─────────────┐
                │   Ingress   │
                └──────┬──────┘
                       │
         ┌─────────────┼─────────────┐
         │             │             │
    ┌────▼────┐   ┌────▼────┐   ┌────▼────┐
    │Frontend │   │   API   │   │  Chat   │
    └─────────┘   └────┬────┘   └────┬────┘
                       │             │
         ┌─────────────┼─────────────┘
         │             │
    ┌────▼────┐   ┌────▼────┐
    │  Redis  │   │ MongoDB │
    └─────────┘   └─────────┘
         ▲
         │
    ┌────┴────┐
    │ Worker  │
    └─────────┘
```

## LLM Provider Configuration

The chart supports multiple LLM providers. Configure in your values file:

### OpenAI

```yaml
config:
  llm:
    baseUrl: "https://api.openai.com/v1"
    model: "gpt-4-turbo-preview"
secrets:
  llmApiKey: "sk-your-openai-key"
```

### Anthropic Claude

```yaml
config:
  llm:
    baseUrl: "https://api.anthropic.com/v1"
    model: "claude-3-opus-20240229"
secrets:
  llmApiKey: "sk-ant-your-anthropic-key"
```

### Local (Ollama)

```yaml
config:
  llm:
    baseUrl: "http://ollama-service:11434/v1"
    model: "llama2"
secrets:
  llmApiKey: "not-needed"
```

### Azure OpenAI

```yaml
config:
  llm:
    baseUrl: "https://your-resource.openai.azure.com"
    model: "gpt-4"
secrets:
  llmApiKey: "your-azure-key"
```

## Security Best Practices

For production deployments:

1. **Change all default secrets**
   ```bash
   helm install prod ./datacenter-docs \
     --set secrets.llmApiKey="your-actual-key" \
     --set secrets.apiSecretKey="$(openssl rand -base64 32)" \
     --set secrets.mongodbPassword="$(openssl rand -base64 32)"
   ```

2. **Use external secret management**
   - HashiCorp Vault
   - AWS Secrets Manager
   - Azure Key Vault
   - Kubernetes External Secrets Operator

3. **Enable TLS/SSL**
   ```yaml
   ingress:
     annotations:
       cert-manager.io/cluster-issuer: "letsencrypt-prod"
     tls:
       - secretName: datacenter-docs-tls
         hosts:
           - datacenter-docs.yourdomain.com
   ```

4. **Review auto-remediation settings**
   ```yaml
   config:
     autoRemediation:
       enabled: true
       minReliabilityScore: 95.0  # High threshold for production
       dryRun: true  # Test first, then set to false
   ```

5. **Implement network policies**
6. **Enable resource quotas**
7. **Run regular security scans**
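
As a small safeguard for the first item, a pre-install check can refuse a values file that still contains one of the sample keys used in this README. The sketch below is only an illustration: the function name `check_secrets` is hypothetical, and the placeholder list covers just the example strings shown in this document, not the chart's real defaults.

```bash
# Hypothetical guard: fail if a values file still contains a sample key
# copied from this README's examples.
check_secrets() {
  local values_file="$1" placeholder
  for placeholder in \
      sk-your-openai-key \
      sk-ant-your-anthropic-key \
      your-azure-key \
      your-actual-key; do
    # -F matches the placeholder as a fixed string, -q suppresses output
    if grep -qF -- "$placeholder" "$values_file"; then
      echo "placeholder secret found: $placeholder" >&2
      return 1
    fi
  done
}
```

It could be chained in front of the install, for example `check_secrets my-production-values.yaml && helm install prod ./datacenter-docs -f my-production-values.yaml`.
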

## Monitoring and Observability

The chart is designed to integrate with:
- **Prometheus**: Metrics collection
- **Grafana**: Visualization
- **Jaeger**: Distributed tracing
- **ELK/Loki**: Log aggregation

Add annotations to enable monitoring:

```yaml
podAnnotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "8000"
  prometheus.io/path: "/metrics"
```

## Troubleshooting

### Pods not starting

```bash
# Check pod status
kubectl get pods -l app.kubernetes.io/instance=prod

# Describe pod for events
kubectl describe pod <pod-name>

# View logs
kubectl logs <pod-name> -f
```
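
When several pods are affected, the three commands above can be combined into one loop. This is a rough sketch rather than part of the chart: the function name `inspect_unhealthy_pods` and the `KUBECTL` override variable are assumptions for this example. It selects pods by phase, so a pod stuck in `CrashLoopBackOff` (whose phase is still `Running`) will not be listed, while completed pods will.

```bash
KUBECTL="${KUBECTL:-kubectl}"   # kubectl binary (assumed override hook)

# Print events and recent logs for every pod of a release
# whose phase is not Running.
inspect_unhealthy_pods() {
  local release="$1" pod
  for pod in $("$KUBECTL" get pods \
      -l "app.kubernetes.io/instance=$release" \
      --field-selector=status.phase!=Running -o name); do
    echo "=== $pod ==="
    "$KUBECTL" describe "$pod"
    "$KUBECTL" logs "$pod" --tail=50
  done
}
```

For the production release used throughout this document, the call would be `inspect_unhealthy_pods prod`.
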

### Storage issues

```bash
# Check PVC status
kubectl get pvc

# Check storage class
kubectl get storageclass

# Manually create PVC if needed
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongodb-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
EOF
```

### Ingress not working

```bash
# Check ingress status
kubectl get ingress
kubectl describe ingress prod-datacenter-docs

# Check ingress controller logs
kubectl logs -n ingress-nginx -l app.kubernetes.io/component=controller -f
```

## Support

For detailed documentation, see:
- Chart README: `datacenter-docs/README.md`
- Main project: `../../README.md`
- Issues: https://git.commandware.com/it-ops/llm-automation-docs-and-remediation-engine/issues

## License

See the main repository for license information.