Add Helm chart, Docs, and Config conversion script
Some checks failed
Build / Code Quality Checks (push) Successful in 15m11s
Build / Build & Push Docker Images (worker) (push) Successful in 13m44s
Build / Build & Push Docker Images (frontend) (push) Successful in 5m8s
Build / Build & Push Docker Images (chat) (push) Failing after 30m7s
Build / Build & Push Docker Images (api) (push) Failing after 21m39s
400
deploy/helm/README.md
Normal file
@@ -0,0 +1,400 @@
# Helm Deployment

This directory contains Helm charts for deploying the Datacenter Docs & Remediation Engine on Kubernetes.

## Contents

- `datacenter-docs/` - Main Helm chart for the application
- `test-chart.sh` - Automated testing script for chart validation

## Quick Start

### Prerequisites

- Kubernetes cluster (1.19+)
- Helm 3.0+
- kubectl configured to access your cluster

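Before installing, it can help to confirm that the tools listed above are actually on `PATH`. The following is a minimal sketch, not part of the chart: `require_cmd` and `preflight` are hypothetical helper names, and the check only looks for the binaries, not their versions.

```bash
# Preflight sketch: confirm the CLI tools named in the prerequisites are on PATH.
# require_cmd and preflight are hypothetical helpers, not shipped with the chart.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# Check every tool passed as an argument; fail if any one is missing.
preflight() {
  local status=0 tool
  for tool in "$@"; do
    require_cmd "$tool" || status=1
  done
  return "$status"
}
```

A possible invocation is `preflight helm kubectl`, followed by `helm version --short` and `kubectl version --client` to compare the installed versions against the minimums listed above.
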
### Development/Testing Installation

```bash
# Install with development settings (minimal resources, local testing)
helm install dev ./datacenter-docs -f ./datacenter-docs/values-development.yaml

# Access the application (each port-forward blocks, so run them in separate terminals)
kubectl port-forward svc/dev-datacenter-docs-api 8000:8000
kubectl port-forward svc/dev-datacenter-docs-frontend 8080:80

# View API docs: http://localhost:8000/api/docs
# View frontend: http://localhost:8080
```

### Production Installation

```bash
# Copy and customize production values
cp datacenter-docs/values-production.yaml my-production-values.yaml

# Edit my-production-values.yaml:
# - Change all secrets (llmApiKey, apiSecretKey, mongodbPassword)
# - Update ingress hosts
# - Adjust resource limits
# - Configure LLM provider
# - Review auto-remediation settings

# Install
helm install prod ./datacenter-docs -f my-production-values.yaml

# Verify deployment
helm list
kubectl get pods
kubectl get ingress
```

## Chart Structure

```
datacenter-docs/
├── Chart.yaml                    # Chart metadata
├── values.yaml                   # Default configuration
├── values-development.yaml       # Development settings
├── values-production.yaml        # Production example
├── README.md                     # Detailed chart documentation
├── .helmignore                   # Files to exclude from package
└── templates/
    ├── NOTES.txt                 # Post-install instructions
    ├── _helpers.tpl              # Template helpers
    ├── configmap.yaml            # Application configuration
    ├── secrets.yaml              # Sensitive data
    ├── serviceaccount.yaml       # Service account
    ├── mongodb-statefulset.yaml  # MongoDB StatefulSet
    ├── mongodb-service.yaml      # MongoDB Service
    ├── redis-deployment.yaml     # Redis Deployment
    ├── redis-service.yaml        # Redis Service
    ├── api-deployment.yaml       # API Deployment
    ├── api-service.yaml          # API Service
    ├── api-hpa.yaml              # API autoscaling
    ├── chat-deployment.yaml      # Chat Deployment
    ├── chat-service.yaml         # Chat Service
    ├── worker-deployment.yaml    # Worker Deployment
    ├── worker-hpa.yaml           # Worker autoscaling
    ├── frontend-deployment.yaml  # Frontend Deployment
    ├── frontend-service.yaml     # Frontend Service
    └── ingress.yaml              # Ingress configuration
```

## Testing the Chart

Run the automated test script:

```bash
cd deploy/helm
./test-chart.sh
```

This will:
1. Lint the chart
2. Render templates with different value files
3. Perform dry-run installation
4. Validate Kubernetes manifests
5. Package the chart

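For readers who want to run the same checks by hand, the five steps above can be approximated with standard Helm and kubectl commands. This is a sketch under assumptions, not the actual contents of `test-chart.sh`: `validate_chart` is a hypothetical function name, the release name `test` is arbitrary, and the `HELM` and `KUBECTL` variables exist only so the binaries can be swapped out.

```bash
# Sketch of the five validation steps; NOT the actual contents of test-chart.sh.
# HELM and KUBECTL may be overridden, e.g. to point at a specific binary.
HELM="${HELM:-helm}"
KUBECTL="${KUBECTL:-kubectl}"

validate_chart() {
  local chart="$1" release="${2:-test}" values

  # 1. Lint the chart
  "$HELM" lint "$chart" || return 1

  # 2. Render templates with each values file found in the chart directory
  for values in "$chart"/values*.yaml; do
    [ -e "$values" ] || continue
    "$HELM" template "$release" "$chart" -f "$values" >/dev/null || return 1
  done

  # 3. Dry-run installation (Helm 3 typically contacts the cluster for this)
  "$HELM" install "$release" "$chart" --dry-run >/dev/null || return 1

  # 4. Validate the rendered manifests with a client-side dry run
  "$HELM" template "$release" "$chart" \
    | "$KUBECTL" apply --dry-run=client -f - >/dev/null || return 1

  # 5. Package the chart into a .tgz archive in the current directory
  "$HELM" package "$chart"
}
```

Calling `validate_chart ./datacenter-docs` from `deploy/helm` runs the steps in order and stops at the first failure.
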
## Common Operations

### Upgrade Release

```bash
# Upgrade with new values
helm upgrade prod ./datacenter-docs -f my-production-values.yaml

# Upgrade with specific parameter changes
helm upgrade prod ./datacenter-docs --set api.replicaCount=10 --reuse-values
```

### Check Status

```bash
# List releases
helm list

# Get release status
helm status prod

# Get current values
helm get values prod

# Get all manifests
helm get manifest prod
```

### Rollback

```bash
# View revision history
helm history prod

# Rollback to previous version
helm rollback prod

# Rollback to specific revision
helm rollback prod 2
```

### Uninstall

```bash
# Uninstall release
helm uninstall prod

# Also delete PVCs (if using persistent storage)
kubectl delete pvc -l app.kubernetes.io/instance=prod
```

## Configuration Files

### values.yaml
Default configuration with reasonable settings for development/testing.

### values-development.yaml
Optimized for local development:
- Minimal resource requests/limits
- Single replicas
- Persistence disabled
- Dry-run mode for auto-remediation
- Debug logging
- Ingress disabled (use port-forward)

### values-production.yaml
Example production configuration:
- Higher resource limits
- Multiple replicas
- Autoscaling enabled
- Persistence enabled with larger volumes
- TLS/SSL enabled
- Production-grade security settings
- All components enabled

**Important**: Copy and customize this file for your environment. Never use default secrets!

## Available Components

| Component | Purpose | Default Enabled |
|-----------|---------|-----------------|
| MongoDB | Document database | Yes |
| Redis | Cache & task queue | Yes |
| API | REST API service | Yes |
| Chat | WebSocket server | No (not implemented) |
| Worker | Celery background tasks | No (not implemented) |
| Frontend | Web UI | Yes |

Enable/disable components in your values file:

```yaml
mongodb:
  enabled: true
redis:
  enabled: true
api:
  enabled: true
chat:
  enabled: false  # Set to true when implemented
worker:
  enabled: false  # Set to true when implemented
frontend:
  enabled: true
```

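The same `<component>.enabled` keys can also be set from the command line instead of a values file. The helper below is a small sketch: `enable_flags` is a hypothetical name, and it only builds `--set` arguments from the key names shown in the values snippet above.

```bash
# Sketch: turn a list of component names into --set flags.
# enable_flags is a hypothetical helper; it assumes each component exposes
# an <component>.enabled key as in the values snippet above.
enable_flags() {
  local name flags=""
  for name in "$@"; do
    flags="$flags --set ${name}.enabled=true"
  done
  # Drop the leading space before printing
  echo "${flags# }"
}
```

For example, `helm upgrade prod ./datacenter-docs --reuse-values $(enable_flags chat worker)` would switch on the two components that are disabled by default, once they are implemented. The command substitution is left unquoted on purpose so the shell splits it into separate arguments.
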
## Architecture

The chart deploys a complete microservices architecture:

```
            ┌─────────────┐
            │   Ingress   │
            └──────┬──────┘
                   │
     ┌─────────────┼─────────────┐
     │             │             │
┌────▼────┐   ┌────▼────┐   ┌────▼────┐
│Frontend │   │   API   │   │  Chat   │
└─────────┘   └────┬────┘   └────┬────┘
                   │             │
     ┌─────────────┼─────────────┘
     │             │
┌────▼────┐   ┌────▼────┐
│  Redis  │   │ MongoDB │
└─────────┘   └─────────┘
     ▲
     │
┌────┴────┐
│ Worker  │
└─────────┘
```

## LLM Provider Configuration

The chart supports multiple LLM providers. Configure in your values file:

### OpenAI

```yaml
config:
  llm:
    baseUrl: "https://api.openai.com/v1"
    model: "gpt-4-turbo-preview"
secrets:
  llmApiKey: "sk-your-openai-key"
```

### Anthropic Claude

```yaml
config:
  llm:
    baseUrl: "https://api.anthropic.com/v1"
    model: "claude-3-opus-20240229"
secrets:
  llmApiKey: "sk-ant-your-anthropic-key"
```

### Local (Ollama)

```yaml
config:
  llm:
    baseUrl: "http://ollama-service:11434/v1"
    model: "llama2"
secrets:
  llmApiKey: "not-needed"
```

### Azure OpenAI

```yaml
config:
  llm:
    baseUrl: "https://your-resource.openai.azure.com"
    model: "gpt-4"
secrets:
  llmApiKey: "your-azure-key"
```

## Security Best Practices

For production deployments:

1. **Change all default secrets**
   ```bash
   helm install prod ./datacenter-docs \
     --set secrets.llmApiKey="your-actual-key" \
     --set secrets.apiSecretKey="$(openssl rand -base64 32)" \
     --set secrets.mongodbPassword="$(openssl rand -base64 32)"
   ```

2. **Use external secret management**
   - HashiCorp Vault
   - AWS Secrets Manager
   - Azure Key Vault
   - Kubernetes External Secrets Operator

3. **Enable TLS/SSL**
   ```yaml
   ingress:
     annotations:
       cert-manager.io/cluster-issuer: "letsencrypt-prod"
     tls:
       - secretName: datacenter-docs-tls
         hosts:
           - datacenter-docs.yourdomain.com
   ```

4. **Review auto-remediation settings**
   ```yaml
   config:
     autoRemediation:
       enabled: true
       minReliabilityScore: 95.0  # High threshold for production
       dryRun: true  # Test first, then set to false
   ```

5. **Implement network policies**
6. **Enable resource quotas**
7. **Regular security scanning**

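As a complement to the first item, secrets can be written to a private values file rather than passed with `--set`, which tends to leave them in shell history. This is a minimal sketch under assumptions: `random_secret` and `write_secret_values` are hypothetical helpers, the output path is arbitrary, and the key names are the ones the `secrets` section uses in the examples above. Hex encoding is chosen here only to avoid characters that might need escaping in connection strings.

```bash
# Sketch: generate secrets into a separate values file.
# random_secret and write_secret_values are hypothetical helpers.
random_secret() {
  # 32 random bytes, hex-encoded (64 characters)
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

write_secret_values() {
  local out="$1" llm_key="$2"
  (
    umask 077   # a newly created file is readable by the owner only
    cat > "$out" <<EOF
secrets:
  llmApiKey: "$llm_key"
  apiSecretKey: "$(random_secret)"
  mongodbPassword: "$(random_secret)"
EOF
  )
}
```

One way to use it: run `write_secret_values ./secret-values.yaml "$LLM_API_KEY"`, then install with `helm install prod ./datacenter-docs -f my-production-values.yaml -f ./secret-values.yaml`. When several `-f` files are given, values from later files take precedence. Keep the generated file out of version control.
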
## Monitoring and Observability

The chart is designed to integrate with:
- **Prometheus**: Metrics collection
- **Grafana**: Visualization
- **Jaeger**: Distributed tracing
- **ELK/Loki**: Log aggregation

Add annotations to enable monitoring:

```yaml
podAnnotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "8000"
  prometheus.io/path: "/metrics"
```

## Troubleshooting

### Pods not starting

```bash
# Check pod status
kubectl get pods -l app.kubernetes.io/instance=prod

# Describe pod for events
kubectl describe pod <pod-name>

# View logs
kubectl logs <pod-name> -f
```

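When a release has many pods, it can be quicker to list only those that are not in the `Running` phase. The function below is a sketch: `unhealthy_pods` is a hypothetical name, and the `KUBECTL` variable exists only so the binary can be overridden. Note that a crash-looping pod can still report the `Running` phase, so this filter is a first pass rather than a full health check.

```bash
# Sketch: list pods of a release whose phase is not Running.
# unhealthy_pods is a hypothetical helper; KUBECTL may be overridden.
KUBECTL="${KUBECTL:-kubectl}"

unhealthy_pods() {
  local release="$1"
  "$KUBECTL" get pods \
    -l "app.kubernetes.io/instance=$release" \
    --field-selector=status.phase!=Running
}
```

For example, `unhealthy_pods prod` narrows the list before running `kubectl describe pod` on individual entries. For a container that has already restarted, `kubectl logs <pod-name> --previous` shows the output of the prior instance.
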
### Storage issues

```bash
# Check PVC status
kubectl get pvc

# Check storage class
kubectl get storageclass

# Manually create PVC if needed
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongodb-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
EOF
```

### Ingress not working

```bash
# Check ingress status
kubectl get ingress
kubectl describe ingress prod-datacenter-docs

# Check ingress controller logs
kubectl logs -n ingress-nginx -l app.kubernetes.io/component=controller -f
```

## Support

For detailed documentation, see:
- Chart README: `datacenter-docs/README.md`
- Main project: `../../README.md`
- Issues: https://git.commandware.com/it-ops/llm-automation-docs-and-remediation-engine/issues

## License

See the main repository for license information.