# Datacenter Docs & Remediation Engine - Helm Chart
Helm chart for deploying the LLM Automation - Docs & Remediation Engine on Kubernetes.
## Overview
This chart deploys a complete stack including:
- **MongoDB**: Document database for storing tickets, documentation, and metadata
- **Redis**: Cache and task queue backend
- **API Service**: FastAPI REST API with auto-remediation capabilities
- **Chat Service**: WebSocket server for real-time documentation queries (optional, not yet implemented)
- **Worker Service**: Celery workers for background tasks (optional, not yet implemented)
- **Frontend**: React-based web interface
## Prerequisites
- Kubernetes 1.19+
- Helm 3.0+
- PersistentVolume provisioner support in the underlying infrastructure (for MongoDB persistence)
- Ingress controller (optional, for external access)
## Installation
### Quick Start
```bash
# Add the chart repository (if published)
helm repo add datacenter-docs https://your-repo-url
helm repo update

# Install with default values
helm install my-datacenter-docs datacenter-docs/datacenter-docs

# Or install from local directory
helm install my-datacenter-docs ./datacenter-docs
```
### Production Installation
For production, create a custom `values.yaml`:
```bash
# Copy and edit the values file
cp values.yaml my-values.yaml

# Edit my-values.yaml with your configuration
# At minimum, change:
# - secrets.llmApiKey
# - secrets.apiSecretKey
# - ingress.hosts

# Install with custom values
helm install my-datacenter-docs ./datacenter-docs -f my-values.yaml
```
### Install with Specific Configuration
```bash
helm install my-datacenter-docs ./datacenter-docs \
  --set secrets.llmApiKey="sk-your-openai-api-key" \
  --set secrets.apiSecretKey="your-strong-secret-key" \
  --set ingress.hosts[0].host="datacenter-docs.yourdomain.com" \
  --set mongodb.persistence.size="50Gi"
```
## Configuration
### Key Configuration Parameters
#### Global Settings
| Parameter | Description | Default |
|-----------|-------------|---------|
| `global.imagePullPolicy` | Image pull policy | `IfNotPresent` |
| `global.storageClass` | Storage class for PVCs | `""` |
#### MongoDB
| Parameter | Description | Default |
|-----------|-------------|---------|
| `mongodb.enabled` | Enable MongoDB | `true` |
| `mongodb.image.repository` | MongoDB image | `mongo` |
| `mongodb.image.tag` | MongoDB version | `7` |
| `mongodb.auth.rootUsername` | Root username | `admin` |
| `mongodb.auth.rootPassword` | Root password | `admin123` |
| `mongodb.persistence.enabled` | Enable persistence | `true` |
| `mongodb.persistence.size` | Volume size | `10Gi` |
| `mongodb.resources.requests.memory` | Memory request | `512Mi` |
| `mongodb.resources.limits.memory` | Memory limit | `2Gi` |
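
For reference, a values override that replaces the default credentials and grows the data volume could look like the following sketch (keys as listed above; substitute a real password):
```yaml
mongodb:
  auth:
    rootUsername: admin
    rootPassword: "use-a-strong-password"  # replace before deploying
  persistence:
    enabled: true
    size: 50Gi
```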
#### Redis
| Parameter | Description | Default |
|-----------|-------------|---------|
| `redis.enabled` | Enable Redis | `true` |
| `redis.image.repository` | Redis image | `redis` |
| `redis.image.tag` | Redis version | `7-alpine` |
| `redis.resources.requests.memory` | Memory request | `128Mi` |
| `redis.resources.limits.memory` | Memory limit | `512Mi` |
#### API Service
| Parameter | Description | Default |
|-----------|-------------|---------|
| `api.enabled` | Enable API service | `true` |
| `api.replicaCount` | Number of replicas | `2` |
| `api.image.repository` | API image repository | `datacenter-docs-api` |
| `api.image.tag` | API image tag | `latest` |
| `api.service.port` | Service port | `8000` |
| `api.autoscaling.enabled` | Enable HPA | `true` |
| `api.autoscaling.minReplicas` | Min replicas | `2` |
| `api.autoscaling.maxReplicas` | Max replicas | `10` |
| `api.resources.requests.memory` | Memory request | `512Mi` |
| `api.resources.limits.memory` | Memory limit | `2Gi` |
#### Worker Service
| Parameter | Description | Default |
|-----------|-------------|---------|
| `worker.enabled` | Enable worker service | `false` |
| `worker.replicaCount` | Number of replicas | `3` |
| `worker.autoscaling.enabled` | Enable HPA | `true` |
| `worker.autoscaling.minReplicas` | Min replicas | `1` |
| `worker.autoscaling.maxReplicas` | Max replicas | `10` |
#### Chat Service
| Parameter | Description | Default |
|-----------|-------------|---------|
| `chat.enabled` | Enable chat service | `false` |
| `chat.replicaCount` | Number of replicas | `1` |
| `chat.service.port` | Service port | `8001` |
#### Frontend
| Parameter | Description | Default |
|-----------|-------------|---------|
| `frontend.enabled` | Enable frontend | `true` |
| `frontend.replicaCount` | Number of replicas | `2` |
| `frontend.service.port` | Service port | `80` |
#### Ingress
| Parameter | Description | Default |
|-----------|-------------|---------|
| `ingress.enabled` | Enable ingress | `true` |
| `ingress.className` | Ingress class | `nginx` |
| `ingress.hosts[0].host` | Hostname | `datacenter-docs.example.com` |
| `ingress.tls[0].secretName` | TLS secret name | `datacenter-docs-tls` |
#### Application Configuration
| Parameter | Description | Default |
|-----------|-------------|---------|
| `config.llm.baseUrl` | LLM provider URL | `https://api.openai.com/v1` |
| `config.llm.model` | LLM model | `gpt-4-turbo-preview` |
| `config.autoRemediation.enabled` | Enable auto-remediation | `true` |
| `config.autoRemediation.minReliabilityScore` | Min reliability score | `85.0` |
| `config.autoRemediation.dryRun` | Dry run mode | `false` |
| `config.logLevel` | Log level | `INFO` |
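
As a sketch, these keys map to a `values.yaml` fragment like the one below (layout inferred from the parameter table above and the production example later in this README):
```yaml
# Sketch only; key layout mirrors the parameter table above
config:
  llm:
    baseUrl: "https://api.openai.com/v1"
    model: "gpt-4-turbo-preview"
  autoRemediation:
    enabled: true
    minReliabilityScore: 85.0
    dryRun: false
  logLevel: "INFO"
```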
#### Secrets
| Parameter | Description | Default |
|-----------|-------------|---------|
| `secrets.llmApiKey` | LLM API key | `sk-your-openai-api-key-here` |
| `secrets.apiSecretKey` | API secret key | `your-secret-key-here-change-in-production` |

**IMPORTANT**: Change these secrets in production!
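
One option is to generate a strong key and supply both secrets at install time instead of writing them into a values file:
```bash
# Generate a random secret key
API_SECRET_KEY=$(openssl rand -hex 32)

# Pass both secrets at install time
helm install my-datacenter-docs ./datacenter-docs \
  --set secrets.llmApiKey="sk-your-openai-api-key" \
  --set secrets.apiSecretKey="$API_SECRET_KEY"
```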
## Usage Examples
### Enable All Services (including chat and worker)
```bash
helm install my-datacenter-docs ./datacenter-docs \
  --set chat.enabled=true \
  --set worker.enabled=true
```
### Disable Auto-Remediation
```bash
helm install my-datacenter-docs ./datacenter-docs \
  --set config.autoRemediation.enabled=false
```
### Use Different LLM Provider (e.g., Anthropic Claude)
```bash
helm install my-datacenter-docs ./datacenter-docs \
  --set config.llm.baseUrl="https://api.anthropic.com/v1" \
  --set config.llm.model="claude-3-opus-20240229" \
  --set secrets.llmApiKey="sk-ant-your-anthropic-key"
```
### Use Local LLM (e.g., Ollama)
```bash
helm install my-datacenter-docs ./datacenter-docs \
  --set config.llm.baseUrl="http://ollama-service:11434/v1" \
  --set config.llm.model="llama2" \
  --set secrets.llmApiKey="not-needed"
```
### Scale MongoDB Storage
```bash
helm install my-datacenter-docs ./datacenter-docs \
  --set mongodb.persistence.size="100Gi"
```
### Disable Ingress (use port-forward instead)
```bash
helm install my-datacenter-docs ./datacenter-docs \
  --set ingress.enabled=false
```
### Production Configuration with External MongoDB
```yaml
# production-values.yaml
mongodb:
  enabled: false

config:
  mongodbUrl: "mongodb://user:pass@external-mongodb:27017/datacenter_docs?authSource=admin"

api:
  replicaCount: 5
  autoscaling:
    maxReplicas: 20

secrets:
  llmApiKey: "sk-your-production-api-key"
  apiSecretKey: "your-production-secret-key"

ingress:
  hosts:
    - host: "datacenter-docs.prod.yourdomain.com"
      paths:
        - path: /
          pathType: Prefix
          service: frontend
        - path: /api
          pathType: Prefix
          service: api
```
```bash
helm install prod-datacenter-docs ./datacenter-docs -f production-values.yaml
```
## Upgrading
```bash
# Upgrade with new values
helm upgrade my-datacenter-docs ./datacenter-docs -f my-values.yaml

# Upgrade specific parameters
helm upgrade my-datacenter-docs ./datacenter-docs \
  --set api.image.tag="v1.2.0" \
  --reuse-values
```
## Uninstallation
```bash
helm uninstall my-datacenter-docs
```
**Note**: This will delete all resources except PersistentVolumeClaims (PVCs) for MongoDB. To also delete PVCs:
```bash
kubectl delete pvc -l app.kubernetes.io/instance=my-datacenter-docs
```
## Monitoring and Troubleshooting
### Check Pod Status
```bash
kubectl get pods -l app.kubernetes.io/instance=my-datacenter-docs
```
### View Logs
```bash
# API logs
kubectl logs -l app.kubernetes.io/component=api -f

# Worker logs
kubectl logs -l app.kubernetes.io/component=worker -f

# MongoDB logs
kubectl logs -l app.kubernetes.io/component=database -f
```
### Access Services Locally
```bash
# API
kubectl port-forward svc/my-datacenter-docs-api 8000:8000

# Frontend
kubectl port-forward svc/my-datacenter-docs-frontend 8080:80

# MongoDB (for debugging)
kubectl port-forward svc/my-datacenter-docs-mongodb 27017:27017
```
### Common Issues
#### Pods Stuck in Pending
Check if PVCs are bound:
```bash
kubectl get pvc
```
If no default storage class is available, specify one explicitly:
```bash
helm upgrade my-datacenter-docs ./datacenter-docs \
  --set mongodb.persistence.storageClass="standard" \
  --reuse-values
```
#### API Pods Crash Loop
Check logs:
```bash
kubectl logs -l app.kubernetes.io/component=api --tail=100
```
Common causes:
- MongoDB not ready (wait for init containers)
- Invalid LLM API key
- Missing environment variables
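
To narrow these down, inspect pod events and the overall release status (label selectors follow the conventions used elsewhere in this README):
```bash
# Show events and init container status for the API pods
kubectl describe pod -l app.kubernetes.io/component=api

# Confirm MongoDB and Redis pods are running before the API starts
kubectl get pods -l app.kubernetes.io/instance=my-datacenter-docs
```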
#### Cannot Access via Ingress
Check ingress status:
```bash
kubectl get ingress
kubectl describe ingress my-datacenter-docs
```
Ensure:
- Ingress controller is installed
- DNS points to ingress IP
- TLS certificate is valid (if using HTTPS)
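
A quick way to check the last two points from outside the cluster, using the chart's default hostname as a placeholder:
```bash
# DNS should resolve to the ingress controller's external IP
dig +short datacenter-docs.example.com   # replace with your ingress host

# Inspect the HTTP response and the served TLS certificate
curl -vkI https://datacenter-docs.example.com
```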
## Security Considerations
### Production Checklist
- [ ] Change `secrets.llmApiKey` to a valid API key
- [ ] Change `secrets.apiSecretKey` to a strong random key
- [ ] Change MongoDB credentials (`mongodb.auth.rootPassword`)
- [ ] Enable TLS/SSL on ingress
- [ ] Review RBAC policies
- [ ] Use external secret management (e.g., HashiCorp Vault, AWS Secrets Manager)
- [ ] Enable network policies (see the sketch after this checklist)
- [ ] Set resource limits on all pods
- [ ] Enable pod security policies
- [ ] Review auto-remediation settings
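
For the network-policy item, a minimal sketch that limits MongoDB ingress to the API and worker pods, assuming the `app.kubernetes.io/component` labels used elsewhere in this chart (adjust selectors and namespace to your release):
```yaml
# Selectors assume the chart's component labels; adjust to your release
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: mongodb-allow-app-only
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/component: database
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/component: api
        - podSelector:
            matchLabels:
              app.kubernetes.io/component: worker
      ports:
        - protocol: TCP
          port: 27017
```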
### Using External Secrets
Instead of storing secrets in values.yaml, use Kubernetes secrets:
```bash
# Create secret
kubectl create secret generic datacenter-docs-secrets \
  --from-literal=llm-api-key="sk-your-key" \
  --from-literal=api-secret-key="your-secret"

# Modify templates to use existing secret
# (requires chart customization)
```
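
If you do customize the templates, the container spec could then pull these values from the pre-created secret via `secretKeyRef`; a minimal sketch (the environment variable names below are assumptions and must match what the application reads):
```yaml
# Env var names are assumptions; align them with the application's settings
env:
  - name: LLM_API_KEY
    valueFrom:
      secretKeyRef:
        name: datacenter-docs-secrets
        key: llm-api-key
  - name: API_SECRET_KEY
    valueFrom:
      secretKeyRef:
        name: datacenter-docs-secrets
        key: api-secret-key
```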
## Development
### Validating the Chart
```bash
# Lint the chart
helm lint ./datacenter-docs

# Dry run
helm install my-test ./datacenter-docs --dry-run --debug

# Template rendering
helm template my-test ./datacenter-docs > rendered.yaml
```
### Testing Locally
```bash
# Create kind cluster
kind create cluster

# Install chart
helm install test ./datacenter-docs \
  --set ingress.enabled=false \
  --set api.autoscaling.enabled=false \
  --set mongodb.persistence.enabled=false

# Test
kubectl port-forward svc/test-datacenter-docs-api 8000:8000
curl http://localhost:8000/health
```
## Support
For issues and questions:
- Issues: https://git.commandware.com/it-ops/llm-automation-docs-and-remediation-engine/issues
- Documentation: https://git.commandware.com/it-ops/llm-automation-docs-and-remediation-engine
## License
See the main repository for license information.