# Datacenter Docs & Remediation Engine - Helm Chart

Helm chart for deploying the LLM Automation - Docs & Remediation Engine on Kubernetes.

## Overview

This chart deploys a complete stack including:

- **MongoDB**: Document database for storing tickets, documentation, and metadata
- **Redis**: Cache and task queue backend
- **API Service**: FastAPI REST API with auto-remediation capabilities
- **Chat Service**: WebSocket server for real-time documentation queries (optional, not yet implemented)
- **Worker Service**: Celery workers for background tasks (optional, not yet implemented)
- **Frontend**: React-based web interface

## Prerequisites

- Kubernetes 1.19+
- Helm 3.0+
- PersistentVolume provisioner support in the underlying infrastructure (for MongoDB persistence)
- Ingress controller (optional, for external access)

## Installation

### Quick Start

```bash
# Add the chart repository (if published)
helm repo add datacenter-docs https://your-repo-url
helm repo update

# Install with default values
helm install my-datacenter-docs datacenter-docs/datacenter-docs

# Or install from local directory
helm install my-datacenter-docs ./datacenter-docs
```

### Production Installation

For production, create a custom `values.yaml`:

```bash
# Copy and edit the values file
cp values.yaml my-values.yaml

# Edit my-values.yaml with your configuration
# At minimum, change:
# - secrets.llmApiKey
# - secrets.apiSecretKey
# - ingress.hosts

# Install with custom values
helm install my-datacenter-docs ./datacenter-docs -f my-values.yaml
```
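
Because `my-values.yaml` starts as a copy of the chart defaults, it is easy to install with a placeholder still in place. The following is a minimal sketch of a pre-install check, assuming the placeholder strings are the defaults listed in the Configuration tables of this README; the function name `check_placeholders` is hypothetical and not part of the chart.

```bash
# Hypothetical helper (not shipped with the chart): fail if a values file
# still contains one of the documented default placeholders.
check_placeholders() {
  file="$1"
  status=0
  for placeholder in \
    "sk-your-openai-api-key-here" \
    "your-secret-key-here-change-in-production" \
    "datacenter-docs.example.com" \
    "admin123"
  do
    # -F matches the placeholder as a fixed string, not a regex.
    if grep -qF -- "$placeholder" "$file"; then
      echo "placeholder still present: $placeholder" >&2
      status=1
    fi
  done
  return "$status"
}
```

One possible use is to chain it in front of the install: `check_placeholders my-values.yaml && helm install my-datacenter-docs ./datacenter-docs -f my-values.yaml`.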

### Install with Specific Configuration

```bash
# The ingress.hosts argument is quoted so the shell does not treat [0] as a glob pattern
helm install my-datacenter-docs ./datacenter-docs \
  --set secrets.llmApiKey="sk-your-openai-api-key" \
  --set secrets.apiSecretKey="your-strong-secret-key" \
  --set "ingress.hosts[0].host=datacenter-docs.yourdomain.com" \
  --set mongodb.persistence.size="50Gi"
```

## Configuration

### Key Configuration Parameters

#### Global Settings

| Parameter | Description | Default |
|-----------|-------------|---------|
| `global.imagePullPolicy` | Image pull policy | `IfNotPresent` |
| `global.storageClass` | Storage class for PVCs | `""` |

#### MongoDB

| Parameter | Description | Default |
|-----------|-------------|---------|
| `mongodb.enabled` | Enable MongoDB | `true` |
| `mongodb.image.repository` | MongoDB image | `mongo` |
| `mongodb.image.tag` | MongoDB version | `7` |
| `mongodb.auth.rootUsername` | Root username | `admin` |
| `mongodb.auth.rootPassword` | Root password | `admin123` |
| `mongodb.persistence.enabled` | Enable persistence | `true` |
| `mongodb.persistence.size` | Volume size | `10Gi` |
| `mongodb.resources.requests.memory` | Memory request | `512Mi` |
| `mongodb.resources.limits.memory` | Memory limit | `2Gi` |

#### Redis

| Parameter | Description | Default |
|-----------|-------------|---------|
| `redis.enabled` | Enable Redis | `true` |
| `redis.image.repository` | Redis image | `redis` |
| `redis.image.tag` | Redis version | `7-alpine` |
| `redis.resources.requests.memory` | Memory request | `128Mi` |
| `redis.resources.limits.memory` | Memory limit | `512Mi` |

#### API Service

| Parameter | Description | Default |
|-----------|-------------|---------|
| `api.enabled` | Enable API service | `true` |
| `api.replicaCount` | Number of replicas | `2` |
| `api.image.repository` | API image repository | `datacenter-docs-api` |
| `api.image.tag` | API image tag | `latest` |
| `api.service.port` | Service port | `8000` |
| `api.autoscaling.enabled` | Enable HPA | `true` |
| `api.autoscaling.minReplicas` | Min replicas | `2` |
| `api.autoscaling.maxReplicas` | Max replicas | `10` |
| `api.resources.requests.memory` | Memory request | `512Mi` |
| `api.resources.limits.memory` | Memory limit | `2Gi` |

#### Worker Service

| Parameter | Description | Default |
|-----------|-------------|---------|
| `worker.enabled` | Enable worker service | `false` |
| `worker.replicaCount` | Number of replicas | `3` |
| `worker.autoscaling.enabled` | Enable HPA | `true` |
| `worker.autoscaling.minReplicas` | Min replicas | `1` |
| `worker.autoscaling.maxReplicas` | Max replicas | `10` |

#### Chat Service

| Parameter | Description | Default |
|-----------|-------------|---------|
| `chat.enabled` | Enable chat service | `false` |
| `chat.replicaCount` | Number of replicas | `1` |
| `chat.service.port` | Service port | `8001` |

#### Frontend

| Parameter | Description | Default |
|-----------|-------------|---------|
| `frontend.enabled` | Enable frontend | `true` |
| `frontend.replicaCount` | Number of replicas | `2` |
| `frontend.service.port` | Service port | `80` |

#### Ingress

| Parameter | Description | Default |
|-----------|-------------|---------|
| `ingress.enabled` | Enable ingress | `true` |
| `ingress.className` | Ingress class | `nginx` |
| `ingress.hosts[0].host` | Hostname | `datacenter-docs.example.com` |
| `ingress.tls[0].secretName` | TLS secret name | `datacenter-docs-tls` |

#### Application Configuration

| Parameter | Description | Default |
|-----------|-------------|---------|
| `config.llm.baseUrl` | LLM provider URL | `https://api.openai.com/v1` |
| `config.llm.model` | LLM model | `gpt-4-turbo-preview` |
| `config.autoRemediation.enabled` | Enable auto-remediation | `true` |
| `config.autoRemediation.minReliabilityScore` | Min reliability score | `85.0` |
| `config.autoRemediation.dryRun` | Dry run mode | `false` |
| `config.logLevel` | Log level | `INFO` |

#### Secrets

| Parameter | Description | Default |
|-----------|-------------|---------|
| `secrets.llmApiKey` | LLM API key | `sk-your-openai-api-key-here` |
| `secrets.apiSecretKey` | API secret key | `your-secret-key-here-change-in-production` |

**IMPORTANT**: Change these secrets in production!
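
For `secrets.apiSecretKey`, one way to get a strong value is to read random bytes from the kernel and hex-encode them. This is a sketch only: `gen_secret` is a hypothetical helper, and the 32-byte default is an assumption, since the chart does not document a required key length.

```bash
# Hypothetical helper: print N random bytes as lowercase hex.
# The default of 32 bytes (64 hex characters) is an assumption, not a chart requirement.
gen_secret() {
  bytes="${1:-32}"
  # od prints the bytes as space-separated hex; tr strips spaces and newlines.
  head -c "$bytes" /dev/urandom | od -An -v -tx1 | tr -d ' \n'
  echo
}
```

It could then be passed at install time, for example `--set secrets.apiSecretKey="$(gen_secret)"`. Store the generated value somewhere safe, because re-running the helper produces a different key.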

## Usage Examples

### Enable All Services (including chat and worker)

```bash
helm install my-datacenter-docs ./datacenter-docs \
  --set chat.enabled=true \
  --set worker.enabled=true
```

### Disable Auto-Remediation

```bash
helm install my-datacenter-docs ./datacenter-docs \
  --set config.autoRemediation.enabled=false
```

### Use Different LLM Provider (e.g., Anthropic Claude)

```bash
helm install my-datacenter-docs ./datacenter-docs \
  --set config.llm.baseUrl="https://api.anthropic.com/v1" \
  --set config.llm.model="claude-3-opus-20240229" \
  --set secrets.llmApiKey="sk-ant-your-anthropic-key"
```

### Use Local LLM (e.g., Ollama)

```bash
helm install my-datacenter-docs ./datacenter-docs \
  --set config.llm.baseUrl="http://ollama-service:11434/v1" \
  --set config.llm.model="llama2" \
  --set secrets.llmApiKey="not-needed"
```

### Scale MongoDB Storage

```bash
helm install my-datacenter-docs ./datacenter-docs \
  --set mongodb.persistence.size="100Gi"
```

### Disable Ingress (use port-forward instead)

```bash
helm install my-datacenter-docs ./datacenter-docs \
  --set ingress.enabled=false
```

### Production Configuration with External MongoDB

```yaml
# production-values.yaml
mongodb:
  enabled: false

config:
  mongodbUrl: "mongodb://user:pass@external-mongodb:27017/datacenter_docs?authSource=admin"

api:
  replicaCount: 5
  autoscaling:
    maxReplicas: 20

secrets:
  llmApiKey: "sk-your-production-api-key"
  apiSecretKey: "your-production-secret-key"

ingress:
  hosts:
    - host: "datacenter-docs.prod.yourdomain.com"
      paths:
        - path: /
          pathType: Prefix
          service: frontend
        - path: /api
          pathType: Prefix
          service: api
```

```bash
helm install prod-datacenter-docs ./datacenter-docs -f production-values.yaml
```

## Upgrading

```bash
# Upgrade with new values
helm upgrade my-datacenter-docs ./datacenter-docs -f my-values.yaml

# Upgrade specific parameters
helm upgrade my-datacenter-docs ./datacenter-docs \
  --set api.image.tag="v1.2.0" \
  --reuse-values
```

## Uninstallation

```bash
helm uninstall my-datacenter-docs
```

**Note**: This will delete all resources except PersistentVolumeClaims (PVCs) for MongoDB. To also delete PVCs:

```bash
kubectl delete pvc -l app.kubernetes.io/instance=my-datacenter-docs
```

## Monitoring and Troubleshooting

### Check Pod Status

```bash
kubectl get pods -l app.kubernetes.io/instance=my-datacenter-docs
```

### View Logs

```bash
# API logs
kubectl logs -l app.kubernetes.io/component=api -f

# Worker logs
kubectl logs -l app.kubernetes.io/component=worker -f

# MongoDB logs
kubectl logs -l app.kubernetes.io/component=database -f
```

### Access Services Locally

```bash
# API
kubectl port-forward svc/my-datacenter-docs-api 8000:8000

# Frontend
kubectl port-forward svc/my-datacenter-docs-frontend 8080:80

# MongoDB (for debugging)
kubectl port-forward svc/my-datacenter-docs-mongodb 27017:27017
```

### Common Issues

#### Pods Stuck in Pending

Check if PVCs are bound:
```bash
kubectl get pvc
```

If storage class is missing, set it:
```bash
helm upgrade my-datacenter-docs ./datacenter-docs \
  --set mongodb.persistence.storageClass="standard" \
  --reuse-values
```

#### API Pods Crash Loop

Check logs:
```bash
kubectl logs -l app.kubernetes.io/component=api --tail=100
```

Common causes:
- MongoDB not ready (wait for init containers)
- Invalid LLM API key
- Missing environment variables

#### Cannot Access via Ingress

Check ingress status:
```bash
kubectl get ingress
kubectl describe ingress my-datacenter-docs
```

Ensure:
- Ingress controller is installed
- DNS points to ingress IP
- TLS certificate is valid (if using HTTPS)

## Security Considerations

### Production Checklist

- [ ] Change `secrets.llmApiKey` to a valid API key
- [ ] Change `secrets.apiSecretKey` to a strong random key
- [ ] Change MongoDB credentials (`mongodb.auth.rootPassword`)
- [ ] Enable TLS/SSL on ingress
- [ ] Review RBAC policies
- [ ] Use external secret management (e.g., HashiCorp Vault, AWS Secrets Manager)
- [ ] Enable network policies
- [ ] Set resource limits on all pods
- [ ] Enforce Pod Security Standards via Pod Security Admission (PodSecurityPolicy was removed in Kubernetes 1.25)
- [ ] Review auto-remediation settings

### Using External Secrets

Instead of storing secrets in `values.yaml`, use Kubernetes secrets:

```bash
# Create secret
kubectl create secret generic datacenter-docs-secrets \
  --from-literal=llm-api-key="sk-your-key" \
  --from-literal=api-secret-key="your-secret"

# Modify templates to use existing secret
# (requires chart customization)
```
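
Typing secrets as `--from-literal` arguments can leave them in shell history. As an alternative, the sketch below prints an equivalent Secret manifest that can be piped to `kubectl apply -f -`. The Secret name and key names are taken from the command above; the function name `render_secret_manifest` is hypothetical, and how the chart templates would consume this Secret is not covered here.

```bash
# Hypothetical helper: print a Secret manifest for the two application secrets.
# Secret and key names mirror the kubectl command above; the function name is illustrative.
render_secret_manifest() {
  llm_key="$1"
  api_key="$2"
  # Values under .data must be base64-encoded; tr removes line wraps
  # that some base64 implementations insert into long output.
  llm_b64="$(printf '%s' "$llm_key" | base64 | tr -d '\n')"
  api_b64="$(printf '%s' "$api_key" | base64 | tr -d '\n')"
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: datacenter-docs-secrets
type: Opaque
data:
  llm-api-key: ${llm_b64}
  api-secret-key: ${api_b64}
EOF
}
```

With the values held in environment variables (the variable names here are assumptions), usage could look like `render_secret_manifest "$LLM_API_KEY" "$API_SECRET_KEY" | kubectl apply -f -`.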

## Development

### Validating the Chart

```bash
# Lint the chart
helm lint ./datacenter-docs

# Dry run
helm install my-test ./datacenter-docs --dry-run --debug

# Template rendering
helm template my-test ./datacenter-docs > rendered.yaml
```
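
Once `rendered.yaml` exists, a quick count of resource kinds can confirm that toggles such as `chat.enabled` or `ingress.enabled` had the intended effect. This is a rough sketch using standard text tools rather than a YAML parser; `summarize_kinds` is a hypothetical name, and it only counts `kind:` keys that start at column 0, so nested references (for example inside an HPA's scale target) are ignored.

```bash
# Hypothetical helper: count top-level resource kinds in a rendered manifest.
# Prints one "Kind=count" line per kind, sorted by kind name.
summarize_kinds() {
  grep -E '^kind:[[:space:]]' "$1" \
    | awk '{print $2}' \
    | sort \
    | uniq -c \
    | awk '{print $2 "=" $1}'
}
```

Running `summarize_kinds rendered.yaml` before and after changing a value gives a simple way to spot resources that appeared or disappeared.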

### Testing Locally

```bash
# Create kind cluster
kind create cluster

# Install chart
helm install test ./datacenter-docs \
  --set ingress.enabled=false \
  --set api.autoscaling.enabled=false \
  --set mongodb.persistence.enabled=false

# Test (port-forward runs in the foreground, so run curl from a second terminal)
kubectl port-forward svc/test-datacenter-docs-api 8000:8000
curl http://localhost:8000/health
```
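
For scripted smoke tests, the port-forward can run in the background while a small loop polls the health endpoint until the API answers. The helper below is a sketch under that assumption; `wait_for_http` is a hypothetical name, and the attempt count and delay defaults are arbitrary.

```bash
# Hypothetical helper: poll a URL until it returns a successful response
# or the attempts run out. Arguments: URL, attempts (default 30), delay in seconds (default 1).
wait_for_http() {
  url="$1"
  attempts="${2:-30}"
  delay="${3:-1}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl return non-zero on HTTP errors; -sS hides progress but keeps error messages.
    if curl -fsS -o /dev/null "$url"; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example usage (commented out because it needs a running cluster):
# kubectl port-forward svc/test-datacenter-docs-api 8000:8000 &
# pf_pid=$!
# wait_for_http http://localhost:8000/health && echo "API is healthy"
# kill "$pf_pid"
```

Polling avoids a fixed `sleep` after starting the port-forward, which tends to be either too short on a cold cluster or needlessly long on a warm one.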

## Support

For issues and questions:
- Issues: https://git.commandware.com/it-ops/llm-automation-docs-and-remediation-engine/issues
- Documentation: https://git.commandware.com/it-ops/llm-automation-docs-and-remediation-engine

## License

See the main repository for license information.