Datacenter Docs & Remediation Engine - Helm Chart
Helm chart for deploying the LLM Automation - Docs & Remediation Engine on Kubernetes.
Overview
This chart deploys a complete stack; each component below can be enabled or disabled individually via its `enabled` flag (see the sketch after this list):
- MongoDB: Document database for storing tickets, documentation, and metadata
- Redis: Cache and task queue backend
- API Service: FastAPI REST API with auto-remediation capabilities
- Chat Service: WebSocket server for real-time documentation queries (optional, not yet implemented)
- Worker Service: Celery workers for background tasks (optional, not yet implemented)
- Frontend: React-based web interface
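A sketch of these toggles in values.yaml, with the default values documented in the Configuration tables below:
# Component toggles (sketch; defaults per the Configuration section)
mongodb:
  enabled: true
redis:
  enabled: true
api:
  enabled: true
chat:
  enabled: false    # not yet implemented
worker:
  enabled: false    # not yet implemented
frontend:
  enabled: true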
Prerequisites
- Kubernetes 1.19+
- Helm 3.0+
- PersistentVolume provisioner support in the underlying infrastructure (for MongoDB persistence)
- Ingress controller (optional, for external access)
Installation
Quick Start
# Add the chart repository (if published)
helm repo add datacenter-docs https://your-repo-url
helm repo update
# Install with default values
helm install my-datacenter-docs datacenter-docs/datacenter-docs
# Or install from local directory
helm install my-datacenter-docs ./datacenter-docs
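After installation, the release can be verified with standard Helm and kubectl commands (the instance label below matches the release name used above):
# Check release status and pods
helm status my-datacenter-docs
kubectl get pods -l app.kubernetes.io/instance=my-datacenter-docs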
Production Installation
For production, create a custom values.yaml:
# Copy and edit the values file
cp values.yaml my-values.yaml
# Edit my-values.yaml with your configuration
# At minimum, change:
# - secrets.llmApiKey
# - secrets.apiSecretKey
# - ingress.hosts
# Install with custom values
helm install my-datacenter-docs ./datacenter-docs -f my-values.yaml
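As a sketch, a minimal my-values.yaml might override only the values called out above; the hostname, keys, and path layout here are placeholders modeled on the production example later in this README:
# my-values.yaml (sketch)
secrets:
  llmApiKey: "sk-your-openai-api-key"
  apiSecretKey: "your-strong-secret-key"
ingress:
  hosts:
    - host: "datacenter-docs.yourdomain.com"
      paths:
        - path: /
          pathType: Prefix
          service: frontend
        - path: /api
          pathType: Prefix
          service: api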
Install with Specific Configuration
helm install my-datacenter-docs ./datacenter-docs \
--set secrets.llmApiKey="sk-your-openai-api-key" \
--set secrets.apiSecretKey="your-strong-secret-key" \
--set ingress.hosts[0].host="datacenter-docs.yourdomain.com" \
--set mongodb.persistence.size="50Gi"
Configuration
Key Configuration Parameters
Global Settings
| Parameter | Description | Default |
|---|---|---|
| `global.imagePullPolicy` | Image pull policy | `IfNotPresent` |
| `global.storageClass` | Storage class for PVCs | `""` |
MongoDB
| Parameter | Description | Default |
|---|---|---|
| `mongodb.enabled` | Enable MongoDB | `true` |
| `mongodb.image.repository` | MongoDB image | `mongo` |
| `mongodb.image.tag` | MongoDB version | `7` |
| `mongodb.auth.rootUsername` | Root username | `admin` |
| `mongodb.auth.rootPassword` | Root password | `admin123` |
| `mongodb.persistence.enabled` | Enable persistence | `true` |
| `mongodb.persistence.size` | Volume size | `10Gi` |
| `mongodb.resources.requests.memory` | Memory request | `512Mi` |
| `mongodb.resources.limits.memory` | Memory limit | `2Gi` |
Redis
| Parameter | Description | Default |
|---|---|---|
| `redis.enabled` | Enable Redis | `true` |
| `redis.image.repository` | Redis image | `redis` |
| `redis.image.tag` | Redis version | `7-alpine` |
| `redis.resources.requests.memory` | Memory request | `128Mi` |
| `redis.resources.limits.memory` | Memory limit | `512Mi` |
API Service
| Parameter | Description | Default |
|---|---|---|
| `api.enabled` | Enable API service | `true` |
| `api.replicaCount` | Number of replicas | `2` |
| `api.image.repository` | API image repository | `datacenter-docs-api` |
| `api.image.tag` | API image tag | `latest` |
| `api.service.port` | Service port | `8000` |
| `api.autoscaling.enabled` | Enable HPA | `true` |
| `api.autoscaling.minReplicas` | Min replicas | `2` |
| `api.autoscaling.maxReplicas` | Max replicas | `10` |
| `api.resources.requests.memory` | Memory request | `512Mi` |
| `api.resources.limits.memory` | Memory limit | `2Gi` |
Worker Service
| Parameter | Description | Default |
|---|---|---|
| `worker.enabled` | Enable worker service | `false` |
| `worker.replicaCount` | Number of replicas | `3` |
| `worker.autoscaling.enabled` | Enable HPA | `true` |
| `worker.autoscaling.minReplicas` | Min replicas | `1` |
| `worker.autoscaling.maxReplicas` | Max replicas | `10` |
Chat Service
| Parameter | Description | Default |
|---|---|---|
| `chat.enabled` | Enable chat service | `false` |
| `chat.replicaCount` | Number of replicas | `1` |
| `chat.service.port` | Service port | `8001` |
Frontend
| Parameter | Description | Default |
|---|---|---|
| `frontend.enabled` | Enable frontend | `true` |
| `frontend.replicaCount` | Number of replicas | `2` |
| `frontend.service.port` | Service port | `80` |
Ingress
| Parameter | Description | Default |
|---|---|---|
| `ingress.enabled` | Enable ingress | `true` |
| `ingress.className` | Ingress class | `nginx` |
| `ingress.hosts[0].host` | Hostname | `datacenter-docs.example.com` |
| `ingress.tls[0].secretName` | TLS secret name | `datacenter-docs-tls` |
Application Configuration
| Parameter | Description | Default |
|---|---|---|
| `config.llm.baseUrl` | LLM provider URL | `https://api.openai.com/v1` |
| `config.llm.model` | LLM model | `gpt-4-turbo-preview` |
| `config.autoRemediation.enabled` | Enable auto-remediation | `true` |
| `config.autoRemediation.minReliabilityScore` | Min reliability score | `85.0` |
| `config.autoRemediation.dryRun` | Dry run mode | `false` |
| `config.logLevel` | Log level | `INFO` |
Secrets
| Parameter | Description | Default |
|---|---|---|
| `secrets.llmApiKey` | LLM API key | `sk-your-openai-api-key-here` |
| `secrets.apiSecretKey` | API secret key | `your-secret-key-here-change-in-production` |
IMPORTANT: Change these secrets in production!
Usage Examples
Enable All Services (including chat and worker)
helm install my-datacenter-docs ./datacenter-docs \
--set chat.enabled=true \
--set worker.enabled=true
Disable Auto-Remediation
helm install my-datacenter-docs ./datacenter-docs \
--set config.autoRemediation.enabled=false
Use Different LLM Provider (e.g., Anthropic Claude)
helm install my-datacenter-docs ./datacenter-docs \
--set config.llm.baseUrl="https://api.anthropic.com/v1" \
--set config.llm.model="claude-3-opus-20240229" \
--set secrets.llmApiKey="sk-ant-your-anthropic-key"
Use Local LLM (e.g., Ollama)
helm install my-datacenter-docs ./datacenter-docs \
--set config.llm.baseUrl="http://ollama-service:11434/v1" \
--set config.llm.model="llama2" \
--set secrets.llmApiKey="not-needed"
Scale MongoDB Storage
helm install my-datacenter-docs ./datacenter-docs \
--set mongodb.persistence.size="100Gi"
Disable Ingress (use port-forward instead)
helm install my-datacenter-docs ./datacenter-docs \
--set ingress.enabled=false
Production Configuration with External MongoDB
# production-values.yaml
mongodb:
  enabled: false
config:
  mongodbUrl: "mongodb://user:pass@external-mongodb:27017/datacenter_docs?authSource=admin"
api:
  replicaCount: 5
  autoscaling:
    maxReplicas: 20
secrets:
  llmApiKey: "sk-your-production-api-key"
  apiSecretKey: "your-production-secret-key"
ingress:
  hosts:
    - host: "datacenter-docs.prod.yourdomain.com"
      paths:
        - path: /
          pathType: Prefix
          service: frontend
        - path: /api
          pathType: Prefix
          service: api
helm install prod-datacenter-docs ./datacenter-docs -f production-values.yaml
Upgrading
# Upgrade with new values
helm upgrade my-datacenter-docs ./datacenter-docs -f my-values.yaml
# Upgrade specific parameters
helm upgrade my-datacenter-docs ./datacenter-docs \
--set api.image.tag="v1.2.0" \
--reuse-values
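If an upgrade misbehaves, the release history can be inspected and rolled back with standard Helm commands:
# Inspect release history and roll back to a previous revision
helm history my-datacenter-docs
helm rollback my-datacenter-docs 1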
Uninstallation
helm uninstall my-datacenter-docs
Note: This will delete all resources except PersistentVolumeClaims (PVCs) for MongoDB. To also delete PVCs:
kubectl delete pvc -l app.kubernetes.io/instance=my-datacenter-docs
Monitoring and Troubleshooting
Check Pod Status
kubectl get pods -l app.kubernetes.io/instance=my-datacenter-docs
View Logs
# API logs
kubectl logs -l app.kubernetes.io/component=api -f
# Worker logs
kubectl logs -l app.kubernetes.io/component=worker -f
# MongoDB logs
kubectl logs -l app.kubernetes.io/component=database -f
Access Services Locally
# API
kubectl port-forward svc/my-datacenter-docs-api 8000:8000
# Frontend
kubectl port-forward svc/my-datacenter-docs-frontend 8080:80
# MongoDB (for debugging)
kubectl port-forward svc/my-datacenter-docs-mongodb 27017:27017
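With the API port-forward active, a quick smoke test against the /health endpoint (the same endpoint used in the Testing Locally section below):
curl http://localhost:8000/health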
Common Issues
Pods Stuck in Pending
Check if PVCs are bound:
kubectl get pvc
If storage class is missing, set it:
helm upgrade my-datacenter-docs ./datacenter-docs \
--set mongodb.persistence.storageClass="standard" \
--reuse-values
API Pods Crash Loop
Check logs:
kubectl logs -l app.kubernetes.io/component=api --tail=100
Common causes (the commands after this list can help identify which one applies):
- MongoDB not ready (wait for init containers)
- Invalid LLM API key
- Missing environment variables
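To narrow down the cause, describe a failing pod and review recent cluster events (standard kubectl commands, using the component label from the logs example above):
# Inspect pod events, environment, and recent cluster events
kubectl describe pod -l app.kubernetes.io/component=api
kubectl get events --sort-by=.lastTimestamp | tail -20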
Cannot Access via Ingress
Check ingress status:
kubectl get ingress
kubectl describe ingress my-datacenter-docs
Ensure the following (a quick routing test is shown after this list):
- Ingress controller is installed
- DNS points to ingress IP
- TLS certificate is valid (if using HTTPS)
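To test routing directly against the ingress controller before DNS is in place, send a request with the expected Host header; replace the address below with the controller's external IP (the hostname is the chart default):
curl -H "Host: datacenter-docs.example.com" http://<ingress-external-ip>/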
Security Considerations
Production Checklist
- Change `secrets.llmApiKey` to a valid API key
- Change `secrets.apiSecretKey` to a strong random key (one way to generate one is shown after this checklist)
- Change MongoDB credentials (`mongodb.auth.rootPassword`)
- Enable TLS/SSL on ingress
- Review RBAC policies
- Use external secret management (e.g., HashiCorp Vault, AWS Secrets Manager)
- Enable network policies
- Set resource limits on all pods
- Enable pod security policies
- Review auto-remediation settings
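One way to generate a strong random value for secrets.apiSecretKey (a sketch assuming openssl is available; any cryptographically secure generator works):
# Generate a 32-byte hex secret
openssl rand -hex 32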
Using External Secrets
Instead of storing secrets in values.yaml, use Kubernetes secrets:
# Create secret
kubectl create secret generic datacenter-docs-secrets \
--from-literal=llm-api-key="sk-your-key" \
--from-literal=api-secret-key="your-secret"
# Modify templates to use existing secret
# (requires chart customization)
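What that customization might look like, as a sketch only: point the deployment's container environment at the pre-created secret via a secretKeyRef. The environment variable names and template path below are assumptions, not the chart's actual templates:
# Hypothetical snippet for templates/api-deployment.yaml
env:
  - name: LLM_API_KEY          # assumed variable name
    valueFrom:
      secretKeyRef:
        name: datacenter-docs-secrets
        key: llm-api-key
  - name: API_SECRET_KEY       # assumed variable name
    valueFrom:
      secretKeyRef:
        name: datacenter-docs-secrets
        key: api-secret-key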
Development
Validating the Chart
# Lint the chart
helm lint ./datacenter-docs
# Dry run
helm install my-test ./datacenter-docs --dry-run --debug
# Template rendering
helm template my-test ./datacenter-docs > rendered.yaml
Testing Locally
# Create kind cluster
kind create cluster
# Install chart
helm install test ./datacenter-docs \
--set ingress.enabled=false \
--set api.autoscaling.enabled=false \
--set mongodb.persistence.enabled=false
# Test
kubectl port-forward svc/test-datacenter-docs-api 8000:8000
curl http://localhost:8000/health
Support
For issues and questions:
- Issues: https://git.commandware.com/it-ops/llm-automation-docs-and-remediation-engine/issues
- Documentation: https://git.commandware.com/it-ops/llm-automation-docs-and-remediation-engine
License
See the main repository for license information.