Datacenter Docs & Remediation Engine - Helm Chart
Helm chart for deploying the LLM Automation - Docs & Remediation Engine on Kubernetes.
Overview
This chart deploys a complete stack; each component below can be enabled or disabled individually via its `enabled` flag (see the sketch after this list):
- MongoDB: Document database for storing tickets, documentation, and metadata
- Redis: Cache and task queue backend
- API Service: FastAPI REST API with auto-remediation capabilities
- Chat Service: WebSocket server for real-time documentation queries (optional, not yet implemented)
- Worker Service: Celery workers for background tasks (optional, not yet implemented)
- Frontend: React-based web interface
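A sketch of these toggles in values.yaml, with the default values documented in the Configuration tables below:
# Component toggles (sketch; defaults per the Configuration section)
mongodb:
  enabled: true
redis:
  enabled: true
api:
  enabled: true
chat:
  enabled: false    # not yet implemented
worker:
  enabled: false    # not yet implemented
frontend:
  enabled: true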
Prerequisites
- Kubernetes 1.19+
- Helm 3.0+
- PersistentVolume provisioner support in the underlying infrastructure (for MongoDB persistence)
- Ingress controller (optional, for external access)
Installation
Quick Start
# Add the chart repository (if published)
helm repo add datacenter-docs https://your-repo-url
helm repo update
# Install with default values
helm install my-datacenter-docs datacenter-docs/datacenter-docs
# Or install from local directory
helm install my-datacenter-docs ./datacenter-docs
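After installation, the release can be verified with standard Helm and kubectl commands (the instance label below matches the release name used above):
# Check release status and pods
helm status my-datacenter-docs
kubectl get pods -l app.kubernetes.io/instance=my-datacenter-docs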
Production Installation
For production, create a custom values.yaml:
# Copy and edit the values file
cp values.yaml my-values.yaml
# Edit my-values.yaml with your configuration
# At minimum, change:
# - secrets.llmApiKey
# - secrets.apiSecretKey
# - ingress.hosts
# Install with custom values
helm install my-datacenter-docs ./datacenter-docs -f my-values.yaml
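As a sketch, a minimal my-values.yaml might override only the values called out above; the hostname, keys, and path layout here are placeholders modeled on the production example later in this README:
# my-values.yaml (sketch)
secrets:
  llmApiKey: "sk-your-openai-api-key"
  apiSecretKey: "your-strong-secret-key"
ingress:
  hosts:
    - host: "datacenter-docs.yourdomain.com"
      paths:
        - path: /
          pathType: Prefix
          service: frontend
        - path: /api
          pathType: Prefix
          service: api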
Install with Specific Configuration
helm install my-datacenter-docs ./datacenter-docs \
--set secrets.llmApiKey="sk-your-openai-api-key" \
--set secrets.apiSecretKey="your-strong-secret-key" \
--set ingress.hosts[0].host="datacenter-docs.yourdomain.com" \
--set mongodb.persistence.size="50Gi"
Configuration
Key Configuration Parameters
Global Settings
| Parameter | Description | Default |
|---|---|---|
| `global.imagePullPolicy` | Image pull policy | `IfNotPresent` |
| `global.storageClass` | Storage class for PVCs | `""` |
MongoDB
| Parameter | Description | Default |
|---|---|---|
| `mongodb.enabled` | Enable MongoDB | `true` |
| `mongodb.image.repository` | MongoDB image | `mongo` |
| `mongodb.image.tag` | MongoDB version | `7` |
| `mongodb.auth.rootUsername` | Root username | `admin` |
| `mongodb.auth.rootPassword` | Root password | `admin123` |
| `mongodb.persistence.enabled` | Enable persistence | `true` |
| `mongodb.persistence.size` | Volume size | `10Gi` |
| `mongodb.resources.requests.memory` | Memory request | `512Mi` |
| `mongodb.resources.limits.memory` | Memory limit | `2Gi` |
Redis
| Parameter | Description | Default |
|---|---|---|
| `redis.enabled` | Enable Redis | `true` |
| `redis.image.repository` | Redis image | `redis` |
| `redis.image.tag` | Redis version | `7-alpine` |
| `redis.resources.requests.memory` | Memory request | `128Mi` |
| `redis.resources.limits.memory` | Memory limit | `512Mi` |
API Service
| Parameter | Description | Default |
|---|---|---|
| `api.enabled` | Enable API service | `true` |
| `api.replicaCount` | Number of replicas | `2` |
| `api.image.repository` | API image repository | `datacenter-docs-api` |
| `api.image.tag` | API image tag | `latest` |
| `api.service.port` | Service port | `8000` |
| `api.autoscaling.enabled` | Enable HPA | `true` |
| `api.autoscaling.minReplicas` | Min replicas | `2` |
| `api.autoscaling.maxReplicas` | Max replicas | `10` |
| `api.resources.requests.memory` | Memory request | `512Mi` |
| `api.resources.limits.memory` | Memory limit | `2Gi` |
Worker Service
| Parameter | Description | Default |
|---|---|---|
| `worker.enabled` | Enable worker service | `false` |
| `worker.replicaCount` | Number of replicas | `3` |
| `worker.autoscaling.enabled` | Enable HPA | `true` |
| `worker.autoscaling.minReplicas` | Min replicas | `1` |
| `worker.autoscaling.maxReplicas` | Max replicas | `10` |
Chat Service
| Parameter | Description | Default |
|---|---|---|
| `chat.enabled` | Enable chat service | `false` |
| `chat.replicaCount` | Number of replicas | `1` |
| `chat.service.port` | Service port | `8001` |
Frontend
| Parameter | Description | Default |
|---|---|---|
| `frontend.enabled` | Enable frontend | `true` |
| `frontend.replicaCount` | Number of replicas | `2` |
| `frontend.service.port` | Service port | `80` |
Ingress
| Parameter | Description | Default |
|---|---|---|
| `ingress.enabled` | Enable ingress | `true` |
| `ingress.className` | Ingress class | `nginx` |
| `ingress.hosts[0].host` | Hostname | `datacenter-docs.example.com` |
| `ingress.tls[0].secretName` | TLS secret name | `datacenter-docs-tls` |
Application Configuration
| Parameter | Description | Default |
|---|---|---|
| `config.llm.baseUrl` | LLM provider URL | `https://api.openai.com/v1` |
| `config.llm.model` | LLM model | `gpt-4-turbo-preview` |
| `config.autoRemediation.enabled` | Enable auto-remediation | `true` |
| `config.autoRemediation.minReliabilityScore` | Min reliability score | `85.0` |
| `config.autoRemediation.dryRun` | Dry run mode | `false` |
| `config.logLevel` | Log level | `INFO` |
Secrets
| Parameter | Description | Default |
|---|---|---|
| `secrets.llmApiKey` | LLM API key | `sk-your-openai-api-key-here` |
| `secrets.apiSecretKey` | API secret key | `your-secret-key-here-change-in-production` |
IMPORTANT: Change these secrets in production!
Usage Examples
Enable All Services (including chat and worker)
helm install my-datacenter-docs ./datacenter-docs \
--set chat.enabled=true \
--set worker.enabled=true
Disable Auto-Remediation
helm install my-datacenter-docs ./datacenter-docs \
--set config.autoRemediation.enabled=false
Use Different LLM Provider (e.g., Anthropic Claude)
helm install my-datacenter-docs ./datacenter-docs \
--set config.llm.baseUrl="https://api.anthropic.com/v1" \
--set config.llm.model="claude-3-opus-20240229" \
--set secrets.llmApiKey="sk-ant-your-anthropic-key"
Use Local LLM (e.g., Ollama)
helm install my-datacenter-docs ./datacenter-docs \
--set config.llm.baseUrl="http://ollama-service:11434/v1" \
--set config.llm.model="llama2" \
--set secrets.llmApiKey="not-needed"
Scale MongoDB Storage
helm install my-datacenter-docs ./datacenter-docs \
--set mongodb.persistence.size="100Gi"
Disable Ingress (use port-forward instead)
helm install my-datacenter-docs ./datacenter-docs \
--set ingress.enabled=false
Production Configuration with External MongoDB
# production-values.yaml
mongodb:
  enabled: false
config:
  mongodbUrl: "mongodb://user:pass@external-mongodb:27017/datacenter_docs?authSource=admin"
api:
  replicaCount: 5
  autoscaling:
    maxReplicas: 20
secrets:
  llmApiKey: "sk-your-production-api-key"
  apiSecretKey: "your-production-secret-key"
ingress:
  hosts:
    - host: "datacenter-docs.prod.yourdomain.com"
      paths:
        - path: /
          pathType: Prefix
          service: frontend
        - path: /api
          pathType: Prefix
          service: api
helm install prod-datacenter-docs ./datacenter-docs -f production-values.yaml
Upgrading
# Upgrade with new values
helm upgrade my-datacenter-docs ./datacenter-docs -f my-values.yaml
# Upgrade specific parameters
helm upgrade my-datacenter-docs ./datacenter-docs \
--set api.image.tag="v1.2.0" \
--reuse-values
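If an upgrade misbehaves, the release history can be inspected and rolled back with standard Helm commands:
# Inspect release history and roll back to a previous revision
helm history my-datacenter-docs
helm rollback my-datacenter-docs 1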
Uninstallation
helm uninstall my-datacenter-docs
Note: This will delete all resources except PersistentVolumeClaims (PVCs) for MongoDB. To also delete PVCs:
kubectl delete pvc -l app.kubernetes.io/instance=my-datacenter-docs
Monitoring and Troubleshooting
Check Pod Status
kubectl get pods -l app.kubernetes.io/instance=my-datacenter-docs
View Logs
# API logs
kubectl logs -l app.kubernetes.io/component=api -f
# Worker logs
kubectl logs -l app.kubernetes.io/component=worker -f
# MongoDB logs
kubectl logs -l app.kubernetes.io/component=database -f
Access Services Locally
# API
kubectl port-forward svc/my-datacenter-docs-api 8000:8000
# Frontend
kubectl port-forward svc/my-datacenter-docs-frontend 8080:80
# MongoDB (for debugging)
kubectl port-forward svc/my-datacenter-docs-mongodb 27017:27017
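With the API port-forward active, a quick smoke test against the /health endpoint (the same endpoint used in the Testing Locally section below):
curl http://localhost:8000/health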
Common Issues
Pods Stuck in Pending
Check if PVCs are bound:
kubectl get pvc
If storage class is missing, set it:
helm upgrade my-datacenter-docs ./datacenter-docs \
--set mongodb.persistence.storageClass="standard" \
--reuse-values
API Pods Crash Loop
Check logs:
kubectl logs -l app.kubernetes.io/component=api --tail=100
Common causes (the commands after this list can help identify which one applies):
- MongoDB not ready (wait for init containers)
- Invalid LLM API key
- Missing environment variables
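To narrow down the cause, describe a failing pod and review recent cluster events (standard kubectl commands, using the component label from the logs example above):
# Inspect pod events, environment, and recent cluster events
kubectl describe pod -l app.kubernetes.io/component=api
kubectl get events --sort-by=.lastTimestamp | tail -20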
Cannot Access via Ingress
Check ingress status:
kubectl get ingress
kubectl describe ingress my-datacenter-docs
Ensure the following (a quick routing test is shown after this list):
- Ingress controller is installed
- DNS points to ingress IP
- TLS certificate is valid (if using HTTPS)
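To test routing directly against the ingress controller before DNS is in place, send a request with the expected Host header; replace the address below with the controller's external IP (the hostname is the chart default):
curl -H "Host: datacenter-docs.example.com" http://<ingress-external-ip>/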
Security Considerations
Production Checklist
- Change `secrets.llmApiKey` to a valid API key
- Change `secrets.apiSecretKey` to a strong random key (one way to generate one is shown after this checklist)
- Change MongoDB credentials (`mongodb.auth.rootPassword`)
- Enable TLS/SSL on ingress
- Review RBAC policies
- Use external secret management (e.g., HashiCorp Vault, AWS Secrets Manager)
- Enable network policies
- Set resource limits on all pods
- Enable pod security policies
- Review auto-remediation settings
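One way to generate a strong random value for secrets.apiSecretKey (a sketch assuming openssl is available; any cryptographically secure generator works):
# Generate a 32-byte hex secret
openssl rand -hex 32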
Using External Secrets
Instead of storing secrets in values.yaml, use Kubernetes secrets:
# Create secret
kubectl create secret generic datacenter-docs-secrets \
--from-literal=llm-api-key="sk-your-key" \
--from-literal=api-secret-key="your-secret"
# Modify templates to use existing secret
# (requires chart customization)
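What that customization might look like, as a sketch only: point the deployment's container environment at the pre-created secret via a secretKeyRef. The environment variable names and template path below are assumptions, not the chart's actual templates:
# Hypothetical snippet for templates/api-deployment.yaml
env:
  - name: LLM_API_KEY          # assumed variable name
    valueFrom:
      secretKeyRef:
        name: datacenter-docs-secrets
        key: llm-api-key
  - name: API_SECRET_KEY       # assumed variable name
    valueFrom:
      secretKeyRef:
        name: datacenter-docs-secrets
        key: api-secret-key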
Development
Validating the Chart
# Lint the chart
helm lint ./datacenter-docs
# Dry run
helm install my-test ./datacenter-docs --dry-run --debug
# Template rendering
helm template my-test ./datacenter-docs > rendered.yaml
Testing Locally
# Create kind cluster
kind create cluster
# Install chart
helm install test ./datacenter-docs \
--set ingress.enabled=false \
--set api.autoscaling.enabled=false \
--set mongodb.persistence.enabled=false
# Test
kubectl port-forward svc/test-datacenter-docs-api 8000:8000
curl http://localhost:8000/health
Support
For issues and questions:
- Issues: https://git.commandware.com/it-ops/llm-automation-docs-and-remediation-engine/issues
- Documentation: https://git.commandware.com/it-ops/llm-automation-docs-and-remediation-engine
License
See the main repository for license information.