llm-automation-docs-and-rem…/deploy/helm/README.md
dnviti 2719cfff59
Add Helm chart, Docs, and Config conversion script
2025-10-22 14:35:21 +02:00

Helm Deployment

This directory contains the Helm chart for deploying the Datacenter Docs & Remediation Engine on Kubernetes, along with a script for validating it.

Contents

  • datacenter-docs/ - Main Helm chart for the application
  • test-chart.sh - Automated testing script for chart validation

Quick Start

Prerequisites

  • Kubernetes cluster (1.19+)
  • Helm 3.0+
  • kubectl configured to access your cluster

Development/Testing Installation

# Install with development settings (minimal resources, local testing)
helm install dev ./datacenter-docs -f ./datacenter-docs/values-development.yaml

# Access the application (each port-forward runs in the foreground, so use a separate terminal for each)
kubectl port-forward svc/dev-datacenter-docs-api 8000:8000
kubectl port-forward svc/dev-datacenter-docs-frontend 8080:80

# View API docs: http://localhost:8000/api/docs
# View frontend: http://localhost:8080

Production Installation

# Copy and customize production values
cp datacenter-docs/values-production.yaml my-production-values.yaml

# Edit my-production-values.yaml:
# - Change all secrets (llmApiKey, apiSecretKey, mongodbPassword)
# - Update ingress hosts
# - Adjust resource limits
# - Configure LLM provider
# - Review auto-remediation settings

# Install
helm install prod ./datacenter-docs -f my-production-values.yaml

# Verify deployment
helm list
kubectl get pods
kubectl get ingress

Chart Structure

datacenter-docs/
├── Chart.yaml                      # Chart metadata
├── values.yaml                     # Default configuration
├── values-development.yaml         # Development settings
├── values-production.yaml          # Production example
├── README.md                       # Detailed chart documentation
├── .helmignore                     # Files to exclude from package
└── templates/
    ├── NOTES.txt                   # Post-install instructions
    ├── _helpers.tpl                # Template helpers
    ├── configmap.yaml              # Application configuration
    ├── secrets.yaml                # Sensitive data
    ├── serviceaccount.yaml         # Service account
    ├── mongodb-statefulset.yaml    # MongoDB StatefulSet
    ├── mongodb-service.yaml        # MongoDB Service
    ├── redis-deployment.yaml       # Redis Deployment
    ├── redis-service.yaml          # Redis Service
    ├── api-deployment.yaml         # API Deployment
    ├── api-service.yaml            # API Service
    ├── api-hpa.yaml                # API autoscaling
    ├── chat-deployment.yaml        # Chat Deployment
    ├── chat-service.yaml           # Chat Service
    ├── worker-deployment.yaml      # Worker Deployment
    ├── worker-hpa.yaml             # Worker autoscaling
    ├── frontend-deployment.yaml    # Frontend Deployment
    ├── frontend-service.yaml       # Frontend Service
    └── ingress.yaml                # Ingress configuration

Testing the Chart

Run the automated test script:

cd deploy/helm
./test-chart.sh

This will:

  1. Lint the chart
  2. Render templates with different value files
  3. Perform dry-run installation
  4. Validate Kubernetes manifests
  5. Package the chart
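
The contents of test-chart.sh are not reproduced in this README, so the following is only a sketch of how the five steps above might map onto Helm and kubectl commands. The function name validate_chart, the run helper, the DRY_RUN switch, and the release name test are illustrative assumptions, as is the use of kubectl apply --dry-run=client for manifest validation.

```shell
# Sketch only: test-chart.sh may be implemented differently.

# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical function mirroring the five steps listed above.
validate_chart() {
  chart="${1:-./datacenter-docs}"

  # 1. Lint the chart
  run helm lint "$chart" || return 1

  # 2. Render templates with each values file shipped in the chart
  for values in values.yaml values-development.yaml values-production.yaml; do
    run helm template test "$chart" -f "$chart/$values" || return 1
  done

  # 3. Dry-run installation (needs access to a cluster)
  run helm install test "$chart" --dry-run || return 1

  # 4. Validate rendered manifests (assumption: client-side kubectl dry run)
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "helm template test $chart | kubectl apply --dry-run=client -f -"
  else
    helm template test "$chart" | kubectl apply --dry-run=client -f - || return 1
  fi

  # 5. Package the chart into a .tgz archive
  run helm package "$chart" || return 1
}
```

Calling DRY_RUN=1 validate_chart ./datacenter-docs prints the commands without running them, which is a quick way to review the sequence on a machine without Helm or cluster access.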

Common Operations

Upgrade Release

# Upgrade with new values
helm upgrade prod ./datacenter-docs -f my-production-values.yaml

# Upgrade with specific parameter changes
helm upgrade prod ./datacenter-docs --set api.replicaCount=10 --reuse-values

Check Status

# List releases
helm list

# Get release status
helm status prod

# Get current values
helm get values prod

# Get all manifests
helm get manifest prod

Rollback

# View revision history
helm history prod

# Rollback to previous version
helm rollback prod

# Rollback to specific revision
helm rollback prod 2

Uninstall

# Uninstall release
helm uninstall prod

# Also delete PVCs (if using persistent storage)
kubectl delete pvc -l app.kubernetes.io/instance=prod

Configuration Files

values.yaml

Default configuration with reasonable settings for development/testing.

values-development.yaml

Optimized for local development:

  • Minimal resource requests/limits
  • A single replica per component
  • Persistence disabled
  • Dry-run mode for auto-remediation
  • Debug logging
  • Ingress disabled (use port-forward)

values-production.yaml

Example production configuration:

  • Higher resource limits
  • Multiple replicas
  • Autoscaling enabled
  • Persistence enabled with larger volumes
  • TLS/SSL enabled
  • Production-grade security settings
  • All components enabled

Important: Copy and customize this file for your environment. Never use default secrets!
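
One way to guard against shipping placeholder secrets is a pre-install check on the values file. The sketch below assumes the three secret keys named in this README (llmApiKey, apiSecretKey, mongodbPassword) each appear on their own line as key: value. The function name check_secrets and the placeholder patterns (an empty value, or a value containing "your-" or "change") are illustrative assumptions, not taken from the chart.

```shell
# Hypothetical pre-install check: fail if a secret is missing or still looks like a placeholder.
check_secrets() {
  values_file="$1"
  status=0
  for key in llmApiKey apiSecretKey mongodbPassword; do
    # Take the first "key: value" line, drop the key, and remove any quotes.
    value=$(sed -n "s/^[[:space:]]*${key}:[[:space:]]*//p" "$values_file" | head -n 1 | tr -d "\"'")
    case "$value" in
      ""|*your-*|*change*|*CHANGE*)
        echo "placeholder or missing secret: $key" >&2
        status=1
        ;;
    esac
  done
  return "$status"
}
```

It could be chained in front of the install, for example check_secrets my-production-values.yaml && helm install prod ./datacenter-docs -f my-production-values.yaml. A line-based check like this is deliberately simple and will not understand multi-line YAML values or secrets supplied from an external secret manager.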

Available Components

Component   Purpose                    Default Enabled
MongoDB     Document database          Yes
Redis       Cache & task queue         Yes
API         REST API service           Yes
Chat        WebSocket server           No (not implemented)
Worker      Celery background tasks    No (not implemented)
Frontend    Web UI                     Yes

Enable/disable components in your values file:

mongodb:
  enabled: true
redis:
  enabled: true
api:
  enabled: true
chat:
  enabled: false  # Set to true when implemented
worker:
  enabled: false  # Set to true when implemented
frontend:
  enabled: true
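
To see what a toggle changes before applying it, the rendered manifests can be summarised by resource kind. In this sketch, count_kinds is a hypothetical helper that reads rendered YAML on standard input. It matches only top-level kind: lines, so nested fields such as an autoscaler's scaleTargetRef.kind are ignored.

```shell
# Hypothetical helper: count top-level resource kinds in rendered manifests read from stdin.
count_kinds() {
  awk '/^kind:/ { print $2 }' | sort | uniq -c
}
```

For example, comparing helm template prod ./datacenter-docs | count_kinds with helm template prod ./datacenter-docs --set chat.enabled=true | count_kinds should show which resources the Chat component adds.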

Architecture

With every component enabled, the chart deploys the following microservices architecture:

                     ┌─────────────┐
                     │   Ingress   │
                     └──────┬──────┘
                            │
              ┌─────────────┼─────────────┐
              │             │             │
         ┌────▼────┐   ┌────▼────┐  ┌────▼────┐
         │Frontend │   │   API   │  │  Chat   │
         └─────────┘   └────┬────┘  └────┬────┘
                            │            │
              ┌─────────────┼────────────┘
              │             │
         ┌────▼────┐   ┌────▼────┐
         │  Redis  │   │ MongoDB │
         └─────────┘   └─────────┘
              ▲
              │
         ┌────┴────┐
         │ Worker  │
         └─────────┘

LLM Provider Configuration

The chart supports multiple LLM providers. Configure in your values file:

OpenAI

config:
  llm:
    baseUrl: "https://api.openai.com/v1"
    model: "gpt-4-turbo-preview"
secrets:
  llmApiKey: "sk-your-openai-key"

Anthropic Claude

config:
  llm:
    baseUrl: "https://api.anthropic.com/v1"
    model: "claude-3-opus-20240229"
secrets:
  llmApiKey: "sk-ant-your-anthropic-key"

Local (Ollama)

config:
  llm:
    baseUrl: "http://ollama-service:11434/v1"
    model: "llama2"
secrets:
  llmApiKey: "not-needed"

Azure OpenAI

config:
  llm:
    baseUrl: "https://your-resource.openai.azure.com"
    model: "gpt-4"
secrets:
  llmApiKey: "your-azure-key"

Security Best Practices

For production deployments:

  1. Change all default secrets

    helm install prod ./datacenter-docs \
      --set secrets.llmApiKey="your-actual-key" \
      --set secrets.apiSecretKey="$(openssl rand -base64 32)" \
      --set secrets.mongodbPassword="$(openssl rand -base64 32)"
    
  2. Use external secret management

    • HashiCorp Vault
    • AWS Secrets Manager
    • Azure Key Vault
    • Kubernetes External Secrets Operator

  3. Enable TLS/SSL

    ingress:
      annotations:
        cert-manager.io/cluster-issuer: "letsencrypt-prod"
      tls:
        - secretName: datacenter-docs-tls
          hosts:
            - datacenter-docs.yourdomain.com
    
  4. Review auto-remediation settings

    config:
      autoRemediation:
        enabled: true
        minReliabilityScore: 95.0  # High threshold for production
        dryRun: true  # Test first, then set to false
    
  5. Implement network policies to restrict traffic between pods

  6. Enable resource quotas on the release namespace

  7. Run regular security scans of images and manifests

Monitoring and Observability

The chart is designed to integrate with:

  • Prometheus: Metrics collection
  • Grafana: Visualization
  • Jaeger: Distributed tracing
  • ELK/Loki: Log aggregation

Add pod annotations so that a Prometheus instance configured for annotation-based discovery scrapes the API metrics endpoint:

podAnnotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "8000"
  prometheus.io/path: "/metrics"

Troubleshooting

Pods not starting

# Check pod status
kubectl get pods -l app.kubernetes.io/instance=prod

# Describe pod for events
kubectl describe pod <pod-name>

# View logs
kubectl logs <pod-name> -f
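
When a release has many pods, it can help to list only those that are not healthy. The sketch below filters the default kubectl get pods --no-headers output, in which the third column is STATUS. The helper name unhealthy_pods is an assumption, and treating anything other than Running or Completed as unhealthy is a simplification, since a Running pod can still be failing its readiness probe.

```shell
# Hypothetical helper: print "name status" for pods that are neither Running nor Completed.
# Expects `kubectl get pods --no-headers` output on stdin (columns: NAME READY STATUS RESTARTS AGE).
unhealthy_pods() {
  awk '$3 != "Running" && $3 != "Completed" { print $1, $3 }'
}
```

A typical call would be kubectl get pods -l app.kubernetes.io/instance=prod --no-headers | unhealthy_pods, after which each listed pod can be inspected with kubectl describe pod.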

Storage issues

# Check PVC status
kubectl get pvc

# Check storage class
kubectl get storageclass

# Manually create PVC if needed
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongodb-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
EOF

Ingress not working

# Check ingress status
kubectl get ingress
kubectl describe ingress prod-datacenter-docs

# Check ingress controller logs
kubectl logs -n ingress-nginx -l app.kubernetes.io/component=controller -f

Support

For detailed chart documentation, see datacenter-docs/README.md.

License

See the main repository for license information.