it-ops/llm-automation-docs-and-remediation-engine

dnviti 2719cfff59

Add Helm chart, Docs, and Config conversion script

2025-10-22 14:35:21 +02:00

Helm Deployment

This directory contains the Helm chart for deploying the Datacenter Docs & Remediation Engine on Kubernetes, along with a script for validating it:

datacenter-docs/ - Main Helm chart for the application
test-chart.sh - Automated testing script for chart validation

Quick Start

Prerequisites

Kubernetes cluster (1.19+)
Helm 3.0+
kubectl configured to access your cluster
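
A small preflight check can confirm that the tools listed above are on the PATH before you install. The sketch below is illustrative only: the function names `have_tool` and `preflight` are hypothetical and not part of this repository, and the check does not verify minimum versions.

```shell
#!/bin/sh
# Return success if a command is available on PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Print one line per tool and return non-zero if any tool is missing.
preflight() {
  missing=0
  for tool in "$@"; do
    if have_tool "$tool"; then
      echo "ok: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (prints client versions only if both tools are present):
#   preflight helm kubectl && helm version --short && kubectl version --client
```

Compare the printed versions against the minimums above by hand; parsing them automatically is left out to keep the sketch short.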

Development/Testing Installation

# Install with development settings (minimal resources, local testing)
helm install dev ./datacenter-docs -f ./datacenter-docs/values-development.yaml

# Access the application
kubectl port-forward svc/dev-datacenter-docs-api 8000:8000
kubectl port-forward svc/dev-datacenter-docs-frontend 8080:80

# View API docs: http://localhost:8000/api/docs
# View frontend: http://localhost:8080

Production Installation

# Copy and customize production values
cp datacenter-docs/values-production.yaml my-production-values.yaml

# Edit my-production-values.yaml:
# - Change all secrets (llmApiKey, apiSecretKey, mongodbPassword)
# - Update ingress hosts
# - Adjust resource limits
# - Configure LLM provider
# - Review auto-remediation settings

# Install
helm install prod ./datacenter-docs -f my-production-values.yaml

# Verify deployment
helm list
kubectl get pods
kubectl get ingress

Chart Structure

datacenter-docs/
├── Chart.yaml                      # Chart metadata
├── values.yaml                     # Default configuration
├── values-development.yaml         # Development settings
├── values-production.yaml          # Production example
├── README.md                       # Detailed chart documentation
├── .helmignore                     # Files to exclude from package
└── templates/
    ├── NOTES.txt                   # Post-install instructions
    ├── _helpers.tpl                # Template helpers
    ├── configmap.yaml              # Application configuration
    ├── secrets.yaml                # Sensitive data
    ├── serviceaccount.yaml         # Service account
    ├── mongodb-statefulset.yaml    # MongoDB StatefulSet
    ├── mongodb-service.yaml        # MongoDB Service
    ├── redis-deployment.yaml       # Redis Deployment
    ├── redis-service.yaml          # Redis Service
    ├── api-deployment.yaml         # API Deployment
    ├── api-service.yaml            # API Service
    ├── api-hpa.yaml                # API autoscaling
    ├── chat-deployment.yaml        # Chat Deployment
    ├── chat-service.yaml           # Chat Service
    ├── worker-deployment.yaml      # Worker Deployment
    ├── worker-hpa.yaml             # Worker autoscaling
    ├── frontend-deployment.yaml    # Frontend Deployment
    ├── frontend-service.yaml       # Frontend Service
    └── ingress.yaml                # Ingress configuration

Testing the Chart

Run the automated test script:

cd deploy/helm
./test-chart.sh

This will:

Lint the chart
Render templates with different value files
Perform dry-run installation
Validate Kubernetes manifests
Package the chart
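
For orientation, the steps above map roughly onto standard Helm commands. The following is a hypothetical sketch, not the actual contents of test-chart.sh, which may differ. The `run` and `chart_checks` names, the release name `test`, and the `PRINT_ONLY` switch are assumptions made for illustration.

```shell
#!/bin/sh
# Run a command, or only print it when PRINT_ONLY=1 (useful for a preview).
run() {
  if [ "${PRINT_ONLY:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Lint, render, dry-run install, and package a chart directory.
chart_checks() {
  chart="${1:-./datacenter-docs}"
  run helm lint "$chart" || return 1
  # Render templates with each bundled values file.
  for values in values-development.yaml values-production.yaml; do
    run helm template test "$chart" -f "$chart/$values" || return 1
  done
  # A dry-run install typically needs access to a cluster.
  run helm install test "$chart" --dry-run || return 1
  run helm package "$chart" --destination "${TMPDIR:-/tmp}" || return 1
}

# Usage:
#   PRINT_ONLY=1; chart_checks ./datacenter-docs   # preview the commands
#   PRINT_ONLY=0; chart_checks ./datacenter-docs   # actually run them
```

Manifest validation is not shown here. One option is piping the `helm template` output to `kubectl apply --dry-run=client -f -`.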

Common Operations

Upgrade Release

# Upgrade with new values
helm upgrade prod ./datacenter-docs -f my-production-values.yaml

# Upgrade with specific parameter changes
helm upgrade prod ./datacenter-docs --set api.replicaCount=10 --reuse-values

Check Status

# List releases
helm list

# Get release status
helm status prod

# Get current values
helm get values prod

# Get all manifests
helm get manifest prod

Rollback

# View revision history
helm history prod

# Roll back to the previous revision
helm rollback prod

# Roll back to a specific revision
helm rollback prod 2

Uninstall

# Uninstall release
helm uninstall prod

# Also delete PVCs (if using persistent storage); this removes the stored data unless the volume's reclaim policy is Retain
kubectl delete pvc -l app.kubernetes.io/instance=prod

Configuration Files

values.yaml

Default configuration with reasonable settings for development/testing.

values-development.yaml

Optimized for local development:

Minimal resource requests/limits
Single replicas
Persistence disabled
Dry-run mode for auto-remediation
Debug logging
Ingress disabled (use port-forward)

values-production.yaml

Example production configuration:

Higher resource limits
Multiple replicas
Autoscaling enabled
Persistence enabled with larger volumes
TLS/SSL enabled
Production-grade security settings
All components enabled

Important: Copy and customize this file for your environment. Never use default secrets!
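
One way to guard against shipping sample secrets is to scan the customized values file before installing. This is a minimal sketch under stated assumptions: the function name `find_placeholder_secrets` is hypothetical, the `your-` pattern is taken from the sample keys in this README, and `changeme` is a guess at a common placeholder. Adjust the patterns to match the defaults in your own files.

```shell
#!/bin/sh
# Print (with line numbers) secret entries that still look like placeholders.
# Returns 0 if any are found and 1 if none are, following grep's convention.
find_placeholder_secrets() {
  grep -nE '(llmApiKey|apiSecretKey|mongodbPassword):.*(your-|changeme)' "$1"
}

# Usage:
#   if find_placeholder_secrets my-production-values.yaml; then
#     echo "Replace the values listed above before installing." >&2
#   else
#     helm install prod ./datacenter-docs -f my-production-values.yaml
#   fi
```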

Available Components

Component	Purpose	Default Enabled
MongoDB	Document database	Yes
Redis	Cache & task queue	Yes
API	REST API service	Yes
Chat	WebSocket server	No (not implemented)
Worker	Celery background tasks	No (not implemented)
Frontend	Web UI	Yes

Enable/disable components in your values file:

mongodb:
  enabled: true
redis:
  enabled: true
api:
  enabled: true
chat:
  enabled: false  # Set to true when implemented
worker:
  enabled: false  # Set to true when implemented
frontend:
  enabled: true

Architecture

The chart deploys a complete microservices architecture:

                     ┌─────────────┐
                     │   Ingress   │
                     └──────┬──────┘
                            │
              ┌─────────────┼─────────────┐
              │             │             │
         ┌────▼────┐   ┌────▼────┐  ┌────▼────┐
         │Frontend │   │   API   │  │  Chat   │
         └─────────┘   └────┬────┘  └────┬────┘
                            │            │
              ┌─────────────┼────────────┘
              │             │
         ┌────▼────┐   ┌────▼────┐
         │  Redis  │   │ MongoDB │
         └─────────┘   └─────────┘
              ▲
              │
         ┌────┴────┐
         │ Worker  │
         └─────────┘

LLM Provider Configuration

The chart supports multiple LLM providers. Set the provider's base URL, model, and API key in your values file:

OpenAI

config:
  llm:
    baseUrl: "https://api.openai.com/v1"
    model: "gpt-4-turbo-preview"
secrets:
  llmApiKey: "sk-your-openai-key"

Anthropic Claude

config:
  llm:
    baseUrl: "https://api.anthropic.com/v1"
    model: "claude-3-opus-20240229"
secrets:
  llmApiKey: "sk-ant-your-anthropic-key"

Local (Ollama)

config:
  llm:
    baseUrl: "http://ollama-service:11434/v1"
    model: "llama2"
secrets:
  llmApiKey: "not-needed"

Azure OpenAI

config:
  llm:
    baseUrl: "https://your-resource.openai.azure.com"
    model: "gpt-4"
secrets:
  llmApiKey: "your-azure-key"
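
To switch providers without editing the main values file, the non-secret settings can be written to a small override file while the key is passed separately. The sketch below assumes this workflow. The `llm_values` function and its provider keywords (`openai`, `anthropic`, `ollama`, `azure`) are hypothetical, and the base URLs and model names are copied from the examples above.

```shell
#!/bin/sh
# Print a values override (config.llm only) for a named provider.
llm_values() {
  case "$1" in
    openai)    base="https://api.openai.com/v1";              model="gpt-4-turbo-preview" ;;
    anthropic) base="https://api.anthropic.com/v1";           model="claude-3-opus-20240229" ;;
    ollama)    base="http://ollama-service:11434/v1";         model="llama2" ;;
    azure)     base="https://your-resource.openai.azure.com"; model="gpt-4" ;;
    *) echo "unknown provider: $1" >&2; return 1 ;;
  esac
  cat <<EOF
config:
  llm:
    baseUrl: "$base"
    model: "$model"
EOF
}

# Usage (LLM_API_KEY is an environment variable you set yourself):
#   llm_values ollama > llm-override.yaml
#   helm upgrade prod ./datacenter-docs \
#     -f my-production-values.yaml -f llm-override.yaml \
#     --set-string secrets.llmApiKey="$LLM_API_KEY"
```

When several `-f` files are given, later files take precedence, so the override only needs the keys that change.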

Security Best Practices

For production deployments:

Change all default secrets

helm install prod ./datacenter-docs \
  --set secrets.llmApiKey="your-actual-key" \
  --set secrets.apiSecretKey="$(openssl rand -base64 32)" \
  --set secrets.mongodbPassword="$(openssl rand -base64 32)"

Use external secret management
- HashiCorp Vault
- AWS Secrets Manager
- Azure Key Vault
- Kubernetes External Secrets Operator

Enable TLS/SSL

ingress:
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
  tls:
    - secretName: datacenter-docs-tls
      hosts:
        - datacenter-docs.yourdomain.com

Review auto-remediation settings

config:
  autoRemediation:
    enabled: true
    minReliabilityScore: 95.0  # High threshold for production
    dryRun: true  # Test first, then set to false

Implement network policies
Enable resource quotas
Run regular security scans
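
As a variation on the secret-generation commands above, hex output avoids the `+`, `/`, and `=` characters that base64 can produce, which sometimes need escaping in connection strings. This is a stylistic sketch rather than a chart requirement, and `gen_secret` is a hypothetical helper name.

```shell
#!/bin/sh
# Print a random secret as N bytes of hex (2*N characters); default 32 bytes.
gen_secret() {
  bytes="${1:-32}"
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -hex "$bytes"
  else
    # Fallback when openssl is unavailable.
    head -c "$bytes" /dev/urandom | od -An -tx1 | tr -d ' \n'
    echo
  fi
}

# Usage:
#   helm install prod ./datacenter-docs \
#     --set-string secrets.apiSecretKey="$(gen_secret)" \
#     --set-string secrets.mongodbPassword="$(gen_secret)"
```

Store the generated values somewhere safe, for example in one of the secret managers listed above, since they are needed again on upgrade.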

Monitoring and Observability

The chart is designed to integrate with:

Prometheus: Metrics collection
Grafana: Visualization
Jaeger: Distributed tracing
ELK/Loki: Log aggregation

Add pod annotations so that a Prometheus instance configured for annotation-based discovery scrapes the metrics endpoint:

podAnnotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "8000"
  prometheus.io/path: "/metrics"

Troubleshooting

Pods not starting

# Check pod status
kubectl get pods -l app.kubernetes.io/instance=prod

# Describe pod for events
kubectl describe pod <pod-name>

# View logs
kubectl logs <pod-name> -f
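
When a release has many pods, it can help to filter the listing down to the ones that need attention. The filter below is a minimal sketch, and `unhealthy_pods` is a hypothetical name. It assumes the default column order of `kubectl get pods --no-headers` (NAME, READY, STATUS, ...).

```shell
#!/bin/sh
# Read `kubectl get pods --no-headers` output on stdin and print
# "name status ready" for pods that are not Running/Completed, or that
# are Running with fewer ready containers than expected.
unhealthy_pods() {
  awk '{
    split($2, ready, "/")
    if (($3 != "Running" && $3 != "Completed") ||
        ($3 == "Running" && ready[1] != ready[2]))
      print $1, $3, $2
  }'
}

# Usage:
#   kubectl get pods -l app.kubernetes.io/instance=prod --no-headers | unhealthy_pods
```

Pass each reported pod name to the `kubectl describe pod` and `kubectl logs` commands above.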

Storage issues

# Check PVC status
kubectl get pvc

# Check storage class
kubectl get storageclass

# Manually create PVC if needed
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongodb-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
EOF

Ingress not working

# Check ingress status
kubectl get ingress
kubectl describe ingress prod-datacenter-docs

# Check ingress controller logs
kubectl logs -n ingress-nginx -l app.kubernetes.io/component=controller -f

Support

For detailed documentation, see:

Chart README: datacenter-docs/README.md
Main project: ../../README.md
Issues: https://git.commandware.com/it-ops/llm-automation-docs-and-remediation-engine/issues

License

See the main repository for license information.
