# Datacenter Docs & Remediation Engine - Helm Chart
Helm chart for deploying the LLM Automation - Docs & Remediation Engine on Kubernetes.
## Overview
This chart deploys a complete stack including:
- **MongoDB**: Document database for storing tickets, documentation, and metadata
- **Redis**: Cache and task queue backend
- **API Service**: FastAPI REST API with auto-remediation capabilities
- **Chat Service**: WebSocket server for real-time documentation queries (optional, not yet implemented)
- **Worker Service**: Celery workers for background tasks (optional, not yet implemented)
- **Frontend**: React-based web interface
## Prerequisites
- Kubernetes 1.19+
- Helm 3.0+
- PersistentVolume provisioner support in the underlying infrastructure (for MongoDB persistence)
- Ingress controller (optional, for external access)
## Installation
### Quick Start
```bash
# Add the chart repository (if published)
helm repo add datacenter-docs https://your-repo-url
helm repo update

# Install with default values
helm install my-datacenter-docs datacenter-docs/datacenter-docs

# Or install from local directory
helm install my-datacenter-docs ./datacenter-docs
```
### Production Installation
For production, create a custom `values.yaml`:
```bash
# Copy and edit the values file
cp values.yaml my-values.yaml

# Edit my-values.yaml with your configuration
# At minimum, change:
# - secrets.llmApiKey
# - secrets.apiSecretKey
# - ingress.hosts

# Install with custom values
helm install my-datacenter-docs ./datacenter-docs -f my-values.yaml
```
### Install with Specific Configuration
```bash
helm install my-datacenter-docs ./datacenter-docs \
  --set secrets.llmApiKey="sk-your-openai-api-key" \
  --set secrets.apiSecretKey="your-strong-secret-key" \
  --set ingress.hosts[0].host="datacenter-docs.yourdomain.com" \
  --set mongodb.persistence.size="50Gi"
```
## Configuration
### Key Configuration Parameters
#### Global Settings
| Parameter | Description | Default |
|-----------|-------------|---------|
| `global.imagePullPolicy` | Image pull policy | `IfNotPresent` |
| `global.storageClass` | Storage class for PVCs | `""` |
#### MongoDB
| Parameter | Description | Default |
|-----------|-------------|---------|
| `mongodb.enabled` | Enable MongoDB | `true` |
| `mongodb.image.repository` | MongoDB image | `mongo` |
| `mongodb.image.tag` | MongoDB version | `7` |
| `mongodb.auth.rootUsername` | Root username | `admin` |
| `mongodb.auth.rootPassword` | Root password | `admin123` |
| `mongodb.persistence.enabled` | Enable persistence | `true` |
| `mongodb.persistence.size` | Volume size | `10Gi` |
| `mongodb.resources.requests.memory` | Memory request | `512Mi` |
| `mongodb.resources.limits.memory` | Memory limit | `2Gi` |
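
For reference, a values override that replaces the default credentials and grows the data volume could look like the following sketch (keys as listed above; substitute a real password):
```yaml
mongodb:
  auth:
    rootUsername: admin
    rootPassword: "use-a-strong-password"  # replace before deploying
  persistence:
    enabled: true
    size: 50Gi
```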
#### Redis
| Parameter | Description | Default |
|-----------|-------------|---------|
| `redis.enabled` | Enable Redis | `true` |
| `redis.image.repository` | Redis image | `redis` |
| `redis.image.tag` | Redis version | `7-alpine` |
| `redis.resources.requests.memory` | Memory request | `128Mi` |
| `redis.resources.limits.memory` | Memory limit | `512Mi` |
#### API Service
| Parameter | Description | Default |
|-----------|-------------|---------|
| `api.enabled` | Enable API service | `true` |
| `api.replicaCount` | Number of replicas | `2` |
| `api.image.repository` | API image repository | `datacenter-docs-api` |
| `api.image.tag` | API image tag | `latest` |
| `api.service.port` | Service port | `8000` |
| `api.autoscaling.enabled` | Enable HPA | `true` |
| `api.autoscaling.minReplicas` | Min replicas | `2` |
| `api.autoscaling.maxReplicas` | Max replicas | `10` |
| `api.resources.requests.memory` | Memory request | `512Mi` |
| `api.resources.limits.memory` | Memory limit | `2Gi` |
#### Worker Service
| Parameter | Description | Default |
|-----------|-------------|---------|
| `worker.enabled` | Enable worker service | `false` |
| `worker.replicaCount` | Number of replicas | `3` |
| `worker.autoscaling.enabled` | Enable HPA | `true` |
| `worker.autoscaling.minReplicas` | Min replicas | `1` |
| `worker.autoscaling.maxReplicas` | Max replicas | `10` |
#### Chat Service
| Parameter | Description | Default |
|-----------|-------------|---------|
| `chat.enabled` | Enable chat service | `false` |
| `chat.replicaCount` | Number of replicas | `1` |
| `chat.service.port` | Service port | `8001` |
#### Frontend
| Parameter | Description | Default |
|-----------|-------------|---------|
| `frontend.enabled` | Enable frontend | `true` |
| `frontend.replicaCount` | Number of replicas | `2` |
| `frontend.service.port` | Service port | `80` |
#### Ingress
| Parameter | Description | Default |
|-----------|-------------|---------|
| `ingress.enabled` | Enable ingress | `true` |
| `ingress.className` | Ingress class | `nginx` |
| `ingress.hosts[0].host` | Hostname | `datacenter-docs.example.com` |
| `ingress.tls[0].secretName` | TLS secret name | `datacenter-docs-tls` |
#### Application Configuration
| Parameter | Description | Default |
|-----------|-------------|---------|
| `config.llm.baseUrl` | LLM provider URL | `https://api.openai.com/v1` |
| `config.llm.model` | LLM model | `gpt-4-turbo-preview` |
| `config.autoRemediation.enabled` | Enable auto-remediation | `true` |
| `config.autoRemediation.minReliabilityScore` | Min reliability score | `85.0` |
| `config.autoRemediation.dryRun` | Dry run mode | `false` |
| `config.logLevel` | Log level | `INFO` |
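
As a sketch, these keys map to a `values.yaml` fragment like the one below (layout inferred from the parameter table above and the production example later in this README):
```yaml
# Sketch only; key layout mirrors the parameter table above
config:
  llm:
    baseUrl: "https://api.openai.com/v1"
    model: "gpt-4-turbo-preview"
  autoRemediation:
    enabled: true
    minReliabilityScore: 85.0
    dryRun: false
  logLevel: "INFO"
```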
#### Secrets
| Parameter | Description | Default |
|-----------|-------------|---------|
| `secrets.llmApiKey` | LLM API key | `sk-your-openai-api-key-here` |
| `secrets.apiSecretKey` | API secret key | `your-secret-key-here-change-in-production` |

**IMPORTANT**: Change these secrets in production!
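
One option is to generate a strong key and supply both secrets at install time instead of writing them into a values file:
```bash
# Generate a random secret key
API_SECRET_KEY=$(openssl rand -hex 32)

# Pass both secrets at install time
helm install my-datacenter-docs ./datacenter-docs \
  --set secrets.llmApiKey="sk-your-openai-api-key" \
  --set secrets.apiSecretKey="$API_SECRET_KEY"
```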
## Usage Examples
### Enable All Services (including chat and worker)
```bash
helm install my-datacenter-docs ./datacenter-docs \
  --set chat.enabled=true \
  --set worker.enabled=true
```
### Disable Auto-Remediation
```bash
helm install my-datacenter-docs ./datacenter-docs \
  --set config.autoRemediation.enabled=false
```
### Use Different LLM Provider (e.g., Anthropic Claude)
```bash
helm install my-datacenter-docs ./datacenter-docs \
  --set config.llm.baseUrl="https://api.anthropic.com/v1" \
  --set config.llm.model="claude-3-opus-20240229" \
  --set secrets.llmApiKey="sk-ant-your-anthropic-key"
```
### Use Local LLM (e.g., Ollama)
```bash
helm install my-datacenter-docs ./datacenter-docs \
  --set config.llm.baseUrl="http://ollama-service:11434/v1" \
  --set config.llm.model="llama2" \
  --set secrets.llmApiKey="not-needed"
```
### Scale MongoDB Storage
```bash
helm install my-datacenter-docs ./datacenter-docs \
  --set mongodb.persistence.size="100Gi"
```
### Disable Ingress (use port-forward instead)
```bash
helm install my-datacenter-docs ./datacenter-docs \
  --set ingress.enabled=false
```
### Production Configuration with External MongoDB
```yaml
# production-values.yaml
mongodb:
  enabled: false

config:
  mongodbUrl: "mongodb://user:pass@external-mongodb:27017/datacenter_docs?authSource=admin"

api:
  replicaCount: 5
  autoscaling:
    maxReplicas: 20

secrets:
  llmApiKey: "sk-your-production-api-key"
  apiSecretKey: "your-production-secret-key"

ingress:
  hosts:
    - host: "datacenter-docs.prod.yourdomain.com"
      paths:
        - path: /
          pathType: Prefix
          service: frontend
        - path: /api
          pathType: Prefix
          service: api
```
```bash
helm install prod-datacenter-docs ./datacenter-docs -f production-values.yaml
```
## Upgrading
```bash
# Upgrade with new values
helm upgrade my-datacenter-docs ./datacenter-docs -f my-values.yaml

# Upgrade specific parameters
helm upgrade my-datacenter-docs ./datacenter-docs \
  --set api.image.tag="v1.2.0" \
  --reuse-values
```
## Uninstallation
```bash
helm uninstall my-datacenter-docs
```
**Note**: This will delete all resources except PersistentVolumeClaims (PVCs) for MongoDB. To also delete PVCs:
```bash
kubectl delete pvc -l app.kubernetes.io/instance=my-datacenter-docs
```
## Monitoring and Troubleshooting
### Check Pod Status
```bash
kubectl get pods -l app.kubernetes.io/instance=my-datacenter-docs
```
### View Logs
```bash
# API logs
kubectl logs -l app.kubernetes.io/component=api -f

# Worker logs
kubectl logs -l app.kubernetes.io/component=worker -f

# MongoDB logs
kubectl logs -l app.kubernetes.io/component=database -f
```
### Access Services Locally
```bash
# API
kubectl port-forward svc/my-datacenter-docs-api 8000:8000

# Frontend
kubectl port-forward svc/my-datacenter-docs-frontend 8080:80

# MongoDB (for debugging)
kubectl port-forward svc/my-datacenter-docs-mongodb 27017:27017
```
### Common Issues
#### Pods Stuck in Pending
Check if PVCs are bound:
```bash
kubectl get pvc
```
If no default storage class is available, specify one explicitly:
```bash
helm upgrade my-datacenter-docs ./datacenter-docs \
  --set mongodb.persistence.storageClass="standard" \
  --reuse-values
```
#### API Pods Crash Loop
Check logs:
```bash
kubectl logs -l app.kubernetes.io/component=api --tail=100
```
Common causes:
- MongoDB not ready (wait for init containers)
- Invalid LLM API key
- Missing environment variables
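
To narrow these down, inspect pod events and the overall release status (label selectors follow the conventions used elsewhere in this README):
```bash
# Show events and init container status for the API pods
kubectl describe pod -l app.kubernetes.io/component=api

# Confirm MongoDB and Redis pods are running before the API starts
kubectl get pods -l app.kubernetes.io/instance=my-datacenter-docs
```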
#### Cannot Access via Ingress
Check ingress status:
```bash
kubectl get ingress
kubectl describe ingress my-datacenter-docs
```
Ensure:
- Ingress controller is installed
- DNS points to ingress IP
- TLS certificate is valid (if using HTTPS)
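
A quick way to check the last two points from outside the cluster, using the chart's default hostname as a placeholder:
```bash
# DNS should resolve to the ingress controller's external IP
dig +short datacenter-docs.example.com   # replace with your ingress host

# Inspect the HTTP response and the served TLS certificate
curl -vkI https://datacenter-docs.example.com
```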
## Security Considerations
### Production Checklist
- [ ] Change `secrets.llmApiKey` to a valid API key
- [ ] Change `secrets.apiSecretKey` to a strong random key
- [ ] Change MongoDB credentials (`mongodb.auth.rootPassword`)
- [ ] Enable TLS/SSL on ingress
- [ ] Review RBAC policies
- [ ] Use external secret management (e.g., HashiCorp Vault, AWS Secrets Manager)
- [ ] Enable network policies (see the sketch after this checklist)
- [ ] Set resource limits on all pods
- [ ] Enable pod security policies
- [ ] Review auto-remediation settings
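
For the network-policy item, a minimal sketch that limits MongoDB ingress to the API and worker pods, assuming the `app.kubernetes.io/component` labels used elsewhere in this chart (adjust selectors and namespace to your release):
```yaml
# Selectors assume the chart's component labels; adjust to your release
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: mongodb-allow-app-only
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/component: database
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/component: api
        - podSelector:
            matchLabels:
              app.kubernetes.io/component: worker
      ports:
        - protocol: TCP
          port: 27017
```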
### Using External Secrets
Instead of storing secrets in values.yaml, use Kubernetes secrets:
```bash
# Create secret
kubectl create secret generic datacenter-docs-secrets \
  --from-literal=llm-api-key="sk-your-key" \
  --from-literal=api-secret-key="your-secret"

# Modify templates to use existing secret
# (requires chart customization)
```
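
If you do customize the templates, the container spec could then pull these values from the pre-created secret via `secretKeyRef`; a minimal sketch (the environment variable names below are assumptions and must match what the application reads):
```yaml
# Env var names are assumptions; align them with the application's settings
env:
  - name: LLM_API_KEY
    valueFrom:
      secretKeyRef:
        name: datacenter-docs-secrets
        key: llm-api-key
  - name: API_SECRET_KEY
    valueFrom:
      secretKeyRef:
        name: datacenter-docs-secrets
        key: api-secret-key
```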
## Development
### Validating the Chart
```bash
# Lint the chart
helm lint ./datacenter-docs

# Dry run
helm install my-test ./datacenter-docs --dry-run --debug

# Template rendering
helm template my-test ./datacenter-docs > rendered.yaml
```
### Testing Locally
```bash
# Create kind cluster
kind create cluster

# Install chart
helm install test ./datacenter-docs \
  --set ingress.enabled=false \
  --set api.autoscaling.enabled=false \
  --set mongodb.persistence.enabled=false

# Test
kubectl port-forward svc/test-datacenter-docs-api 8000:8000
curl http://localhost:8000/health
```
## Support
For issues and questions:
- Issues: https://git.commandware.com/it-ops/llm-automation-docs-and-remediation-engine/issues
- Documentation: https://git.commandware.com/it-ops/llm-automation-docs-and-remediation-engine
## License
See the main repository for license information.