# Helm Deployment

This directory contains the Helm chart for deploying the Datacenter Docs & Remediation Engine on Kubernetes, along with a script for validating it.

## Contents

- `datacenter-docs/` - Main Helm chart for the application
- `test-chart.sh` - Automated testing script for chart validation

## Quick Start

### Prerequisites

- Kubernetes cluster (1.19+)
- Helm 3.0+
- kubectl configured to access your cluster

### Development/Testing Installation

```bash
# Install with development settings (minimal resources, local testing)
helm install dev ./datacenter-docs -f ./datacenter-docs/values-development.yaml

# Access the application (each port-forward blocks, so run them in separate terminals)
kubectl port-forward svc/dev-datacenter-docs-api 8000:8000
kubectl port-forward svc/dev-datacenter-docs-frontend 8080:80

# View API docs: http://localhost:8000/api/docs
# View frontend: http://localhost:8080
```

### Production Installation

```bash
# Copy and customize production values
cp datacenter-docs/values-production.yaml my-production-values.yaml

# Edit my-production-values.yaml:
# - Change all secrets (llmApiKey, apiSecretKey, mongodbPassword)
# - Update ingress hosts
# - Adjust resource limits
# - Configure LLM provider
# - Review auto-remediation settings

# Install
helm install prod ./datacenter-docs -f my-production-values.yaml

# Verify deployment
helm list
kubectl get pods
kubectl get ingress
```
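
As a rough illustration of the checklist above, a customized `my-production-values.yaml` might contain overrides like the following. The `secrets`, `config.llm`, `config.autoRemediation`, `api.replicaCount`, and `ingress.tls` keys appear elsewhere in this document; `api.resources`, `ingress.enabled`, and `ingress.hosts` are assumed names, and all numbers are placeholders, so check everything against `datacenter-docs/values.yaml` before relying on it.

```yaml
# Sketch only: verify every key against datacenter-docs/values.yaml.
secrets:
  llmApiKey: "REPLACE_ME"        # never commit real keys
  apiSecretKey: "REPLACE_ME"
  mongodbPassword: "REPLACE_ME"

config:
  llm:
    baseUrl: "https://api.openai.com/v1"
    model: "gpt-4-turbo-preview"
  autoRemediation:
    enabled: true
    minReliabilityScore: 95.0
    dryRun: true                 # keep true until remediation output has been reviewed

api:
  replicaCount: 3                # placeholder value
  resources:                     # assumed key
    requests:
      cpu: 500m
      memory: 512Mi
    limits:
      cpu: "1"
      memory: 1Gi

ingress:
  enabled: true                  # assumed key
  hosts:                         # assumed key and structure
    - host: datacenter-docs.yourdomain.com
  tls:
    - secretName: datacenter-docs-tls
      hosts:
        - datacenter-docs.yourdomain.com
```

Keeping the real secret values out of this file (for example by passing them with `--set` at install time or through an external secret manager) avoids committing them to version control.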

## Chart Structure

```
datacenter-docs/
├── Chart.yaml                      # Chart metadata
├── values.yaml                     # Default configuration
├── values-development.yaml         # Development settings
├── values-production.yaml          # Production example
├── README.md                       # Detailed chart documentation
├── .helmignore                     # Files to exclude from package
└── templates/
    ├── NOTES.txt                   # Post-install instructions
    ├── _helpers.tpl                # Template helpers
    ├── configmap.yaml              # Application configuration
    ├── secrets.yaml                # Sensitive data
    ├── serviceaccount.yaml         # Service account
    ├── mongodb-statefulset.yaml    # MongoDB StatefulSet
    ├── mongodb-service.yaml        # MongoDB Service
    ├── redis-deployment.yaml       # Redis Deployment
    ├── redis-service.yaml          # Redis Service
    ├── api-deployment.yaml         # API Deployment
    ├── api-service.yaml            # API Service
    ├── api-hpa.yaml                # API autoscaling
    ├── chat-deployment.yaml        # Chat Deployment
    ├── chat-service.yaml           # Chat Service
    ├── worker-deployment.yaml      # Worker Deployment
    ├── worker-hpa.yaml             # Worker autoscaling
    ├── frontend-deployment.yaml    # Frontend Deployment
    ├── frontend-service.yaml       # Frontend Service
    └── ingress.yaml                # Ingress configuration
```

## Testing the Chart

Run the automated test script:

```bash
cd deploy/helm
./test-chart.sh
```

This will:
1. Lint the chart
2. Render templates with different value files
3. Perform dry-run installation
4. Validate Kubernetes manifests
5. Package the chart

## Common Operations

### Upgrade Release

```bash
# Upgrade with new values
helm upgrade prod ./datacenter-docs -f my-production-values.yaml

# Upgrade with specific parameter changes
helm upgrade prod ./datacenter-docs --set api.replicaCount=10 --reuse-values
```

### Check Status

```bash
# List releases
helm list

# Get release status
helm status prod

# Get current values
helm get values prod

# Get all manifests
helm get manifest prod
```

### Rollback

```bash
# View revision history
helm history prod

# Rollback to the previous revision
helm rollback prod

# Rollback to specific revision
helm rollback prod 2
```

### Uninstall

```bash
# Uninstall release
helm uninstall prod

# Also delete PVCs (if using persistent storage); this permanently deletes the stored data
kubectl delete pvc -l app.kubernetes.io/instance=prod
```

## Configuration Files

### values.yaml
Default configuration with reasonable settings for development/testing.

### values-development.yaml
Optimized for local development:
- Minimal resource requests/limits
- Single replicas
- Persistence disabled
- Dry-run mode for auto-remediation
- Debug logging
- Ingress disabled (use port-forward)

### values-production.yaml
Example production configuration:
- Higher resource limits
- Multiple replicas
- Autoscaling enabled
- Persistence enabled with larger volumes
- TLS/SSL enabled
- Production-grade security settings
- All components enabled

**Important**: Copy and customize this file for your environment. Never use default secrets!

## Available Components

| Component | Purpose | Default Enabled |
|-----------|---------|-----------------|
| MongoDB | Document database | Yes |
| Redis | Cache & task queue | Yes |
| API | REST API service | Yes |
| Chat | WebSocket server | No (not implemented) |
| Worker | Celery background tasks | No (not implemented) |
| Frontend | Web UI | Yes |

Enable/disable components in your values file:

```yaml
mongodb:
  enabled: true
redis:
  enabled: true
api:
  enabled: true
chat:
  enabled: false  # Set to true when implemented
worker:
  enabled: false  # Set to true when implemented
frontend:
  enabled: true
```

## Architecture

With all components enabled, the chart deploys the following architecture (Chat and Worker are disabled by default, as listed in the table above):

```
                     ┌─────────────┐
                     │   Ingress   │
                     └──────┬──────┘
                            │
              ┌─────────────┼─────────────┐
              │             │             │
         ┌────▼────┐   ┌────▼────┐  ┌────▼────┐
         │Frontend │   │   API   │  │  Chat   │
         └─────────┘   └────┬────┘  └────┬────┘
                            │            │
              ┌─────────────┼────────────┘
              │             │
         ┌────▼────┐   ┌────▼────┐
         │  Redis  │   │ MongoDB │
         └─────────┘   └─────────┘
              ▲
              │
         ┌────┴────┐
         │ Worker  │
         └─────────┘
```

## LLM Provider Configuration

The chart supports multiple LLM providers. Configure in your values file:

### OpenAI

```yaml
config:
  llm:
    baseUrl: "https://api.openai.com/v1"
    model: "gpt-4-turbo-preview"
secrets:
  llmApiKey: "sk-your-openai-key"
```

### Anthropic Claude

```yaml
config:
  llm:
    baseUrl: "https://api.anthropic.com/v1"
    model: "claude-3-opus-20240229"
secrets:
  llmApiKey: "sk-ant-your-anthropic-key"
```

### Local (Ollama)

```yaml
config:
  llm:
    baseUrl: "http://ollama-service:11434/v1"
    model: "llama2"
secrets:
  llmApiKey: "not-needed"
```

### Azure OpenAI

```yaml
config:
  llm:
    baseUrl: "https://your-resource.openai.azure.com"
    model: "gpt-4"
secrets:
  llmApiKey: "your-azure-key"
```

## Security Best Practices

For production deployments:

1. **Change all default secrets**
   ```bash
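   # Hex output avoids "+", "/" and "=", which would need URL-encoding
   # if the password is embedded in a MongoDB connection URI.
   helm install prod ./datacenter-docs \
     --set secrets.llmApiKey="your-actual-key" \
     --set secrets.apiSecretKey="$(openssl rand -base64 32)" \
     --set secrets.mongodbPassword="$(openssl rand -hex 32)"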
   ```

2. **Use external secret management**
   - HashiCorp Vault
   - AWS Secrets Manager
   - Azure Key Vault
   - Kubernetes External Secrets Operator

3. **Enable TLS/SSL**
   ```yaml
   ingress:
     annotations:
       cert-manager.io/cluster-issuer: "letsencrypt-prod"
     tls:
       - secretName: datacenter-docs-tls
         hosts:
           - datacenter-docs.yourdomain.com
   ```

4. **Review auto-remediation settings**
   ```yaml
   config:
     autoRemediation:
       enabled: true
       minReliabilityScore: 95.0  # High threshold for production
       dryRun: true  # Test first, then set to false
   ```

5. **Implement network policies**
6. **Enable resource quotas**
7. **Run regular security scans**
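
Items 5 and 6 above could look roughly like the following manifests. This is a sketch under stated assumptions: the `app.kubernetes.io/component` label values and the resource names are hypothetical and should be checked against the labels the chart actually renders (for example with `helm template`), the quota figures are placeholders, and a NetworkPolicy is only enforced when the cluster's network plugin supports it.

```yaml
# Sketch only: the component labels and names below are assumed, not taken from the chart.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: mongodb-allow-api-only               # hypothetical name
spec:
  # Select the MongoDB pods of the "prod" release
  podSelector:
    matchLabels:
      app.kubernetes.io/instance: prod
      app.kubernetes.io/component: mongodb   # assumed label
  policyTypes:
    - Ingress
  ingress:
    # Allow traffic only from API pods of the same release, on the MongoDB port
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/instance: prod
              app.kubernetes.io/component: api   # assumed label
      ports:
        - protocol: TCP
          port: 27017
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: datacenter-docs-quota                # hypothetical name
spec:
  hard:
    # Placeholder ceilings for the namespace that hosts the release
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    persistentvolumeclaims: "5"
```

If the Chat or Worker components are enabled later and need database access, they would need their own entries under `from`; otherwise this policy blocks them.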

## Monitoring and Observability

The chart is designed to integrate with:
- **Prometheus**: Metrics collection
- **Grafana**: Visualization
- **Jaeger**: Distributed tracing
- **ELK/Loki**: Log aggregation

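Add pod annotations so that Prometheus can discover the metrics endpoint. These `prometheus.io/*` annotations are a convention, not a built-in feature: they only take effect if your Prometheus scrape configuration is set up to honor them.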

```yaml
podAnnotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "8000"
  prometheus.io/path: "/metrics"
```
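
For reference, the relabeling rules below are a common way to make a self-managed Prometheus act on these annotations. This is a sketch of the Prometheus side, not part of this chart; the job name is arbitrary, and setups based on the Prometheus Operator typically use `ServiceMonitor` or `PodMonitor` resources instead.

```yaml
# Fragment of prometheus.yml (sketch; not shipped with this chart)
scrape_configs:
  - job_name: kubernetes-pods          # arbitrary name
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Use prometheus.io/path as the metrics path when it is set
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: '(.+)'
      # Rewrite the scrape address to use the port from prometheus.io/port
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: '([^:]+)(?::\d+)?;(\d+)'
        replacement: '$1:$2'
        target_label: __address__
```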

## Troubleshooting

### Pods not starting

```bash
# Check pod status
kubectl get pods -l app.kubernetes.io/instance=prod

# Describe pod for events
kubectl describe pod <pod-name>

# View logs
kubectl logs <pod-name> -f
```

### Storage issues

```bash
# Check PVC status
kubectl get pvc

# Check storage class
kubectl get storageclass

# Manually create a PVC if needed (the name must match the claim the chart expects)
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongodb-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
EOF
```

### Ingress not working

```bash
# Check ingress status
kubectl get ingress
kubectl describe ingress prod-datacenter-docs

# Check ingress controller logs
kubectl logs -n ingress-nginx -l app.kubernetes.io/component=controller -f
```

## Support

For detailed documentation, see:
- Chart README: `datacenter-docs/README.md`
- Main project: `../../README.md`
- Issues: https://git.commandware.com/it-ops/llm-automation-docs-and-remediation-engine/issues

## License

See the main repository for license information.