# Configuration Guide

This guide explains how to configure the Datacenter Documentation & Remediation Engine through its three configuration sources: the `.env` file, the root `values.yaml`, and the Helm chart values.

## Configuration Files Overview

The project supports multiple configuration methods to suit different deployment scenarios:

### 1. `.env` File (Docker Compose)
- **Location**: Root of the project
- **Format**: Environment variables
- **Use case**: Local development, Docker Compose deployments
- **Template**: `.env.example`

### 2. `values.yaml` File (Structured Configuration)
- **Location**: Root of the project
- **Format**: YAML
- **Use case**: General configuration, Helm deployments, configuration management
- **Template**: `values.yaml`

### 3. Helm Chart Values (Kubernetes)
- **Location**: `deploy/helm/datacenter-docs/values.yaml`
- **Format**: YAML (Helm-specific)
- **Use case**: Kubernetes deployments via Helm
- **Variants**:
  - `values.yaml` - Default configuration
  - `values-development.yaml` - Development settings
  - `values-production.yaml` - Production example

## Quick Start

### For Docker Compose Development

1. Copy the environment template:
   ```bash
   cp .env.example .env
   ```

2. Edit `.env` with your configuration:
   ```bash
   nano .env
   ```

3. Update the following required values:
   - `MONGO_ROOT_PASSWORD` - MongoDB password
   - `LLM_API_KEY` - Your LLM provider API key
   - `LLM_BASE_URL` - LLM provider endpoint
   - `MCP_API_KEY` - MCP server API key

4. Start the services:
   ```bash
   cd deploy/docker
   docker-compose -f docker-compose.dev.yml up -d
   ```

### For Kubernetes/Helm Deployment

1. Copy and customize the values file:
   ```bash
   cp values.yaml my-values.yaml
   ```

2. Edit `my-values.yaml` with your configuration

3. Deploy with Helm:
   ```bash
   helm install my-release deploy/helm/datacenter-docs -f my-values.yaml
   ```

## Configuration Mapping

Here's how the `.env` variables map to `values.yaml`:

| .env Variable | values.yaml Path | Description |
|---------------|------------------|-------------|
| `MONGO_ROOT_USER` | `mongodb.auth.rootUsername` | MongoDB root username |
| `MONGO_ROOT_PASSWORD` | `mongodb.auth.rootPassword` | MongoDB root password |
| `MONGODB_URL` | `mongodb.url` | MongoDB connection URL |
| `MONGODB_DATABASE` | `mongodb.auth.database` | Database name |
| `REDIS_PASSWORD` | `redis.auth.password` | Redis password |
| `REDIS_URL` | `redis.url` | Redis connection URL |
| `MCP_SERVER_URL` | `mcp.server.url` | MCP server endpoint |
| `MCP_API_KEY` | `mcp.server.apiKey` | MCP API key |
| `PROXMOX_HOST` | `proxmox.host` | Proxmox server hostname |
| `PROXMOX_USER` | `proxmox.auth.user` | Proxmox username |
| `PROXMOX_PASSWORD` | `proxmox.auth.password` | Proxmox password |
| `LLM_BASE_URL` | `llm.baseUrl` | LLM API endpoint |
| `LLM_API_KEY` | `llm.apiKey` | LLM API key |
| `LLM_MODEL` | `llm.model` | LLM model name |
| `LLM_TEMPERATURE` | `llm.generation.temperature` | Generation temperature |
| `LLM_MAX_TOKENS` | `llm.generation.maxTokens` | Max tokens per request |
| `API_HOST` | `api.host` | API server host |
| `API_PORT` | `api.port` | API server port |
| `WORKERS` | `api.workers` | Number of API workers |
| `CORS_ORIGINS` | `cors.origins` | Allowed CORS origins |
| `LOG_LEVEL` | `application.logging.level` | Logging level |
| `DEBUG` | `application.debug` | Debug mode |
| `CELERY_BROKER_URL` | `celery.broker.url` | Celery broker URL |
| `CELERY_RESULT_BACKEND` | `celery.result.backend` | Celery result backend |
| `VECTOR_STORE_PATH` | `vectorStore.chroma.path` | Vector store path |
| `EMBEDDING_MODEL` | `vectorStore.embedding.model` | Embedding model name |
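
The mapping above can be applied mechanically: each dotted `values.yaml` path becomes a chain of nested keys. The following is a minimal Python sketch of that idea, not the project's conversion script. The helper name `env_to_values` and the `ENV_TO_PATH` dictionary are hypothetical, and only a few rows of the table are included.

```python
# Subset of the mapping table above; extend with the remaining rows as needed.
ENV_TO_PATH = {
    "MONGODB_URL": "mongodb.url",
    "MONGODB_DATABASE": "mongodb.auth.database",
    "LLM_BASE_URL": "llm.baseUrl",
    "LLM_MODEL": "llm.model",
    "LOG_LEVEL": "application.logging.level",
}


def env_to_values(env: dict[str, str]) -> dict:
    """Build a nested dict from flat env variables (hypothetical helper)."""
    values: dict = {}
    for name, path in ENV_TO_PATH.items():
        if name not in env:
            continue  # variables that are not set are left out of the result
        *parents, leaf = path.split(".")
        node = values
        for key in parents:
            # Create intermediate levels on demand, e.g. "mongodb" -> "auth"
            node = node.setdefault(key, {})
        node[leaf] = env[name]
    return values
```

All values stay strings in this sketch; numeric or boolean settings such as `API_PORT` or `DEBUG` would need an explicit type conversion before being written to YAML.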

## Configuration Sections

### 1. Database Configuration

#### MongoDB
```yaml
mongodb:
  auth:
    rootUsername: admin
    rootPassword: "your-secure-password"
    database: datacenter_docs
  url: "mongodb://admin:your-secure-password@mongodb:27017"  # must match the credentials above
```

**Security Note**: Always use strong passwords in production!
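
Strong passwords often contain characters such as `@`, `/`, or `:`, which must be percent-encoded inside a MongoDB connection URL. As a minimal sketch, the URL can be assembled in Python with the standard library; the function name `build_mongodb_url` and its defaults are assumptions for illustration.

```python
from urllib.parse import quote_plus


def build_mongodb_url(user: str, password: str,
                      host: str = "mongodb", port: int = 27017) -> str:
    """Assemble a MongoDB URL with percent-encoded credentials (hypothetical helper)."""
    # quote_plus escapes reserved characters such as '@' and '/' in the credentials
    return f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}"
```

For example, `build_mongodb_url("admin", "p@ss/word")` yields `mongodb://admin:p%40ss%2Fword@mongodb:27017`.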

#### Redis
```yaml
redis:
  auth:
    password: "your-redis-password"
  url: "redis://:your-redis-password@redis:6379/0"  # include the password when auth is enabled
```

### 2. LLM Provider Configuration

The system supports multiple LLM providers through OpenAI-compatible APIs:

#### OpenAI
```yaml
llm:
  provider: openai
  baseUrl: "https://api.openai.com/v1"
  apiKey: "sk-your-key"
  model: "gpt-4-turbo-preview"
```

#### Anthropic Claude
```yaml
llm:
  provider: anthropic
  baseUrl: "https://api.anthropic.com/v1"
  apiKey: "sk-ant-your-key"
  model: "claude-sonnet-4-20250514"
```

#### Local (Ollama)
```yaml
llm:
  provider: ollama
  baseUrl: "http://localhost:11434/v1"
  apiKey: "ollama"
  model: "llama3"
```

### 3. Auto-Remediation Configuration

Control how the system handles automated problem resolution:

```yaml
autoRemediation:
  enabled: true
  minReliabilityScore: 85.0
  requireApprovalThreshold: 90.0
  maxActionsPerHour: 100
  dryRun: false  # Set to true for testing
```

**Important**: Start with `dryRun: true` to test without making actual changes!
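
This guide does not spell out how the two thresholds interact. One plausible reading, shown here purely as an assumption, is that actions scoring below `minReliabilityScore` are skipped, actions between the two thresholds wait for human approval, and actions at or above `requireApprovalThreshold` run automatically (or are only simulated when `dryRun` is set). The class and function names below are hypothetical and do not come from the engine's code.

```python
from dataclasses import dataclass


@dataclass
class AutoRemediation:
    # Field names mirror the YAML keys above; defaults copy the example values.
    enabled: bool = True
    min_reliability_score: float = 85.0
    require_approval_threshold: float = 90.0
    dry_run: bool = False


def decide(score: float, cfg: AutoRemediation) -> str:
    """Return the ASSUMED outcome for an action with the given reliability score."""
    if not cfg.enabled or score < cfg.min_reliability_score:
        return "skip"
    if score < cfg.require_approval_threshold:
        return "needs-approval"
    # Above both thresholds: act, unless dry-run mode is on
    return "simulate" if cfg.dry_run else "execute"
```

Under this reading, the default values send a score of 87 to approval, while a score of 95 is executed directly. Check the engine's actual behavior before relying on this interpretation.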

### 4. Infrastructure Collectors

Enable/disable different infrastructure data collectors:

```yaml
collectors:
  vmware:
    enabled: true
    host: "vcenter.example.com"
  kubernetes:
    enabled: true
  proxmox:
    enabled: true
```

### 5. Security Settings

```yaml
security:
  authentication:
    enabled: true
    method: "jwt"
  rateLimit:
    enabled: true
    requestsPerMinute: 100
```

## Environment-Specific Configuration

### Development

For development, use minimal resources and verbose logging:

```yaml
application:
  logging:
    level: "DEBUG"
  debug: true
  environment: "development"

autoRemediation:
  dryRun: true  # Never make real changes in dev

llm:
  baseUrl: "http://localhost:11434/v1"  # Use local Ollama
```

### Production

For production, use secure settings and proper resource limits:

```yaml
application:
  logging:
    level: "INFO"
  debug: false
  environment: "production"

autoRemediation:
  enabled: true
  minReliabilityScore: 95.0  # Higher threshold
  requireApprovalThreshold: 98.0
  dryRun: false

security:
  authentication:
    enabled: true
  rateLimit:
    enabled: true
```

## Configuration Best Practices

### 1. Secret Management

**Never commit secrets to version control!**

For development:
- Use `.env` (add to `.gitignore`)
- Default passwords are acceptable for local development only; replace them before any production deployment

For production:
- Use Kubernetes Secrets
- Use external secret managers (Vault, AWS Secrets Manager, etc.)
- Rotate secrets regularly

Example with Kubernetes Secret:
```bash
kubectl create secret generic datacenter-docs-secrets \
  --from-literal=mongodb-password="$(openssl rand -base64 32)" \
  --from-literal=llm-api-key="your-actual-key"
```
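
Where `openssl` is not available, a comparable random secret can be produced with Python's standard library. This is a small sketch; the function name `random_password` is an assumption for illustration.

```python
import base64
import secrets


def random_password(num_bytes: int = 32) -> str:
    """Return num_bytes of OS-provided randomness, Base64-encoded."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


if __name__ == "__main__":
    # Prints a fresh 44-character secret on each run
    print(random_password())
```

With the default of 32 bytes, the result is 44 characters long, the same length as the output of `openssl rand -base64 32`.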

### 2. Resource Limits

Always set appropriate resource limits:

```yaml
resources:
  api:
    requests:
      memory: "512Mi"
      cpu: "250m"
    limits:
      memory: "2Gi"
      cpu: "1000m"
```

### 3. High Availability

For production deployments:

```yaml
api:
  replicaCount: 3  # Multiple replicas

mongodb:
  persistence:
    enabled: true
    size: 50Gi
    storageClass: "fast-ssd"
```

### 4. Monitoring

Enable monitoring and observability:

```yaml
monitoring:
  metrics:
    enabled: true
  health:
    enabled: true
  tracing:
    enabled: true
    provider: "jaeger"
```

### 5. Backup Configuration

Configure regular backups:

```yaml
backup:
  enabled: true
  schedule: "0 2 * * *"  # Daily at 2 AM
  retention:
    daily: 7
    weekly: 4
    monthly: 12
```

## Validation

### Validate .env File

```bash
# Check for required variables (each one should appear in the output)
grep -E "^(MONGODB_URL|MONGO_ROOT_PASSWORD|LLM_API_KEY|LLM_BASE_URL|MCP_API_KEY)=" .env
```
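
A `grep` like the one above prints the variables it finds but does not flag ones that are missing or empty. As a hedged sketch, the check can be inverted in Python. The `REQUIRED` tuple uses the four variables listed as required in the Quick Start, the function names are hypothetical, and the parser handles only simple `KEY=VALUE` lines.

```python
REQUIRED = ("MONGO_ROOT_PASSWORD", "LLM_API_KEY", "LLM_BASE_URL", "MCP_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def missing_required(text: str) -> list[str]:
    """Return required variable names that are absent or empty."""
    env = parse_env(text)
    return [name for name in REQUIRED if not env.get(name)]


if __name__ == "__main__":
    with open(".env", encoding="utf-8") as fh:
        problems = missing_required(fh.read())
    print("Missing or empty:", ", ".join(problems) if problems else "none")
```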

### Validate values.yaml

```bash
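# Install yq (Mike Farah's Go implementation, v4+, which provides `yq eval`)
# brew install yq  # macOS
# sudo snap install yq  # Ubuntu (the apt `yq` package is a different tool without `eval`)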

# Validate YAML syntax
yq eval '.' values.yaml > /dev/null && echo "Valid YAML" || echo "Invalid YAML"

# Check specific values
yq eval '.llm.apiKey' values.yaml
yq eval '.mongodb.auth.rootPassword' values.yaml
```

### Validate Helm Values

```bash
# Lint the Helm chart
helm lint deploy/helm/datacenter-docs -f my-values.yaml

# Dry-run installation
helm install test deploy/helm/datacenter-docs -f my-values.yaml --dry-run --debug
```

## Troubleshooting

### Common Issues

#### 1. MongoDB Connection Failed

Check:
- MongoDB URL is correct
- Password matches in both MongoDB and application config
- MongoDB service is running

```bash
# Test MongoDB connection (substitute MONGO_ROOT_USER and MONGO_ROOT_PASSWORD from your .env)
docker exec -it datacenter-docs-mongodb mongosh \
  -u admin -p admin123 --authenticationDatabase admin
```

#### 2. LLM API Errors

Check:
- API key is valid
- Base URL is correct
- Model name is supported by the provider
- Network connectivity to LLM provider

```bash
# Test LLM API
curl -H "Authorization: Bearer $LLM_API_KEY" \
  "$LLM_BASE_URL/models"
```
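
The same check can be scripted in Python with the standard library only. This is a sketch under the assumption that the provider exposes an OpenAI-compatible `/models` endpoint that accepts a bearer token; the function names are hypothetical, and the response format depends on the provider.

```python
import json
import urllib.request


def models_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for <base_url>/models."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def list_models(base_url: str, api_key: str, timeout: float = 10.0):
    """Send the request and return the decoded JSON body."""
    with urllib.request.urlopen(models_request(base_url, api_key), timeout=timeout) as resp:
        return json.load(resp)
```

An HTTP 401 from `list_models` usually points to the API key, while a connection error points to the base URL or network path.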

#### 3. Redis Connection Issues

Check:
- Redis URL is correct
- Redis service is running
- Password is correct (if enabled)

```bash
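# Test Redis connection (a healthy server replies PONG)
docker exec -it datacenter-docs-redis redis-cli ping
# With a password set, authenticate as well: redis-cli -a "your-redis-password" ping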
```

## Converting Between Formats

### From .env to values.yaml

A conversion script is planned but not yet available; the commented command below shows the intended usage:

```bash
# TODO: Create conversion script
# python scripts/env_to_values.py .env > my-values.yaml
```

Manual conversion example:
```bash
# .env
MONGODB_URL=mongodb://admin:pass@mongodb:27017

# values.yaml
mongodb:
  url: "mongodb://admin:pass@mongodb:27017"
```

### From values.yaml to .env

```bash
# Extract specific values
echo "MONGODB_URL=$(yq eval '.mongodb.url' values.yaml)" >> .env
echo "LLM_API_KEY=$(yq eval '.llm.apiKey' values.yaml)" >> .env
```
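
Note that `yq eval` prints `null` for a path that does not exist, so the commands above can write lines such as `LLM_API_KEY=null`. A minimal Python sketch that skips absent keys is shown below. It assumes the YAML has already been loaded into a dictionary, and `PATH_TO_ENV`, `lookup`, and `values_to_env_lines` are hypothetical names covering only part of the mapping table.

```python
# Reverse of a few rows from the mapping table; extend as needed.
PATH_TO_ENV = {
    "mongodb.url": "MONGODB_URL",
    "llm.apiKey": "LLM_API_KEY",
    "llm.baseUrl": "LLM_BASE_URL",
    "api.port": "API_PORT",
}


def lookup(values: dict, path: str):
    """Follow a dotted path through nested dicts; return None if any key is absent."""
    node = values
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def values_to_env_lines(values: dict) -> list[str]:
    """Emit KEY=VALUE lines for every mapped path that is present."""
    lines = []
    for path, name in PATH_TO_ENV.items():
        value = lookup(values, path)
        if value is not None:
            lines.append(f"{name}={value}")
    return lines
```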

## Examples

### Example 1: Local Development with Ollama

```yaml
# values-local.yaml
llm:
  provider: ollama
  baseUrl: "http://localhost:11434/v1"
  apiKey: "ollama"
  model: "llama3"

application:
  debug: true
  logging:
    level: "DEBUG"

autoRemediation:
  dryRun: true
```

### Example 2: Production with OpenAI

```yaml
# values-prod.yaml
llm:
  provider: openai
  baseUrl: "https://api.openai.com/v1"
  apiKey: "sk-prod-key-from-secret-manager"
  model: "gpt-4-turbo-preview"

application:
  debug: false
  logging:
    level: "INFO"

autoRemediation:
  enabled: true
  minReliabilityScore: 95.0
  dryRun: false

security:
  authentication:
    enabled: true
  rateLimit:
    enabled: true
```

### Example 3: Multi-Environment Setup

```bash
# Development
helm install dev deploy/helm/datacenter-docs \
  -f values.yaml \
  -f values-development.yaml

# Staging (values-staging.yaml is not among the variants listed above; create it first)
helm install staging deploy/helm/datacenter-docs \
  -f values.yaml \
  -f values-staging.yaml

# Production
helm install prod deploy/helm/datacenter-docs \
  -f values.yaml \
  -f values-production.yaml
```
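
When several `-f` files are passed, Helm merges them in order: nested maps are combined key by key, and a value from a later file overrides the same key from an earlier one. The Python sketch below illustrates that layering with a hypothetical `deep_merge` helper. It is a simplification and does not model every Helm rule (for example, setting a key to `null` to remove it).

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where override wins and nested dicts merge recursively."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars and lists are replaced wholesale by the later file
            merged[key] = value
    return merged


if __name__ == "__main__":
    base = {"application": {"debug": False, "logging": {"level": "INFO"}}}
    dev = {"application": {"debug": True, "logging": {"level": "DEBUG"}}}
    print(deep_merge(base, dev))
```

This is why an environment file only needs to list the keys that differ from the defaults in `values.yaml`.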

## Related Documentation

- [Main README](README.md)
- [Docker Deployment](deploy/docker/README.md)
- [Helm Chart](deploy/helm/README.md)
- [Environment Variables](.env.example)
- [Project Repository](https://git.commandware.com/it-ops/llm-automation-docs-and-remediation-engine)

## Support

For configuration help:
- Open an issue: https://git.commandware.com/it-ops/llm-automation-docs-and-remediation-engine/issues
- Check the documentation
- Review example configurations in `deploy/` directory