Add Helm chart, Docs, and Config conversion script

2025-10-22 14:35:21 +02:00
parent ba9900bd57
commit 2719cfff59
31 changed files with 4436 additions and 0 deletions
--- a/CONFIGURATION.md
+++ b/CONFIGURATION.md
@@ -0,0 +1,511 @@
+# Configuration Guide
+
+This guide explains how to configure the Datacenter Documentation & Remediation Engine using the various configuration files available.
+
+## Configuration Files Overview
+
+The project supports multiple configuration methods to suit different deployment scenarios:
+
+### 1. `.env` File (Docker Compose)
+- **Location**: Root of the project
+- **Format**: Environment variables
+- **Use case**: Local development, Docker Compose deployments
+- **Template**: `.env.example`
+
+### 2. `values.yaml` File (Structured Configuration)
+- **Location**: Root of the project
+- **Format**: YAML
+- **Use case**: General configuration, Helm deployments, configuration management
+- **Template**: `values.yaml`
+
+### 3. Helm Chart Values (Kubernetes)
+- **Location**: `deploy/helm/datacenter-docs/values.yaml`
+- **Format**: YAML (Helm-specific)
+- **Use case**: Kubernetes deployments via Helm
+- **Variants**:
+  - `values.yaml` - Default configuration
+  - `values-development.yaml` - Development settings
+  - `values-production.yaml` - Production example
+
+## Quick Start
+
+### For Docker Compose Development
+
+1. Copy the environment template:
+   ```bash
+   cp .env.example .env
+   ```
+
+2. Edit `.env` with your configuration:
+   ```bash
+   nano .env
+   ```
+
+3. Update the following required values:
+   - `MONGO_ROOT_PASSWORD` - MongoDB password
+   - `LLM_API_KEY` - Your LLM provider API key
+   - `LLM_BASE_URL` - LLM provider endpoint
+   - `MCP_API_KEY` - MCP server API key
+
+4. Start the services:
+   ```bash
+   cd deploy/docker
+   docker-compose -f docker-compose.dev.yml up -d
+   ```
+
+### For Kubernetes/Helm Deployment
+
+1. Copy and customize the values file:
+   ```bash
+   cp values.yaml my-values.yaml
+   ```
+
+2. Edit `my-values.yaml` with your configuration
+
+3. Deploy with Helm:
+   ```bash
+   helm install my-release deploy/helm/datacenter-docs -f my-values.yaml
+   ```
+
+## Configuration Mapping
+
+Here's how the `.env` variables map to `values.yaml`:
+
+| .env Variable | values.yaml Path | Description |
+|---------------|------------------|-------------|
+| `MONGO_ROOT_USER` | `mongodb.auth.rootUsername` | MongoDB root username |
+| `MONGO_ROOT_PASSWORD` | `mongodb.auth.rootPassword` | MongoDB root password |
+| `MONGODB_URL` | `mongodb.url` | MongoDB connection URL |
+| `MONGODB_DATABASE` | `mongodb.auth.database` | Database name |
+| `REDIS_PASSWORD` | `redis.auth.password` | Redis password |
+| `REDIS_URL` | `redis.url` | Redis connection URL |
+| `MCP_SERVER_URL` | `mcp.server.url` | MCP server endpoint |
+| `MCP_API_KEY` | `mcp.server.apiKey` | MCP API key |
+| `PROXMOX_HOST` | `proxmox.host` | Proxmox server hostname |
+| `PROXMOX_USER` | `proxmox.auth.user` | Proxmox username |
+| `PROXMOX_PASSWORD` | `proxmox.auth.password` | Proxmox password |
+| `LLM_BASE_URL` | `llm.baseUrl` | LLM API endpoint |
+| `LLM_API_KEY` | `llm.apiKey` | LLM API key |
+| `LLM_MODEL` | `llm.model` | LLM model name |
+| `LLM_TEMPERATURE` | `llm.generation.temperature` | Generation temperature |
+| `LLM_MAX_TOKENS` | `llm.generation.maxTokens` | Max tokens per request |
+| `API_HOST` | `api.host` | API server host |
+| `API_PORT` | `api.port` | API server port |
+| `WORKERS` | `api.workers` | Number of API workers |
+| `CORS_ORIGINS` | `cors.origins` | Allowed CORS origins |
+| `LOG_LEVEL` | `application.logging.level` | Logging level |
+| `DEBUG` | `application.debug` | Debug mode |
+| `CELERY_BROKER_URL` | `celery.broker.url` | Celery broker URL |
+| `CELERY_RESULT_BACKEND` | `celery.result.backend` | Celery result backend |
+| `VECTOR_STORE_PATH` | `vectorStore.chroma.path` | Vector store path |
+| `EMBEDDING_MODEL` | `vectorStore.embedding.model` | Embedding model name |
+
+## Configuration Sections
+
+### 1. Database Configuration
+
+#### MongoDB
+```yaml
+mongodb:
+  auth:
+    rootUsername: admin
+    rootPassword: "your-secure-password"
+    database: datacenter_docs
+  url: "mongodb://admin:password@mongodb:27017"
+```
+
+**Security Note**: Always use strong passwords in production!
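
Strong passwords often contain characters such as `@`, `:` or `/`, which are also delimiters in a connection URL and must be percent-encoded before they go into `mongodb.url`. The following is a minimal sketch of that step; the helper name `build_mongodb_url` and its default host and port are illustrative assumptions, not part of the project.

```python
from urllib.parse import quote_plus


def build_mongodb_url(user: str, password: str, host: str = "mongodb", port: int = 27017) -> str:
    """Return a mongodb:// URL with percent-encoded credentials (hypothetical helper)."""
    # quote_plus escapes '@', ':' and '/' so they are not parsed as URL delimiters.
    return f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}"


print(build_mongodb_url("admin", "p@ss:w/rd"))
```

The resulting string can be used as the value of `mongodb.url` or `MONGODB_URL`.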
+
+#### Redis
+```yaml
+redis:
+  auth:
+    password: "your-redis-password"
+  url: "redis://redis:6379/0"
+```
+
+### 2. LLM Provider Configuration
+
+The system supports multiple LLM providers through OpenAI-compatible APIs:
+
+#### OpenAI
+```yaml
+llm:
+  provider: openai
+  baseUrl: "https://api.openai.com/v1"
+  apiKey: "sk-your-key"
+  model: "gpt-4-turbo-preview"
+```
+
+#### Anthropic Claude
+```yaml
+llm:
+  provider: anthropic
+  baseUrl: "https://api.anthropic.com/v1"
+  apiKey: "sk-ant-your-key"
+  model: "claude-sonnet-4-20250514"
+```
+
+#### Local (Ollama)
+```yaml
+llm:
+  provider: ollama
+  baseUrl: "http://localhost:11434/v1"
+  apiKey: "ollama"
+  model: "llama3"
+```
+
+### 3. Auto-Remediation Configuration
+
+Control how the system handles automated problem resolution:
+
+```yaml
+autoRemediation:
+  enabled: true
+  minReliabilityScore: 85.0
+  requireApprovalThreshold: 90.0
+  maxActionsPerHour: 100
+  dryRun: false  # Set to true for testing
+```
+
+**Important**: Start with `dryRun: true` to test without making actual changes!
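
This guide does not spell out how the two thresholds interact. One plausible reading, shown here only as a sketch, is that actions scoring below `minReliabilityScore` are skipped, actions between the two thresholds wait for human approval, and actions at or above `requireApprovalThreshold` run automatically (or are only simulated when `dryRun` is set). The class, function, and return labels below are hypothetical and may not match the engine's actual behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutoRemediationConfig:
    # Field names mirror the YAML keys above; defaults copy the example values.
    enabled: bool = True
    min_reliability_score: float = 85.0
    require_approval_threshold: float = 90.0
    dry_run: bool = False


def decide(score: float, cfg: AutoRemediationConfig) -> str:
    """Classify a proposed action under the ASSUMED threshold semantics."""
    if not cfg.enabled or score < cfg.min_reliability_score:
        return "skip"
    if score < cfg.require_approval_threshold:
        return "needs_approval"
    # With dry_run enabled the action is reported but not applied.
    return "simulate" if cfg.dry_run else "execute"


cfg = AutoRemediationConfig()
for score in (80.0, 87.5, 95.0):
    print(score, decide(score, cfg))
```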
+
+### 4. Infrastructure Collectors
+
+Enable/disable different infrastructure data collectors:
+
+```yaml
+collectors:
+  vmware:
+    enabled: true
+    host: "vcenter.example.com"
+  kubernetes:
+    enabled: true
+  proxmox:
+    enabled: true
+```
+
+### 5. Security Settings
+
+```yaml
+security:
+  authentication:
+    enabled: true
+    method: "jwt"
+  rateLimit:
+    enabled: true
+    requestsPerMinute: 100
+```
+
+## Environment-Specific Configuration
+
+### Development
+
+For development, use minimal resources and verbose logging:
+
+```yaml
+application:
+  logging:
+    level: "DEBUG"
+  debug: true
+  environment: "development"
+
+autoRemediation:
+  dryRun: true  # Never make real changes in dev
+
+llm:
+  baseUrl: "http://localhost:11434/v1"  # Use local Ollama
+```
+
+### Production
+
+For production, use secure settings and proper resource limits:
+
+```yaml
+application:
+  logging:
+    level: "INFO"
+  debug: false
+  environment: "production"
+
+autoRemediation:
+  enabled: true
+  minReliabilityScore: 95.0  # Higher threshold
+  requireApprovalThreshold: 98.0
+  dryRun: false
+
+security:
+  authentication:
+    enabled: true
+  rateLimit:
+    enabled: true
+```
+
+## Configuration Best Practices
+
+### 1. Secret Management
+
+**Never commit secrets to version control!**
+
+For development:
+- Use `.env` (add to `.gitignore`)
+- Default passwords are acceptable for local use only (always change them before deploying to production)
+
+For production:
+- Use Kubernetes Secrets
+- Use external secret managers (Vault, AWS Secrets Manager, etc.)
+- Rotate secrets regularly
+
+Example with Kubernetes Secret:
+```bash
+kubectl create secret generic datacenter-docs-secrets \
+  --from-literal=mongodb-password="$(openssl rand -base64 32)" \
+  --from-literal=llm-api-key="your-actual-key"
+```
+
+### 2. Resource Limits
+
+Always set appropriate resource limits:
+
+```yaml
+resources:
+  api:
+    requests:
+      memory: "512Mi"
+      cpu: "250m"
+    limits:
+      memory: "2Gi"
+      cpu: "1000m"
+```
+
+### 3. High Availability
+
+For production deployments:
+
+```yaml
+api:
+  replicaCount: 3  # Multiple replicas
+
+mongodb:
+  persistence:
+    enabled: true
+    size: 50Gi
+    storageClass: "fast-ssd"
+```
+
+### 4. Monitoring
+
+Enable monitoring and observability:
+
+```yaml
+monitoring:
+  metrics:
+    enabled: true
+  health:
+    enabled: true
+  tracing:
+    enabled: true
+    provider: "jaeger"
+```
+
+### 5. Backup Configuration
+
+Configure regular backups:
+
+```yaml
+backup:
+  enabled: true
+  schedule: "0 2 * * *"  # Daily at 2 AM
+  retention:
+    daily: 7
+    weekly: 4
+    monthly: 12
+```
+
+## Validation
+
+### Validate .env File
+
+```bash
+# Check for required variables and report any that are missing
+for var in MONGODB_URL LLM_API_KEY MCP_API_KEY; do
+  grep -qE "^${var}=" .env || echo "Missing: ${var}"
+done
+```
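
A line-based check cannot tell a variable that is set from one that is present but empty. The sketch below parses `.env`-style text and reports variables that are absent or empty. The function names and the sample content are illustrative assumptions; the required list simply reuses the variables checked above.

```python
REQUIRED = ("MONGODB_URL", "LLM_API_KEY", "MCP_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, ignoring blank lines and '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so URLs containing '=' stay intact.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_required(text: str, required=REQUIRED) -> list[str]:
    """Return the required variables that are absent or empty."""
    env = parse_env(text)
    return [name for name in required if not env.get(name)]


# Hypothetical sample; in practice read the text from the real .env file.
sample = "MONGODB_URL=mongodb://admin:pass@mongodb:27017\nLLM_API_KEY=\n# MCP_API_KEY=unset\n"
print(missing_required(sample))
```

To run it against a real file, pass `pathlib.Path(".env").read_text()` instead of `sample`.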
+
+### Validate values.yaml
+
+```bash
+# Install yq (YAML processor)
+# brew install yq  # macOS
+# sudo snap install yq  # Ubuntu (mikefarah/yq v4; the apt package named yq is a different tool without 'yq eval')
+
+# Validate YAML syntax
+yq eval '.' values.yaml > /dev/null && echo "Valid YAML" || echo "Invalid YAML"
+
+# Check specific values
+yq eval '.llm.apiKey' values.yaml
+yq eval '.mongodb.auth.rootPassword' values.yaml
+```
+
+### Validate Helm Values
+
+```bash
+# Lint the Helm chart
+helm lint deploy/helm/datacenter-docs -f my-values.yaml
+
+# Dry-run installation
+helm install test deploy/helm/datacenter-docs -f my-values.yaml --dry-run --debug
+```
+
+## Troubleshooting
+
+### Common Issues
+
+#### 1. MongoDB Connection Failed
+
+Check:
+- MongoDB URL is correct
+- Password matches in both MongoDB and application config
+- MongoDB service is running
+
+```bash
+# Test MongoDB connection (substitute your own MONGO_ROOT_USER and MONGO_ROOT_PASSWORD values)
+docker exec -it datacenter-docs-mongodb mongosh \
+  -u admin -p admin123 --authenticationDatabase admin
+```
+
+#### 2. LLM API Errors
+
+Check:
+- API key is valid
+- Base URL is correct
+- Model name is supported by the provider
+- Network connectivity to LLM provider
+
+```bash
+# Test LLM API
+curl -H "Authorization: Bearer $LLM_API_KEY" \
+  "$LLM_BASE_URL/models"
+```
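
The same check can be scripted when `curl` is not available. This sketch only builds the request, so it runs without network access; the helper name `build_models_request` is an assumption, and the `/models` path is taken from the command above rather than from any provider guarantee.

```python
import urllib.request


def build_models_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the provider's /models endpoint."""
    # Strip a trailing slash so the path is not doubled ("/v1//models").
    url = base_url.rstrip("/") + "/models"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})


req = build_models_request("http://localhost:11434/v1/", "ollama")
print(req.full_url)
# Sending is left commented out so the sketch runs offline:
# with urllib.request.urlopen(req, timeout=10) as resp:
#     print(resp.status)
```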
+
+#### 3. Redis Connection Issues
+
+Check:
+- Redis URL is correct
+- Redis service is running
+- Password is correct (if enabled)
+
+```bash
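+# Test Redis connection (a healthy server replies PONG)
+docker exec -it datacenter-docs-redis redis-cli ping
+# If a password is set, pass it with -a (assumes REDIS_PASSWORD is exported in your shell)
+docker exec -it datacenter-docs-redis redis-cli -a "$REDIS_PASSWORD" ping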
+```
+
+## Converting Between Formats
+
+### From .env to values.yaml
+
+A conversion script is intended for this step; its invocation is still marked as a TODO:
+
+```bash
+# TODO: Create conversion script
+# python scripts/env_to_values.py .env > my-values.yaml
+```
+
+Manual conversion example:
+```text
+# .env
+MONGODB_URL=mongodb://admin:pass@mongodb:27017
+
+# values.yaml
+mongodb:
+  url: "mongodb://admin:pass@mongodb:27017"
+```
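
The manual conversion generalizes to the whole mapping table: each `.env` name corresponds to a dotted path, and the path segments become nested keys. The sketch below illustrates the idea with a subset of the table. It is not the project's conversion script; the function name is hypothetical, and values are kept as strings with no type conversion.

```python
# Subset of the mapping table in this guide: .env name -> values.yaml path.
ENV_TO_PATH = {
    "MONGODB_URL": "mongodb.url",
    "LLM_BASE_URL": "llm.baseUrl",
    "LLM_TEMPERATURE": "llm.generation.temperature",
    "LOG_LEVEL": "application.logging.level",
}


def env_to_values(env: dict[str, str]) -> dict:
    """Build a nested dict from flat env vars using dotted paths."""
    values: dict = {}
    for name, raw in env.items():
        path = ENV_TO_PATH.get(name)
        if path is None:
            continue  # unmapped variables are skipped in this sketch
        *parents, leaf = path.split(".")
        node = values
        for key in parents:
            # Create intermediate mappings on demand.
            node = node.setdefault(key, {})
        node[leaf] = raw
    return values


print(env_to_values({"MONGODB_URL": "mongodb://admin:pass@mongodb:27017", "LOG_LEVEL": "INFO"}))
```

Writing the result out as YAML would need a YAML library, which is deliberately left out here to keep the sketch dependency-free.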
+
+### From values.yaml to .env
+
+```bash
+# Extract specific values (these commands append; remove existing entries first to avoid duplicates)
+echo "MONGODB_URL=$(yq eval '.mongodb.url' values.yaml)" >> .env
+echo "LLM_API_KEY=$(yq eval '.llm.apiKey' values.yaml)" >> .env
+```
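
For more than a handful of keys, the reverse direction can be sketched as flattening the nested structure into dotted paths and looking each path up in the mapping table. The names below are hypothetical, the mapping is a small subset of the table in this guide, and the input dict stands in for an already-parsed `values.yaml`.

```python
# Subset of the mapping table in this guide: values.yaml path -> .env name.
PATH_TO_ENV = {
    "mongodb.url": "MONGODB_URL",
    "llm.apiKey": "LLM_API_KEY",
    "api.port": "API_PORT",
}


def flatten(values: dict, prefix: str = "") -> dict[str, object]:
    """Flatten nested dicts into {'a.b.c': value} form."""
    flat = {}
    for key, value in values.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def values_to_env_lines(values: dict) -> list[str]:
    """Render KEY=VALUE lines for every mapped path present in values."""
    flat = flatten(values)
    return [f"{env}={flat[path]}" for path, env in PATH_TO_ENV.items() if path in flat]


# Example input with made-up values.
print(values_to_env_lines({"mongodb": {"url": "mongodb://mongodb:27017"}, "api": {"port": 8000}}))
```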
+
+## Examples
+
+### Example 1: Local Development with Ollama
+
+```yaml
+# values-local.yaml
+llm:
+  provider: ollama
+  baseUrl: "http://localhost:11434/v1"
+  apiKey: "ollama"
+  model: "llama3"
+
+application:
+  debug: true
+  logging:
+    level: "DEBUG"
+
+autoRemediation:
+  dryRun: true
+```
+
+### Example 2: Production with OpenAI
+
+```yaml
+# values-prod.yaml
+llm:
+  provider: openai
+  baseUrl: "https://api.openai.com/v1"
+  apiKey: "sk-prod-key-from-secret-manager"
+  model: "gpt-4-turbo-preview"
+
+application:
+  debug: false
+  logging:
+    level: "INFO"
+
+autoRemediation:
+  enabled: true
+  minReliabilityScore: 95.0
+  dryRun: false
+
+security:
+  authentication:
+    enabled: true
+  rateLimit:
+    enabled: true
+```
+
+### Example 3: Multi-Environment Setup
+
+```bash
+# Development
+helm install dev deploy/helm/datacenter-docs \
+  -f values.yaml \
+  -f values-development.yaml
+
+# Staging
+helm install staging deploy/helm/datacenter-docs \
+  -f values.yaml \
+  -f values-staging.yaml
+
+# Production
+helm install prod deploy/helm/datacenter-docs \
+  -f values.yaml \
+  -f values-production.yaml
+```
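
When several `-f` files are passed, Helm gives precedence to the right-most file, so each environment file only needs the keys that differ from `values.yaml`. The sketch below imitates that layering with a recursive merge to show which value wins. It is a simplified model, not Helm's implementation: for example, it does not reproduce Helm's handling of `null` to remove a key.

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where override wins; nested dicts merge key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value  # scalars and lists are replaced wholesale
    return merged


# Stand-ins for values.yaml and values-development.yaml (example content only).
defaults = {"application": {"debug": False, "logging": {"level": "INFO"}}}
development = {"application": {"debug": True, "logging": {"level": "DEBUG"}}, "autoRemediation": {"dryRun": True}}
print(deep_merge(defaults, development))
```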
+
+## Related Documentation
+
+- [Main README](README.md)
+- [Docker Deployment](deploy/docker/README.md)
+- [Helm Chart](deploy/helm/README.md)
+- [Environment Variables](.env.example)
+- [Project Repository](https://git.commandware.com/it-ops/llm-automation-docs-and-remediation-engine)
+
+## Support
+
+For configuration help:
+- Open an issue: https://git.commandware.com/it-ops/llm-automation-docs-and-remediation-engine/issues
+- Check the documentation
+- Review example configurations in `deploy/` directory