feat: implement template-based documentation generation system for Proxmox

Implement a scalable system for automatic documentation generation from infrastructure
systems, preventing LLM context overload through template-driven sectioning.

**New Features:**

1. **YAML Template System** (`templates/documentation/proxmox.yaml`)
   - Define documentation sections independently
   - Specify data requirements per section
   - Configure prompts, generation settings, and scheduling
   - Prevents LLM context overflow by sectioning data (see the template sketch after this list)

2. **Template-Based Generator** (`src/datacenter_docs/generators/template_generator.py`)
   - Load and parse YAML templates
   - Generate documentation sections independently
   - Extract only required data for each section
   - Save sections individually to files and database
   - Combine sections with table of contents

3. **Celery Tasks** (`src/datacenter_docs/workers/documentation_tasks.py`)
   - `collect_and_generate_docs`: Collect data and generate docs
   - `generate_proxmox_docs`: Scheduled Proxmox documentation (daily at 2 AM)
   - `generate_all_docs`: Generate docs for all systems in parallel
   - `index_generated_docs`: Index generated docs into vector store for RAG
   - `full_docs_pipeline`: Complete workflow (collect → generate → index)

4. **Scheduled Jobs** (updated `celery_app.py`; see the beat-schedule sketch after this list)
   - Daily Proxmox documentation generation
   - Every 6 hours: all systems documentation
   - Weekly: full pipeline with indexing
   - Proper task routing and rate limiting

5. **Test Script** (`scripts/test_proxmox_docs.py`)
   - End-to-end testing of documentation generation
   - Mock data collection from Proxmox
   - Template-based generation
   - File and database storage

6. **Configuration Updates** (`src/datacenter_docs/utils/config.py`)
   - Add port configuration fields for Docker services
   - Add MongoDB and Redis credentials
   - Support all required environment variables
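
A minimal sketch of the template shape implied by the generator in this commit: the top-level keys (`metadata`, `sections`, `generation`, `output`) and the per-section keys (`id`, `title`, `category`, `data_requirements`, `prompt_template`) are the ones `TemplateBasedGenerator` reads, while the example section, its data requirements, and the prompt wording are illustrative rather than copied from the committed `proxmox.yaml`:

```yaml
metadata:
  name: Proxmox VE
  collector: proxmox

generation:
  temperature: 0.7
  max_tokens: 4000

output:
  directory: output/proxmox
  save_to_file: true
  save_to_database: true

sections:
  - id: vm_inventory
    title: Virtual Machines Inventory
    category: inventory
    data_requirements:
      - vms
    prompt_template: |
      Create a Markdown inventory of the following virtual machines:
      {vms}
```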
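
A hedged sketch of the beat schedule described in item 4, assuming standard Celery `beat_schedule` configuration; the app object, the fully-qualified task paths, and the exact weekly time slot are assumptions, not a copy of the committed `celery_app.py`:

```python
from celery import Celery
from celery.schedules import crontab

# Stand-in for the project's existing Celery app defined in celery_app.py.
app = Celery("datacenter_docs")

app.conf.beat_schedule = {
    "generate-proxmox-docs-daily": {
        "task": "datacenter_docs.workers.documentation_tasks.generate_proxmox_docs",
        "schedule": crontab(hour=2, minute=0),  # daily at 2 AM
    },
    "generate-all-docs-every-6-hours": {
        "task": "datacenter_docs.workers.documentation_tasks.generate_all_docs",
        "schedule": crontab(minute=0, hour="*/6"),  # every 6 hours
    },
    "full-docs-pipeline-weekly": {
        "task": "datacenter_docs.workers.documentation_tasks.full_docs_pipeline",
        "schedule": crontab(hour=3, minute=0, day_of_week="sunday"),  # weekly; time assumed
    },
}
```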

**Proxmox Documentation Sections:**
- Infrastructure Overview (cluster, nodes, stats)
- Virtual Machines Inventory
- LXC Containers Inventory
- Storage Configuration
- Network Configuration
- Maintenance Procedures

**Benefits:**
- Scalable to multiple infrastructure systems
- Prevents LLM context window overflow
- Independent section generation
- Scheduled automatic updates
- Vector store integration for RAG chat
- Template-driven approach for consistency

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-20 19:23:30 +02:00
parent 27dd9e00b6
commit 16fc8e2659
6 changed files with 1178 additions and 1 deletions

src/datacenter_docs/generators/template_generator.py

@@ -0,0 +1,409 @@
"""
Template-Based Documentation Generator
Generates documentation using YAML templates that define sections and prompts.
This approach prevents LLM context overload by generating documentation in sections.
"""
import json
import logging
from pathlib import Path
from typing import Any, Dict, List, Optional
import yaml
from datacenter_docs.generators.base import BaseGenerator
logger = logging.getLogger(__name__)

class DocumentationTemplate:
    """Represents a documentation template loaded from YAML"""

    def __init__(self, template_path: Path):
        """
        Initialize template from YAML file

        Args:
            template_path: Path to YAML template file
        """
        self.path = template_path
        self.data = self._load_template()

    def _load_template(self) -> Dict[str, Any]:
        """Load and parse YAML template"""
        try:
            with open(self.path, "r", encoding="utf-8") as f:
                return yaml.safe_load(f)
        except Exception as e:
            logger.error(f"Failed to load template {self.path}: {e}")
            raise

    @property
    def name(self) -> str:
        """Get template name"""
        return self.data.get("metadata", {}).get("name", "Unknown")

    @property
    def collector(self) -> str:
        """Get required collector name"""
        return self.data.get("metadata", {}).get("collector", "")

    @property
    def sections(self) -> List[Dict[str, Any]]:
        """Get documentation sections"""
        return self.data.get("sections", [])

    @property
    def generation_config(self) -> Dict[str, Any]:
        """Get generation configuration"""
        return self.data.get("generation", {})

    @property
    def output_config(self) -> Dict[str, Any]:
        """Get output configuration"""
        return self.data.get("output", {})

    @property
    def schedule_config(self) -> Dict[str, Any]:
        """Get schedule configuration"""
        return self.data.get("schedule", {})

class TemplateBasedGenerator(BaseGenerator):
    """
    Generator that uses YAML templates to generate sectioned documentation

    This prevents LLM context overload by:
    1. Loading templates that define sections
    2. Generating each section independently
    3. Using only required data for each section
    """

    def __init__(self, template_path: str):
        """
        Initialize template-based generator

        Args:
            template_path: Path to YAML template file
        """
        self.template = DocumentationTemplate(Path(template_path))

        super().__init__(
            name=self.template.collector, section=f"{self.template.collector}_docs"
        )

    async def generate(self, data: Dict[str, Any]) -> str:
        """
        Generate complete documentation using template

        This method orchestrates the generation of all sections.

        Args:
            data: Collected infrastructure data

        Returns:
            Combined documentation (all sections)
        """
        self.logger.info(
            f"Generating documentation for {self.template.name} using template"
        )

        # Validate data matches template collector
        collector_name = data.get("metadata", {}).get("collector", "")
        if collector_name != self.template.collector:
            self.logger.warning(
                f"Data collector ({collector_name}) doesn't match template ({self.template.collector})"
            )

        # Generate each section
        sections_content = []
        for section_def in self.template.sections:
            section_content = await self.generate_section(section_def, data)
            if section_content:
                sections_content.append(section_content)

        # Combine all sections
        combined_doc = self._combine_sections(sections_content)

        return combined_doc
    async def generate_section(
        self, section_def: Dict[str, Any], full_data: Dict[str, Any]
    ) -> Optional[str]:
        """
        Generate a single documentation section

        Args:
            section_def: Section definition from template
            full_data: Complete collected data

        Returns:
            Generated section content in Markdown
        """
        section_id = section_def.get("id", "unknown")
        section_title = section_def.get("title", "Untitled Section")
        data_requirements = section_def.get("data_requirements", [])
        prompt_template = section_def.get("prompt_template", "")

        self.logger.info(f"Generating section: {section_title}")

        # Extract only required data for this section
        section_data = self._extract_section_data(full_data, data_requirements)

        # Build prompt by substituting placeholders
        prompt = self._build_prompt(prompt_template, section_data)

        # Get generation config
        gen_config = self.template.generation_config
        temperature = gen_config.get("temperature", 0.7)
        max_tokens = gen_config.get("max_tokens", 4000)

        # System prompt for documentation generation
        system_prompt = """You are a technical documentation expert specializing in datacenter infrastructure.
Generate clear, accurate, and well-structured documentation in Markdown format.

Guidelines:
- Use proper Markdown formatting (headers, tables, lists, code blocks)
- Be precise and factual based on provided data
- Include practical examples and recommendations
- Use tables for structured data
- Use bullet points for lists
- Use code blocks for commands/configurations
- Organize content with clear sections
- Write in a professional but accessible tone
"""

        try:
            # Generate content using LLM
            content = await self.generate_with_llm(
                system_prompt=system_prompt,
                user_prompt=prompt,
                temperature=temperature,
                max_tokens=max_tokens,
            )

            # Add section header
            section_content = f"# {section_title}\n\n{content}\n\n"

            self.logger.info(f"✓ Section '{section_title}' generated successfully")

            return section_content

        except Exception as e:
            self.logger.error(f"Failed to generate section '{section_title}': {e}")
            return None
    def _extract_section_data(
        self, full_data: Dict[str, Any], requirements: List[str]
    ) -> Dict[str, Any]:
        """
        Extract only required data for a section

        Args:
            full_data: Complete collected data
            requirements: List of required data keys

        Returns:
            Dictionary with only required data
        """
        section_data = {}
        data_section = full_data.get("data", {})

        for req in requirements:
            if req in data_section:
                section_data[req] = data_section[req]
            else:
                self.logger.warning(f"Required data '{req}' not found in collected data")
                section_data[req] = None

        return section_data
    def _build_prompt(self, template: str, data: Dict[str, Any]) -> str:
        """
        Build prompt by substituting data into template

        Args:
            template: Prompt template with {placeholders}
            data: Data to substitute

        Returns:
            Completed prompt
        """
        prompt = template

        # Replace each placeholder with formatted data
        for key, value in data.items():
            placeholder = f"{{{key}}}"
            if placeholder in prompt:
                # Format data for prompt
                formatted_value = self._format_data_for_prompt(value)
                prompt = prompt.replace(placeholder, formatted_value)

        return prompt
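
    # Illustrative example (not taken from the shipped proxmox.yaml): for a section
    # defined with data_requirements: ["vms"] and
    # prompt_template: "Document these virtual machines:\n{vms}",
    # _build_prompt() replaces "{vms}" with the JSON rendering of the VM data produced
    # by _format_data_for_prompt(), so each prompt carries only that section's data.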

    def _format_data_for_prompt(self, data: Any) -> str:
        """
        Format data for inclusion in LLM prompt

        Args:
            data: Data to format (dict, list, str, etc.)

        Returns:
            Formatted string representation
        """
        if data is None:
            return "No data available"

        if isinstance(data, (dict, list)):
            # Pretty print JSON for structured data
            try:
                return json.dumps(data, indent=2, default=str)
            except Exception:
                return str(data)

        return str(data)
    def _combine_sections(self, sections: List[str]) -> str:
        """
        Combine all sections into a single document

        Args:
            sections: List of section contents

        Returns:
            Combined markdown document
        """
        # Add document header
        header = f"""# {self.template.name} Documentation

*Generated automatically from infrastructure data*

---

"""

        # Add table of contents
        toc = "## Table of Contents\n\n"
        for i, section in enumerate(sections, 1):
            # Extract section title from first line
            lines = section.strip().split("\n")
            if lines:
                title = lines[0].replace("#", "").strip()
                toc += f"{i}. [{title}](#{title.lower().replace(' ', '-')})\n"

        toc += "\n---\n\n"

        # Combine all parts
        combined = header + toc + "\n".join(sections)

        return combined
    async def generate_and_save_sections(
        self, data: Dict[str, Any], save_individually: bool = True
    ) -> List[Dict[str, Any]]:
        """
        Generate and save each section individually

        This is useful for very large documentation where you want each
        section as a separate file.

        Args:
            data: Collected infrastructure data
            save_individually: Save each section as separate file

        Returns:
            List of results for each section
        """
        results = []

        output_config = self.template.output_config
        output_dir = output_config.get("directory", "output")
        save_to_db = output_config.get("save_to_database", True)
        save_to_file = output_config.get("save_to_file", True)

        for section_def in self.template.sections:
            section_id = section_def.get("id")
            section_title = section_def.get("title")

            # Generate section
            content = await self.generate_section(section_def, data)

            if not content:
                results.append(
                    {
                        "section_id": section_id,
                        "success": False,
                        "error": "Generation failed",
                    }
                )
                continue

            result = {
                "section_id": section_id,
                "title": section_title,
                "success": True,
                "content": content,
            }

            # Save section if requested
            if save_individually:
                if save_to_file:
                    # Save to file
                    output_path = Path(output_dir)
                    output_path.mkdir(parents=True, exist_ok=True)

                    filename = f"{section_id}.md"
                    file_path = output_path / filename
                    file_path.write_text(content, encoding="utf-8")

                    result["file_path"] = str(file_path)
                    self.logger.info(f"Saved section to: {file_path}")

                if save_to_db:
                    # Save to database
                    metadata = {
                        "section_id": section_id,
                        "template": str(self.template.path),
                        "category": section_def.get("category", ""),
                    }

                    # Create temporary generator for this section
                    temp_gen = BaseGenerator.__new__(BaseGenerator)
                    temp_gen.name = self.name
                    temp_gen.section = section_id
                    temp_gen.logger = self.logger
                    temp_gen.llm = self.llm
                    await temp_gen.save_to_database(content, metadata)

            results.append(result)

        return results

async def example_usage() -> None:
    """Example of using template-based generator"""
    from datacenter_docs.collectors.proxmox_collector import ProxmoxCollector

    # Collect data
    collector = ProxmoxCollector()
    collect_result = await collector.run()

    if not collect_result["success"]:
        print(f"❌ Collection failed: {collect_result['error']}")
        return

    # Generate documentation using template
    template_path = "templates/documentation/proxmox.yaml"
    generator = TemplateBasedGenerator(template_path)

    # Generate and save all sections
    sections_results = await generator.generate_and_save_sections(
        data=collect_result["data"], save_individually=True
    )

    # Print results
    for result in sections_results:
        if result["success"]:
            print(f"✅ Section '{result['title']}' generated successfully")
        else:
            print(f"❌ Section '{result.get('section_id')}' failed: {result.get('error')}")


if __name__ == "__main__":
    import asyncio

    asyncio.run(example_usage())