feat: implement template-based documentation generation system for Proxmox
Some checks failed
CI/CD Pipeline / Run Tests (push) Has been cancelled
CI/CD Pipeline / Security Scanning (push) Has been cancelled
CI/CD Pipeline / Build and Push Docker Images (api) (push) Has been cancelled
CI/CD Pipeline / Build and Push Docker Images (chat) (push) Has been cancelled
CI/CD Pipeline / Build and Push Docker Images (frontend) (push) Has been cancelled
CI/CD Pipeline / Build and Push Docker Images (worker) (push) Has been cancelled
CI/CD Pipeline / Deploy to Staging (push) Has been cancelled
CI/CD Pipeline / Deploy to Production (push) Has been cancelled
CI/CD Pipeline / Generate Documentation (push) Has started running
CI/CD Pipeline / Lint Code (push) Has started running
Implement a scalable system for automatic documentation generation from infrastructure systems, preventing LLM context overload through template-driven sectioning.

**New Features:**

1. **YAML Template System** (`templates/documentation/proxmox.yaml`)
   - Define documentation sections independently
   - Specify data requirements per section
   - Configure prompts, generation settings, and scheduling
   - Prevents LLM context overflow by sectioning data

2. **Template-Based Generator** (`src/datacenter_docs/generators/template_generator.py`)
   - Load and parse YAML templates
   - Generate documentation sections independently
   - Extract only required data for each section
   - Save sections individually to files and database
   - Combine sections with table of contents

3. **Celery Tasks** (`src/datacenter_docs/workers/documentation_tasks.py`)
   - `collect_and_generate_docs`: Collect data and generate docs
   - `generate_proxmox_docs`: Scheduled Proxmox documentation (daily at 2 AM)
   - `generate_all_docs`: Generate docs for all systems in parallel
   - `index_generated_docs`: Index generated docs into vector store for RAG
   - `full_docs_pipeline`: Complete workflow (collect → generate → index)

4. **Scheduled Jobs** (updated `celery_app.py`)
   - Daily Proxmox documentation generation
   - Every 6 hours: all systems documentation
   - Weekly: full pipeline with indexing
   - Proper task routing and rate limiting

5. **Test Script** (`scripts/test_proxmox_docs.py`)
   - End-to-end testing of documentation generation
   - Mock data collection from Proxmox
   - Template-based generation
   - File and database storage

6. **Configuration Updates** (`src/datacenter_docs/utils/config.py`)
   - Add port configuration fields for Docker services
   - Add MongoDB and Redis credentials
   - Support all required environment variables

**Proxmox Documentation Sections:**
- Infrastructure Overview (cluster, nodes, stats)
- Virtual Machines Inventory
- LXC Containers Inventory
- Storage Configuration
- Network Configuration
- Maintenance Procedures

**Benefits:**
- Scalable to multiple infrastructure systems
- Prevents LLM context window overflow
- Independent section generation
- Scheduled automatic updates
- Vector store integration for RAG chat
- Template-driven approach for consistency

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
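The sectioning idea in this commit message can be illustrated without calling an LLM. The following is a minimal sketch, assuming a parsed template shaped like the one `template_generator.py` reads (a `sections` list whose entries carry `id`, `title`, `data_requirements`, and `prompt_template`). The template contents, the `cluster`/`vms`/`storage` keys, the sample values, and the helper name `build_section_prompts` are all hypothetical; the real `proxmox.yaml` is not shown here.

```python
import json
from typing import Any, Dict, List

# Hypothetical parsed template (what yaml.safe_load might return for a small file).
TEMPLATE: Dict[str, Any] = {
    "metadata": {"name": "Proxmox", "collector": "proxmox"},
    "sections": [
        {
            "id": "overview",
            "title": "Infrastructure Overview",
            "data_requirements": ["cluster"],
            "prompt_template": "Describe this cluster:\n{cluster}",
        },
        {
            "id": "vms",
            "title": "Virtual Machines Inventory",
            "data_requirements": ["vms"],
            "prompt_template": "List these VMs:\n{vms}",
        },
    ],
}

# Hypothetical collector output, using a {"data": {...}} envelope.
COLLECTED: Dict[str, Any] = {
    "metadata": {"collector": "proxmox"},
    "data": {
        "cluster": {"name": "pve-lab", "nodes": 3},
        "vms": [{"vmid": 100, "name": "web01"}],
        "storage": [{"name": "local-lvm"}],
    },
}


def format_value(value: Any) -> str:
    """Render a value for a prompt: JSON for dicts/lists, str() otherwise."""
    if value is None:
        return "No data available"
    if isinstance(value, (dict, list)):
        return json.dumps(value, indent=2, default=str)
    return str(value)


def build_section_prompts(
    template: Dict[str, Any], collected: Dict[str, Any]
) -> List[Dict[str, str]]:
    """Build one prompt per section, each containing only that section's data."""
    data = collected.get("data", {})
    prompts = []
    for section in template.get("sections", []):
        prompt = section.get("prompt_template", "")
        for key in section.get("data_requirements", []):
            # Only keys listed in data_requirements are ever substituted.
            prompt = prompt.replace("{" + key + "}", format_value(data.get(key)))
        prompts.append({"id": section.get("id", "unknown"), "prompt": prompt})
    return prompts


if __name__ == "__main__":
    for item in build_section_prompts(TEMPLATE, COLLECTED):
        print(f"--- {item['id']} ---")
        print(item["prompt"])
```

Each prompt carries only the keys its section asks for, so the `storage` data never reaches the overview or VM prompts. That is the property the commit relies on to keep each LLM request small.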
409
src/datacenter_docs/generators/template_generator.py
Normal file
@@ -0,0 +1,409 @@
"""
Template-Based Documentation Generator

Generates documentation using YAML templates that define sections and prompts.
This approach prevents LLM context overload by generating documentation in sections.
"""

import json
import logging
from pathlib import Path
from typing import Any, Dict, List, Optional

import yaml

from datacenter_docs.generators.base import BaseGenerator

logger = logging.getLogger(__name__)


class DocumentationTemplate:
    """Represents a documentation template loaded from YAML"""

    def __init__(self, template_path: Path):
        """
        Initialize template from YAML file

        Args:
            template_path: Path to YAML template file
        """
        self.path = template_path
        self.data = self._load_template()

    def _load_template(self) -> Dict[str, Any]:
        """Load and parse YAML template"""
        try:
            with open(self.path, "r", encoding="utf-8") as f:
                return yaml.safe_load(f)
        except Exception as e:
            logger.error(f"Failed to load template {self.path}: {e}")
            raise

    @property
    def name(self) -> str:
        """Get template name"""
        return self.data.get("metadata", {}).get("name", "Unknown")

    @property
    def collector(self) -> str:
        """Get required collector name"""
        return self.data.get("metadata", {}).get("collector", "")

    @property
    def sections(self) -> List[Dict[str, Any]]:
        """Get documentation sections"""
        return self.data.get("sections", [])

    @property
    def generation_config(self) -> Dict[str, Any]:
        """Get generation configuration"""
        return self.data.get("generation", {})

    @property
    def output_config(self) -> Dict[str, Any]:
        """Get output configuration"""
        return self.data.get("output", {})

    @property
    def schedule_config(self) -> Dict[str, Any]:
        """Get schedule configuration"""
        return self.data.get("schedule", {})


class TemplateBasedGenerator(BaseGenerator):
    """
    Generator that uses YAML templates to generate sectioned documentation

    This prevents LLM context overload by:
    1. Loading templates that define sections
    2. Generating each section independently
    3. Using only required data for each section
    """

    def __init__(self, template_path: str):
        """
        Initialize template-based generator

        Args:
            template_path: Path to YAML template file
        """
        self.template = DocumentationTemplate(Path(template_path))
        super().__init__(
            name=self.template.collector, section=f"{self.template.collector}_docs"
        )

    async def generate(self, data: Dict[str, Any]) -> str:
        """
        Generate complete documentation using template

        This method orchestrates the generation of all sections.

        Args:
            data: Collected infrastructure data

        Returns:
            Combined documentation (all sections)
        """
        self.logger.info(
            f"Generating documentation for {self.template.name} using template"
        )

        # Validate data matches template collector
        collector_name = data.get("metadata", {}).get("collector", "")
        if collector_name != self.template.collector:
            self.logger.warning(
                f"Data collector ({collector_name}) doesn't match template ({self.template.collector})"
            )

        # Generate each section
        sections_content = []
        for section_def in self.template.sections:
            section_content = await self.generate_section(section_def, data)
            if section_content:
                sections_content.append(section_content)

        # Combine all sections
        combined_doc = self._combine_sections(sections_content)

        return combined_doc

    async def generate_section(
        self, section_def: Dict[str, Any], full_data: Dict[str, Any]
    ) -> Optional[str]:
        """
        Generate a single documentation section

        Args:
            section_def: Section definition from template
            full_data: Complete collected data

        Returns:
            Generated section content in Markdown
        """
        section_id = section_def.get("id", "unknown")
        section_title = section_def.get("title", "Untitled Section")
        data_requirements = section_def.get("data_requirements", [])
        prompt_template = section_def.get("prompt_template", "")

        self.logger.info(f"Generating section: {section_title}")

        # Extract only required data for this section
        section_data = self._extract_section_data(full_data, data_requirements)

        # Build prompt by substituting placeholders
        prompt = self._build_prompt(prompt_template, section_data)

        # Get generation config
        gen_config = self.template.generation_config
        temperature = gen_config.get("temperature", 0.7)
        max_tokens = gen_config.get("max_tokens", 4000)

        # System prompt for documentation generation
        system_prompt = """You are a technical documentation expert specializing in datacenter infrastructure.
Generate clear, accurate, and well-structured documentation in Markdown format.

Guidelines:
- Use proper Markdown formatting (headers, tables, lists, code blocks)
- Be precise and factual based on provided data
- Include practical examples and recommendations
- Use tables for structured data
- Use bullet points for lists
- Use code blocks for commands/configurations
- Organize content with clear sections
- Write in a professional but accessible tone
"""

        try:
            # Generate content using LLM
            content = await self.generate_with_llm(
                system_prompt=system_prompt,
                user_prompt=prompt,
                temperature=temperature,
                max_tokens=max_tokens,
            )

            # Add section header
            section_content = f"# {section_title}\n\n{content}\n\n"

            self.logger.info(f"✓ Section '{section_title}' generated successfully")
            return section_content

        except Exception as e:
            self.logger.error(f"Failed to generate section '{section_title}': {e}")
            return None

    def _extract_section_data(
        self, full_data: Dict[str, Any], requirements: List[str]
    ) -> Dict[str, Any]:
        """
        Extract only required data for a section

        Args:
            full_data: Complete collected data
            requirements: List of required data keys

        Returns:
            Dictionary with only required data
        """
        section_data = {}
        data_section = full_data.get("data", {})

        for req in requirements:
            if req in data_section:
                section_data[req] = data_section[req]
            else:
                self.logger.warning(f"Required data '{req}' not found in collected data")
                section_data[req] = None

        return section_data

    def _build_prompt(self, template: str, data: Dict[str, Any]) -> str:
        """
        Build prompt by substituting data into template

        Args:
            template: Prompt template with {placeholders}
            data: Data to substitute

        Returns:
            Completed prompt
        """
        prompt = template

        # Replace each placeholder with formatted data
        for key, value in data.items():
            placeholder = f"{{{key}}}"
            if placeholder in prompt:
                # Format data for prompt
                formatted_value = self._format_data_for_prompt(value)
                prompt = prompt.replace(placeholder, formatted_value)

        return prompt

    def _format_data_for_prompt(self, data: Any) -> str:
        """
        Format data for inclusion in LLM prompt

        Args:
            data: Data to format (dict, list, str, etc.)

        Returns:
            Formatted string representation
        """
        if data is None:
            return "No data available"

        if isinstance(data, (dict, list)):
            # Pretty print JSON for structured data
            try:
                return json.dumps(data, indent=2, default=str)
            except Exception:
                return str(data)

        return str(data)

    def _combine_sections(self, sections: List[str]) -> str:
        """
        Combine all sections into a single document

        Args:
            sections: List of section contents

        Returns:
            Combined markdown document
        """
        # Add document header
        header = f"""# {self.template.name} Documentation

*Generated automatically from infrastructure data*

---

"""

        # Add table of contents
        toc = "## Table of Contents\n\n"
        for i, section in enumerate(sections, 1):
            # Extract section title from first line
            lines = section.strip().split("\n")
            if lines:
                title = lines[0].replace("#", "").strip()
                toc += f"{i}. [{title}](#{title.lower().replace(' ', '-')})\n"
        toc += "\n---\n\n"

        # Combine all parts
        combined = header + toc + "\n".join(sections)

        return combined

    async def generate_and_save_sections(
        self, data: Dict[str, Any], save_individually: bool = True
    ) -> List[Dict[str, Any]]:
        """
        Generate and save each section individually

        This is useful for very large documentation where you want each
        section as a separate file.

        Args:
            data: Collected infrastructure data
            save_individually: Save each section as separate file

        Returns:
            List of results for each section
        """
        results = []
        output_config = self.template.output_config
        output_dir = output_config.get("directory", "output")
        save_to_db = output_config.get("save_to_database", True)
        save_to_file = output_config.get("save_to_file", True)

        for section_def in self.template.sections:
            section_id = section_def.get("id")
            section_title = section_def.get("title")

            # Generate section
            content = await self.generate_section(section_def, data)

            if not content:
                results.append(
                    {
                        "section_id": section_id,
                        "success": False,
                        "error": "Generation failed",
                    }
                )
                continue

            result = {
                "section_id": section_id,
                "title": section_title,
                "success": True,
                "content": content,
            }

            # Save section if requested
            if save_individually:
                if save_to_file:
                    # Save to file
                    output_path = Path(output_dir)
                    output_path.mkdir(parents=True, exist_ok=True)
                    filename = f"{section_id}.md"
                    file_path = output_path / filename
                    file_path.write_text(content, encoding="utf-8")
                    result["file_path"] = str(file_path)
                    self.logger.info(f"Saved section to: {file_path}")

                if save_to_db:
                    # Save to database
                    metadata = {
                        "section_id": section_id,
                        "template": str(self.template.path),
                        "category": section_def.get("category", ""),
                    }
                    # Create temporary generator for this section
                    temp_gen = BaseGenerator.__new__(BaseGenerator)
                    temp_gen.name = self.name
                    temp_gen.section = section_id
                    temp_gen.logger = self.logger
                    temp_gen.llm = self.llm
                    await temp_gen.save_to_database(content, metadata)

            results.append(result)

        return results


async def example_usage() -> None:
    """Example of using template-based generator"""
    from datacenter_docs.collectors.proxmox_collector import ProxmoxCollector

    # Collect data
    collector = ProxmoxCollector()
    collect_result = await collector.run()

    if not collect_result["success"]:
        print(f"❌ Collection failed: {collect_result['error']}")
        return

    # Generate documentation using template
    template_path = "templates/documentation/proxmox.yaml"
    generator = TemplateBasedGenerator(template_path)

    # Generate and save all sections
    sections_results = await generator.generate_and_save_sections(
        data=collect_result["data"], save_individually=True
    )

    # Print results
    for result in sections_results:
        if result["success"]:
            print(f"✅ Section '{result['title']}' generated successfully")
        else:
            print(f"❌ Section '{result.get('section_id')}' failed: {result.get('error')}")


if __name__ == "__main__":
    import asyncio

    asyncio.run(example_usage())
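The table-of-contents step in `_combine_sections` derives each entry and anchor from the first line of a section. Below is a standalone sketch of that derivation, assuming sections shaped like the output of `generate_section` (`# Title`, a blank line, then the body). The function name `build_toc` and the demo section text are hypothetical. The slug rule is the same simple lowercase-and-hyphen replacement, which may not match the anchor algorithm of every Markdown renderer (for example, when titles contain punctuation).

```python
from typing import List


def build_toc(sections: List[str]) -> str:
    """Build a numbered Markdown table of contents from section bodies."""
    lines = ["## Table of Contents", ""]
    for index, section in enumerate(sections, 1):
        # The first line of each section is expected to be its "# Title" header.
        first_line = section.strip().split("\n")[0]
        title = first_line.replace("#", "").strip()
        # Simple slug: lowercase, spaces to hyphens (an approximation).
        anchor = title.lower().replace(" ", "-")
        lines.append(f"{index}. [{title}](#{anchor})")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = [
        "# Infrastructure Overview\n\nCluster summary.\n\n",
        "# Storage Configuration\n\nStorage pools.\n\n",
    ]
    print(build_toc(demo))
```

Because only sections that were generated successfully reach this step, the numbering follows the surviving sections rather than the order declared in the template.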