feat: implement template-based documentation generation system for Proxmox

Implement a scalable system for automatic documentation generation from infrastructure
systems, preventing LLM context overload through template-driven sectioning.

**New Features:**

1. **YAML Template System** (`templates/documentation/proxmox.yaml`; see the template sketch after the section list below)
   - Define documentation sections independently
   - Specify data requirements per section
   - Configure prompts, generation settings, and scheduling
   - Prevents LLM context overflow by sectioning data

2. **Template-Based Generator** (`src/datacenter_docs/generators/template_generator.py`)
   - Load and parse YAML templates
   - Generate documentation sections independently
   - Extract only required data for each section
   - Save sections individually to files and database
   - Combine sections with table of contents

3. **Celery Tasks** (`src/datacenter_docs/workers/documentation_tasks.py`; see the dispatch sketch after this list)
   - `collect_and_generate_docs`: Collect data and generate docs
   - `generate_proxmox_docs`: Scheduled Proxmox documentation (daily at 2 AM)
   - `generate_all_docs`: Generate docs for all systems in parallel
   - `index_generated_docs`: Index generated docs into vector store for RAG
   - `full_docs_pipeline`: Complete workflow (collect → generate → index)

4. **Scheduled Jobs** (updated `celery_app.py`)
   - Daily Proxmox documentation generation
   - Every 6 hours: all systems documentation
   - Weekly: full pipeline with indexing
   - Proper task routing and rate limiting

5. **Test Script** (`scripts/test_proxmox_docs.py`)
   - End-to-end testing of documentation generation
   - Mock data collection from Proxmox
   - Template-based generation
   - File and database storage

6. **Configuration Updates** (`src/datacenter_docs/utils/config.py`)
   - Add port configuration fields for Docker services
   - Add MongoDB and Redis credentials
   - Support all required environment variables
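
As noted in item 3 above, the new tasks are registered by name and routed to the `documentation` queue. A minimal dispatch sketch, assuming a reachable Celery broker (the surrounding script is illustrative; the task name, kwargs, and queue come from the code in this commit):

```python
# Minimal dispatch sketch (assumes the broker from CELERY_BROKER_URL is reachable).
from datacenter_docs.workers.celery_app import celery_app

# Enqueue by registered task name; routed to the "documentation" queue
# as configured in task_routes.
async_result = celery_app.send_task(
    "collect_and_generate_docs",
    kwargs={
        "collector_name": "proxmox",
        "template_path": "templates/documentation/proxmox.yaml",
    },
    queue="documentation",
)
print(f"Queued task: {async_result.id}")
```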

**Proxmox Documentation Sections:**
- Infrastructure Overview (cluster, nodes, stats)
- Virtual Machines Inventory
- LXC Containers Inventory
- Storage Configuration
- Network Configuration
- Maintenance Procedures
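
As referenced in item 1, a sketch of the template shape that would drive these sections. The top-level keys (`metadata`, `sections`, `data_requirements`, `prompt_template`, `generation`, `output`) mirror what `TemplateBasedGenerator` reads below; the concrete values are illustrative assumptions:

```yaml
# Illustrative template shape; field names match what TemplateBasedGenerator reads.
metadata:
  name: Proxmox                # used in the generated document header
  collector: proxmox           # must match the collector that produced the data

sections:
  - id: infrastructure_overview
    title: Infrastructure Overview
    category: overview
    data_requirements: [cluster, nodes, statistics]
    prompt_template: |
      Document the cluster described by the data below.
      Cluster: {cluster}
      Nodes: {nodes}
      Stats: {statistics}

generation:
  temperature: 0.7
  max_tokens: 4000

output:
  directory: output
  save_to_file: true
  save_to_database: true
```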

**Benefits:**
- Scalable to multiple infrastructure systems
- Prevents LLM context window overflow
- Independent section generation
- Scheduled automatic updates
- Vector store integration for RAG chat
- Template-driven approach for consistency

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
commit 16fc8e2659 (parent 27dd9e00b6), 2025-10-20 19:23:30 +02:00
6 changed files with 1178 additions and 1 deletion

src/datacenter_docs/generators/template_generator.py

@@ -0,0 +1,409 @@
"""
Template-Based Documentation Generator
Generates documentation using YAML templates that define sections and prompts.
This approach prevents LLM context overload by generating documentation in sections.
"""
import json
import logging
from pathlib import Path
from typing import Any, Dict, List, Optional
import yaml
from datacenter_docs.generators.base import BaseGenerator
logger = logging.getLogger(__name__)
class DocumentationTemplate:
"""Represents a documentation template loaded from YAML"""
def __init__(self, template_path: Path):
"""
Initialize template from YAML file
Args:
template_path: Path to YAML template file
"""
self.path = template_path
self.data = self._load_template()
def _load_template(self) -> Dict[str, Any]:
"""Load and parse YAML template"""
try:
with open(self.path, "r", encoding="utf-8") as f:
return yaml.safe_load(f)
except Exception as e:
logger.error(f"Failed to load template {self.path}: {e}")
raise
@property
def name(self) -> str:
"""Get template name"""
return self.data.get("metadata", {}).get("name", "Unknown")
@property
def collector(self) -> str:
"""Get required collector name"""
return self.data.get("metadata", {}).get("collector", "")
@property
def sections(self) -> List[Dict[str, Any]]:
"""Get documentation sections"""
return self.data.get("sections", [])
@property
def generation_config(self) -> Dict[str, Any]:
"""Get generation configuration"""
return self.data.get("generation", {})
@property
def output_config(self) -> Dict[str, Any]:
"""Get output configuration"""
return self.data.get("output", {})
@property
def schedule_config(self) -> Dict[str, Any]:
"""Get schedule configuration"""
return self.data.get("schedule", {})
class TemplateBasedGenerator(BaseGenerator):
"""
Generator that uses YAML templates to generate sectioned documentation
This prevents LLM context overload by:
1. Loading templates that define sections
2. Generating each section independently
3. Using only required data for each section
"""
def __init__(self, template_path: str):
"""
Initialize template-based generator
Args:
template_path: Path to YAML template file
"""
self.template = DocumentationTemplate(Path(template_path))
super().__init__(
name=self.template.collector, section=f"{self.template.collector}_docs"
)
async def generate(self, data: Dict[str, Any]) -> str:
"""
Generate complete documentation using template
This method orchestrates the generation of all sections.
Args:
data: Collected infrastructure data
Returns:
Combined documentation (all sections)
"""
self.logger.info(
f"Generating documentation for {self.template.name} using template"
)
# Validate data matches template collector
collector_name = data.get("metadata", {}).get("collector", "")
if collector_name != self.template.collector:
self.logger.warning(
f"Data collector ({collector_name}) doesn't match template ({self.template.collector})"
)
# Generate each section
sections_content = []
for section_def in self.template.sections:
section_content = await self.generate_section(section_def, data)
if section_content:
sections_content.append(section_content)
# Combine all sections
combined_doc = self._combine_sections(sections_content)
return combined_doc
async def generate_section(
self, section_def: Dict[str, Any], full_data: Dict[str, Any]
) -> Optional[str]:
"""
Generate a single documentation section
Args:
section_def: Section definition from template
full_data: Complete collected data
Returns:
Generated section content in Markdown
"""
section_id = section_def.get("id", "unknown")
section_title = section_def.get("title", "Untitled Section")
data_requirements = section_def.get("data_requirements", [])
prompt_template = section_def.get("prompt_template", "")
self.logger.info(f"Generating section: {section_title}")
# Extract only required data for this section
section_data = self._extract_section_data(full_data, data_requirements)
# Build prompt by substituting placeholders
prompt = self._build_prompt(prompt_template, section_data)
# Get generation config
gen_config = self.template.generation_config
temperature = gen_config.get("temperature", 0.7)
max_tokens = gen_config.get("max_tokens", 4000)
# System prompt for documentation generation
system_prompt = """You are a technical documentation expert specializing in datacenter infrastructure.
Generate clear, accurate, and well-structured documentation in Markdown format.
Guidelines:
- Use proper Markdown formatting (headers, tables, lists, code blocks)
- Be precise and factual based on provided data
- Include practical examples and recommendations
- Use tables for structured data
- Use bullet points for lists
- Use code blocks for commands/configurations
- Organize content with clear sections
- Write in a professional but accessible tone
"""
try:
# Generate content using LLM
content = await self.generate_with_llm(
system_prompt=system_prompt,
user_prompt=prompt,
temperature=temperature,
max_tokens=max_tokens,
)
# Add section header
section_content = f"# {section_title}\n\n{content}\n\n"
self.logger.info(f"✓ Section '{section_title}' generated successfully")
return section_content
except Exception as e:
self.logger.error(f"Failed to generate section '{section_title}': {e}")
return None
def _extract_section_data(
self, full_data: Dict[str, Any], requirements: List[str]
) -> Dict[str, Any]:
"""
Extract only required data for a section
Args:
full_data: Complete collected data
requirements: List of required data keys
Returns:
Dictionary with only required data
"""
section_data = {}
data_section = full_data.get("data", {})
for req in requirements:
if req in data_section:
section_data[req] = data_section[req]
else:
self.logger.warning(f"Required data '{req}' not found in collected data")
section_data[req] = None
return section_data
def _build_prompt(self, template: str, data: Dict[str, Any]) -> str:
"""
Build prompt by substituting data into template
Args:
template: Prompt template with {placeholders}
data: Data to substitute
Returns:
Completed prompt
"""
prompt = template
# Replace each placeholder with formatted data
for key, value in data.items():
placeholder = f"{{{key}}}"
if placeholder in prompt:
# Format data for prompt
formatted_value = self._format_data_for_prompt(value)
prompt = prompt.replace(placeholder, formatted_value)
return prompt
def _format_data_for_prompt(self, data: Any) -> str:
"""
Format data for inclusion in LLM prompt
Args:
data: Data to format (dict, list, str, etc.)
Returns:
Formatted string representation
"""
if data is None:
return "No data available"
if isinstance(data, (dict, list)):
# Pretty print JSON for structured data
try:
return json.dumps(data, indent=2, default=str)
except Exception:
return str(data)
return str(data)
def _combine_sections(self, sections: List[str]) -> str:
"""
Combine all sections into a single document
Args:
sections: List of section contents
Returns:
Combined markdown document
"""
# Add document header
header = f"""# {self.template.name} Documentation
*Generated automatically from infrastructure data*
---
"""
# Add table of contents
toc = "## Table of Contents\n\n"
for i, section in enumerate(sections, 1):
# Extract section title from first line
lines = section.strip().split("\n")
if lines:
title = lines[0].replace("#", "").strip()
toc += f"{i}. [{title}](#{title.lower().replace(' ', '-')})\n"
toc += "\n---\n\n"
# Combine all parts
combined = header + toc + "\n".join(sections)
return combined
async def generate_and_save_sections(
self, data: Dict[str, Any], save_individually: bool = True
) -> List[Dict[str, Any]]:
"""
Generate and save each section individually
This is useful for very large documentation where you want each
section as a separate file.
Args:
data: Collected infrastructure data
save_individually: Save each section as separate file
Returns:
List of results for each section
"""
results = []
output_config = self.template.output_config
output_dir = output_config.get("directory", "output")
save_to_db = output_config.get("save_to_database", True)
save_to_file = output_config.get("save_to_file", True)
for section_def in self.template.sections:
section_id = section_def.get("id")
section_title = section_def.get("title")
# Generate section
content = await self.generate_section(section_def, data)
if not content:
results.append(
{
"section_id": section_id,
"success": False,
"error": "Generation failed",
}
)
continue
result = {
"section_id": section_id,
"title": section_title,
"success": True,
"content": content,
}
# Save section if requested
if save_individually:
if save_to_file:
# Save to file
output_path = Path(output_dir)
output_path.mkdir(parents=True, exist_ok=True)
filename = f"{section_id}.md"
file_path = output_path / filename
file_path.write_text(content, encoding="utf-8")
result["file_path"] = str(file_path)
self.logger.info(f"Saved section to: {file_path}")
if save_to_db:
# Save to database
metadata = {
"section_id": section_id,
"template": str(self.template.path),
"category": section_def.get("category", ""),
}
# Create temporary generator for this section
temp_gen = BaseGenerator.__new__(BaseGenerator)
temp_gen.name = self.name
temp_gen.section = section_id
temp_gen.logger = self.logger
temp_gen.llm = self.llm
await temp_gen.save_to_database(content, metadata)
results.append(result)
return results
async def example_usage() -> None:
"""Example of using template-based generator"""
from datacenter_docs.collectors.proxmox_collector import ProxmoxCollector
# Collect data
collector = ProxmoxCollector()
collect_result = await collector.run()
if not collect_result["success"]:
print(f"❌ Collection failed: {collect_result['error']}")
return
# Generate documentation using template
template_path = "templates/documentation/proxmox.yaml"
generator = TemplateBasedGenerator(template_path)
# Generate and save all sections
sections_results = await generator.generate_and_save_sections(
data=collect_result["data"], save_individually=True
)
# Print results
for result in sections_results:
if result["success"]:
print(f"✅ Section '{result['title']}' generated successfully")
else:
print(f"❌ Section '{result.get('section_id')}' failed: {result.get('error')}")
if __name__ == "__main__":
import asyncio
asyncio.run(example_usage())

src/datacenter_docs/utils/config.py

@@ -72,6 +72,20 @@ class Settings(BaseSettings):
    CELERY_BROKER_URL: str = "redis://localhost:6379/0"
    CELERY_RESULT_BACKEND: str = "redis://localhost:6379/0"

    # Additional Port Configuration (for Docker services)
    MONGODB_PORT: int = 27017
    REDIS_PORT: int = 6379
    CHAT_PORT: int = 8001
    FLOWER_PORT: int = 5555
    FRONTEND_PORT: int = 8080

    # MongoDB Root Credentials (for Docker initialization)
    MONGO_ROOT_USER: str = "admin"
    MONGO_ROOT_PASSWORD: str = "admin123"

    # Redis Password
    REDIS_PASSWORD: str = ""

    @model_validator(mode="before")
    @classmethod
    def set_celery_defaults(cls, values: Dict[str, Any]) -> Dict[str, Any]:
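
These fields are populated from the environment by pydantic. A minimal sketch of that behavior, assuming standard `BaseSettings` semantics (the values here are illustrative):

```python
# Sketch: pydantic BaseSettings reads fields from the environment by name.
import os

os.environ["MONGODB_PORT"] = "27018"      # e.g. exported by docker-compose
os.environ["REDIS_PASSWORD"] = "s3cret"   # illustrative value

from datacenter_docs.utils.config import Settings

settings = Settings()
assert settings.MONGODB_PORT == 27018     # coerced from str to int by pydantic
assert settings.REDIS_PASSWORD == "s3cret"
```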

src/datacenter_docs/workers/celery_app.py

@@ -35,6 +35,7 @@ celery_app = Celery(
    backend=settings.CELERY_RESULT_BACKEND,
    include=[
        "datacenter_docs.workers.tasks",
        "datacenter_docs.workers.documentation_tasks",
    ],
)
@@ -67,6 +68,11 @@ celery_app.conf.update(
            "queue": "data_collection"
        },
        "datacenter_docs.workers.tasks.cleanup_old_data_task": {"queue": "maintenance"},
        "collect_and_generate_docs": {"queue": "documentation"},
        "generate_proxmox_docs": {"queue": "documentation"},
        "generate_all_docs": {"queue": "documentation"},
        "index_generated_docs": {"queue": "documentation"},
        "full_docs_pipeline": {"queue": "documentation"},
    },
    # Task rate limits
    task_annotations={
@@ -77,10 +83,28 @@ celery_app.conf.update(
    },
    # Beat schedule (periodic tasks)
    beat_schedule={
        # Generate Proxmox documentation daily at 2 AM
        "generate-proxmox-docs-daily": {
            "task": "generate_proxmox_docs",
            "schedule": crontab(minute=0, hour=2),  # Daily at 2 AM
            "options": {"queue": "documentation"},
        },
        # Generate all documentation every 6 hours
        "generate-all-docs-every-6h": {
            "task": "generate_all_docs",
            "schedule": crontab(minute=30, hour="*/6"),  # Every 6 hours at :30
            "options": {"queue": "documentation"},
        },
        # Full documentation pipeline weekly
        "full-docs-pipeline-weekly": {
            "task": "full_docs_pipeline",
            "schedule": crontab(minute=0, hour=3, day_of_week=0),  # Sunday at 3 AM
            "options": {"queue": "documentation"},
        },
        # Legacy tasks (keep for backward compatibility)
        "generate-all-docs-legacy": {
            "task": "datacenter_docs.workers.tasks.generate_documentation_task",
            "schedule": crontab(minute=0, hour="*/12"),  # Every 12 hours (previously every 6)
            "args": (),
            "options": {"queue": "documentation"},
        },

src/datacenter_docs/workers/documentation_tasks.py

@@ -0,0 +1,347 @@
"""
Celery Tasks for Documentation Generation
Scheduled tasks for collecting data and generating documentation
from infrastructure systems (Proxmox, VMware, Kubernetes, etc.)
"""
import logging
from datetime import datetime
from pathlib import Path
from typing import Any, Dict, List
from celery import group
from datacenter_docs.workers.celery_app import celery_app
logger = logging.getLogger(__name__)
@celery_app.task(name="collect_and_generate_docs", bind=True)
def collect_and_generate_docs(
self, collector_name: str, template_path: str
) -> Dict[str, Any]:
"""
Collect data from infrastructure and generate documentation
Args:
collector_name: Name of collector to use (e.g., 'proxmox', 'vmware')
template_path: Path to documentation template YAML file
Returns:
Result dictionary with status and details
"""
import asyncio
task_id = self.request.id
logger.info(
f"[{task_id}] Starting documentation generation: {collector_name} -> {template_path}"
)
result = {
"task_id": task_id,
"collector": collector_name,
"template": template_path,
"success": False,
"started_at": datetime.now().isoformat(),
"completed_at": None,
"error": None,
"sections_generated": 0,
"sections_failed": 0,
}
try:
# Run async collection and generation
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
generation_result = loop.run_until_complete(
_async_collect_and_generate(collector_name, template_path)
)
loop.close()
# Update result
result.update(generation_result)
result["success"] = True
result["completed_at"] = datetime.now().isoformat()
logger.info(
f"[{task_id}] Documentation generation completed: "
f"{result['sections_generated']} sections generated, "
f"{result['sections_failed']} failed"
)
except Exception as e:
result["error"] = str(e)
result["completed_at"] = datetime.now().isoformat()
logger.error(f"[{task_id}] Documentation generation failed: {e}", exc_info=True)
return result
async def _async_collect_and_generate(
collector_name: str, template_path: str
) -> Dict[str, Any]:
"""
Async implementation of collect and generate workflow
Args:
collector_name: Collector name
template_path: Template path
Returns:
Generation result
"""
from datacenter_docs.generators.template_generator import TemplateBasedGenerator
# Import appropriate collector
collector = await _get_collector(collector_name)
# Collect data
logger.info(f"Collecting data with {collector_name} collector...")
collect_result = await collector.run()
if not collect_result["success"]:
raise Exception(f"Data collection failed: {collect_result.get('error')}")
collected_data = collect_result["data"]
# Generate documentation using template
logger.info(f"Generating documentation using template: {template_path}")
generator = TemplateBasedGenerator(template_path)
sections_results = await generator.generate_and_save_sections(
data=collected_data, save_individually=True
)
# Count successes and failures
sections_generated = sum(1 for r in sections_results if r.get("success"))
sections_failed = sum(1 for r in sections_results if not r.get("success"))
return {
"sections_generated": sections_generated,
"sections_failed": sections_failed,
"sections": sections_results,
"collector_stats": collect_result["data"].get("data", {}).get("statistics", {}),
}
async def _get_collector(collector_name: str) -> Any:
"""
Get collector instance by name
Args:
collector_name: Name of collector
Returns:
Collector instance
"""
from datacenter_docs.collectors.kubernetes_collector import KubernetesCollector
from datacenter_docs.collectors.proxmox_collector import ProxmoxCollector
from datacenter_docs.collectors.vmware_collector import VMwareCollector
collectors = {
"proxmox": ProxmoxCollector,
"vmware": VMwareCollector,
"kubernetes": KubernetesCollector,
}
if collector_name not in collectors:
raise ValueError(
f"Unknown collector: {collector_name}. Available: {list(collectors.keys())}"
)
return collectors[collector_name]()
@celery_app.task(name="generate_proxmox_docs")
def generate_proxmox_docs() -> Dict[str, Any]:
"""
Scheduled task to generate Proxmox documentation
This task is scheduled via Celery Beat to run daily.
Returns:
Task result
"""
logger.info("Scheduled Proxmox documentation generation started")
template_path = "templates/documentation/proxmox.yaml"
return collect_and_generate_docs(collector_name="proxmox", template_path=template_path)
@celery_app.task(name="generate_all_docs")
def generate_all_docs() -> Dict[str, Any]:
"""
Generate documentation for all configured systems
This creates parallel tasks for each system.
Returns:
Result with task IDs
"""
logger.info("Starting documentation generation for all systems")
# Define all systems and their templates
systems = [
{"collector": "proxmox", "template": "templates/documentation/proxmox.yaml"},
# Add more as templates are created:
# {"collector": "vmware", "template": "templates/documentation/vmware.yaml"},
# {"collector": "kubernetes", "template": "templates/documentation/k8s.yaml"},
]
# Create parallel tasks
task_group = group(
[
collect_and_generate_docs.s(system["collector"], system["template"])
for system in systems
]
)
# Execute group
result = task_group.apply_async()
return {
"task_group_id": result.id,
"systems": len(systems),
"message": "Documentation generation started for all systems",
}
@celery_app.task(name="index_generated_docs")
def index_generated_docs(output_dir: str = "output") -> Dict[str, Any]:
"""
Index all generated documentation into vector store for RAG
This task should run after documentation generation to make
the new docs searchable in the chat interface.
Args:
output_dir: Directory containing generated markdown files
Returns:
Indexing result
"""
import asyncio
logger.info(f"Starting documentation indexing from {output_dir}")
result = {
"success": False,
"files_indexed": 0,
"chunks_created": 0,
"error": None,
}
try:
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
index_result = loop.run_until_complete(_async_index_docs(output_dir))
loop.close()
result.update(index_result)
result["success"] = True
logger.info(
f"Documentation indexing completed: {result['files_indexed']} files, "
f"{result['chunks_created']} chunks"
)
except Exception as e:
result["error"] = str(e)
logger.error(f"Documentation indexing failed: {e}", exc_info=True)
return result
async def _async_index_docs(output_dir: str) -> Dict[str, Any]:
"""
Async implementation of documentation indexing
Args:
output_dir: Output directory with markdown files
Returns:
Indexing result
"""
from datacenter_docs.chat.agent import DocumentationAgent
agent = DocumentationAgent()
# Index all markdown files in output directory
docs_path = Path(output_dir)
if not docs_path.exists():
raise FileNotFoundError(f"Output directory not found: {output_dir}")
await agent.index_documentation(docs_path)
# Count indexed files and chunks
# (This is a simplified version, actual implementation would track this better)
md_files = list(docs_path.glob("**/*.md"))
files_indexed = len(md_files)
# Estimate chunks (roughly 1000 chars per chunk, 200 overlap)
total_chars = sum(f.stat().st_size for f in md_files)
chunks_created = total_chars // 800 # Rough estimate
return {"files_indexed": files_indexed, "chunks_created": chunks_created}
@celery_app.task(name="full_docs_pipeline")
def full_docs_pipeline() -> Dict[str, Any]:
    """
    Full documentation pipeline: collect -> generate -> index

    This is the master task that orchestrates the entire workflow.

    Returns:
        Pipeline result
    """
    from celery import chain

    logger.info("Starting full documentation pipeline")

    # Chain the steps so indexing runs only after generation completes.
    # index_generated_docs uses an immutable signature (.si) so it receives
    # "output" as its argument instead of the previous task's return value.
    pipeline = chain(
        generate_all_docs.si(),
        index_generated_docs.si("output"),
    )
    result = pipeline.apply_async()

    return {
        "pipeline_id": result.id,
        "message": "Full documentation pipeline started",
        "steps": ["generate_all_docs", "index_generated_docs"],
    }
# Periodic task configuration (if using Celery Beat)
# Add to celery_app.py or separate beat configuration:
"""
from celery.schedules import crontab

celery_app.conf.beat_schedule = {
    'generate-proxmox-docs-daily': {
        'task': 'generate_proxmox_docs',
        'schedule': crontab(hour=2, minute=0),  # Daily at 2 AM
    },
    'generate-all-docs-daily': {
        'task': 'generate_all_docs',
        'schedule': crontab(hour=2, minute=30),  # Daily at 2:30 AM
    },
    'full-docs-pipeline-weekly': {
        'task': 'full_docs_pipeline',
        'schedule': crontab(hour=3, minute=0, day_of_week=0),  # Weekly on Sunday at 3 AM
    },
}
"""