# 🍃 MongoDB Migration Guide

## Why MongoDB?

The system has been updated to use **MongoDB 7.0** instead of PostgreSQL, for the following reasons:

### ✅ Advantages for This Use Case

1. **Flexible Schema**
   - Variable ticket metadata without migrations
   - New fields are easy to add
   - Native support for complex JSON documents

2. **Performance**
   - Strong performance for read operations
   - Powerful aggregation pipeline for analytics
   - Flexible indexing on nested fields

3. **Scalability**
   - Native horizontal scaling (sharding)
   - Replica sets for high availability
   - Built-in automatic failover

4. **Document-Oriented**
   - Natural fit for a ticket system
   - Native JSON metadata
   - Related docs can be embedded without JOINs

5. **Vector Search (Future)**
   - Integrated MongoDB Atlas Vector Search
   - Possible replacement for ChromaDB
   - Unified database for docs + vectors

6. **Developer Experience**
   - Modern Beanie ODM built on Pydantic
   - Native async/await with Motor
   - Type hints and validation

## 🔄 Database Architecture

### Main Collections

```
datacenter_docs/
├── tickets                  # Tickets and resolutions
├── documentation_sections   # Documentation section metadata
├── chat_sessions            # Chat conversations
├── system_metrics           # System metrics
└── audit_logs               # Audit trail
```

### Ticket Schema (Example)

```json
{
  "_id": ObjectId("..."),
  "ticket_id": "INC-12345",
  "title": "Network connectivity issue",
  "description": "Cannot ping 10.0.20.5 from VLAN 100",
  "priority": "high",
  "category": "network",
  "status": "resolved",
  "resolution": "Check VLAN configuration...",
  "suggested_actions": [
    "Verify VLAN 100 on switch",
    "Check inter-VLAN routing"
  ],
  "related_docs": [
    {
      "section": "networking",
      "content": "VLAN configuration...",
      "source": "/docs/02_networking.md"
    }
  ],
  "confidence_score": 0.92,
  "processing_time": 2.34,
  "metadata": {
    "source_system": "ServiceNow",
    "tags": ["network", "vlan", "connectivity"],
    "custom_field": "any value"
  },
  "created_at":
    ISODate("2025-01-15T10:30:00Z"),
  "updated_at": ISODate("2025-01-15T10:30:02Z")
}
```

## 🚀 Migrating from PostgreSQL

### Step 1: Export Existing Data (If Any)

```bash
# Export tickets from PostgreSQL
psql -U docs_user -d datacenter_docs -c \
  "COPY (SELECT * FROM tickets) TO '/tmp/tickets.csv' CSV HEADER"
```

### Step 2: Import into MongoDB

```python
import asyncio

import pandas as pd
from motor.motor_asyncio import AsyncIOMotorClient


async def migrate():
    # Read the CSV exported in Step 1
    df = pd.read_csv('/tmp/tickets.csv')

    # Connect to MongoDB
    client = AsyncIOMotorClient('mongodb://admin:password@localhost:27017')
    db = client.datacenter_docs

    # Insert documents (insert_many raises on an empty list, so guard it)
    tickets = df.to_dict('records')
    if tickets:
        await db.tickets.insert_many(tickets)
    print(f"Migrated {len(tickets)} tickets")


asyncio.run(migrate())
```

### Step 3: Verify

```bash
# Connect to MongoDB
mongosh mongodb://admin:password@localhost:27017

use datacenter_docs

# Count documents
db.tickets.countDocuments()

# Example query
db.tickets.find({status: "resolved"}).limit(5)
```

## 📦 Local Setup

### Docker Compose

```bash
# Start MongoDB
docker-compose up -d mongodb redis

# Verify the connection
docker-compose exec mongodb mongosh \
  -u admin -p password --authenticationDatabase admin

# Test query
use datacenter_docs
db.tickets.find().limit(1)
```

### Kubernetes

```bash
# Deploy MongoDB StatefulSet
kubectl apply -f deploy/kubernetes/mongodb.yaml

# Wait for pods
kubectl get pods -n datacenter-docs -w

# Initialize replica set
kubectl apply -f deploy/kubernetes/mongodb.yaml

# Verify
kubectl exec -n datacenter-docs mongodb-0 -- \
  mongosh -u admin -p password --authenticationDatabase admin \
  --eval "rs.status()"
```

## 🔧 Configuration

### Connection String

```bash
# Development (local)
MONGODB_URL=mongodb://admin:password@localhost:27017

# Docker Compose
MONGODB_URL=mongodb://admin:password@mongodb:27017

# Kubernetes (single node)
MONGODB_URL=mongodb://admin:password@mongodb.datacenter-docs.svc.cluster.local:27017

# Kubernetes (replica
# set)
MONGODB_URL=mongodb://admin:password@mongodb-0.mongodb.datacenter-docs.svc.cluster.local:27017,mongodb-1.mongodb.datacenter-docs.svc.cluster.local:27017,mongodb-2.mongodb.datacenter-docs.svc.cluster.local:27017/?replicaSet=rs0
```

### Environment Variables

```bash
# MongoDB
MONGODB_URL=mongodb://admin:password@mongodb:27017
MONGODB_DATABASE=datacenter_docs

# MongoDB Root (for admin operations)
MONGO_ROOT_USER=admin
MONGO_ROOT_PASSWORD=secure_password
```

## 🔐 Security

### Authentication

```bash
# Create application user
mongosh -u admin -p password --authenticationDatabase admin

use datacenter_docs
db.createUser({
  user: "docs_app",
  pwd: "app_password",
  roles: [
    { role: "readWrite", db: "datacenter_docs" }
  ]
})

# Use app user in connection string
MONGODB_URL=mongodb://docs_app:app_password@mongodb:27017/datacenter_docs
```

### Encryption at Rest

```yaml
# docker-compose.yml
# Note: --enableEncryption requires MongoDB Enterprise
mongodb:
  command:
    - --enableEncryption
    - --encryptionKeyFile=/data/mongodb-keyfile
  volumes:
    - ./mongodb-keyfile:/data/mongodb-keyfile:ro
```

### TLS/SSL

```bash
# Generate certificates
openssl req -newkey rsa:2048 -nodes -keyout mongodb.key \
  -x509 -days 365 -out mongodb.crt

# Combine key and certificate into the single PEM file referenced below
cat mongodb.key mongodb.crt > mongodb.pem

# Configure MongoDB (docker-compose.yml)
mongodb:
  command:
    - --tlsMode=requireTLS
    - --tlsCertificateKeyFile=/etc/ssl/mongodb.pem
```

## 📊 Indexing Strategy

### Automatic Indexes (via Beanie)

```python
from datetime import datetime

from beanie import Document, Indexed


class Ticket(Document):
    ticket_id: Indexed(str, unique=True)  # Unique index
    status: str                           # Indexed in Settings
    category: str                         # Indexed in Settings
    created_at: datetime                  # Part of the compound index

    class Settings:
        indexes = [
            "status",
            "category",
            [("status", 1), ("created_at", -1)],  # Compound
        ]
```

### Custom Indexes

```javascript
// Text search
db.tickets.createIndex({
  title: "text",
  description: "text",
  resolution: "text"
})

// Geospatial (future use)
db.locations.createIndex({ location: "2dsphere" })

// TTL index (auto-delete old docs)
db.chat_sessions.createIndex(
  { last_activity: 1 },
  { expireAfterSeconds: 2592000 }  // 30 days
)
```

## 🔍 Query Examples

### Python (Beanie)

```python
from datacenter_docs.api.models \
    import Ticket

# Find by status
tickets = await Ticket.find(Ticket.status == "resolved").to_list()

# Complex query
from datetime import datetime, timedelta, timezone

# Use an aware UTC timestamp so the comparison matches the stored UTC dates
recent = datetime.now(timezone.utc) - timedelta(days=7)
tickets = await Ticket.find(
    Ticket.status == "resolved",
    Ticket.confidence_score > 0.8,
    Ticket.created_at > recent
).sort(-Ticket.created_at).to_list()

# Aggregation
pipeline = [
    {"$group": {
        "_id": "$category",
        "count": {"$sum": 1},
        "avg_confidence": {"$avg": "$confidence_score"}
    }},
    {"$sort": {"count": -1}}
]
result = await Ticket.aggregate(pipeline).to_list()
```

### MongoDB Shell

```javascript
// Find resolved tickets
db.tickets.find({ status: "resolved" })

// Complex aggregation
db.tickets.aggregate([
  { $match: { status: "resolved" } },
  { $group: {
    _id: "$category",
    total: { $sum: 1 },
    avg_confidence: { $avg: "$confidence_score" },
    avg_time: { $avg: "$processing_time" }
  }},
  { $sort: { total: -1 } }
])

// Text search
db.tickets.find({
  $text: { $search: "network connectivity" }
})
```

## 📈 Performance Optimization

### Indexes

```javascript
// Explain query
db.tickets.find({ status: "resolved" }).explain("executionStats")

// Check index usage
db.tickets.aggregate([
  { $indexStats: {} }
])
```

### Connection Pooling

```python
# config.py
MONGODB_URL = "mongodb://user:pass@host:27017/?maxPoolSize=50"
```

### Read Preference

```python
# For read-heavy workloads with replica set
from motor.motor_asyncio import AsyncIOMotorClient

client = AsyncIOMotorClient(
    MONGODB_URL,
    # The readPreference option takes the mode name as a string
    readPreference="secondaryPreferred"
)
```

## 🛠️ Maintenance

### Backup

```bash
# Full backup
mongodump --uri="mongodb://admin:password@localhost:27017" \
  --authenticationDatabase=admin \
  --out=/backup/$(date +%Y%m%d)

# Restore
mongorestore --uri="mongodb://admin:password@localhost:27017" \
  --authenticationDatabase=admin \
  /backup/20250115
```

### Monitoring

```javascript
// Database stats
db.stats()

// Collection stats
db.tickets.stats()

// Current operations
db.currentOp()

// Server status
db.serverStatus()
```

### Cleanup

```javascript
// Remove old chat sessions
db.chat_sessions.deleteMany({
  last_activity: { $lt: new Date(Date.now() - 30*24*60*60*1000) }
})

// Compact collection
db.runCommand({ compact: "tickets" })
```

## 🔄 Replica Set (Production)

### Setup

```javascript
// Initialize replica set (run inside mongosh)
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongodb-0:27017" },
    { _id: 1, host: "mongodb-1:27017" },
    { _id: 2, host: "mongodb-2:27017" }
  ]
})

// Check status
rs.status()

// Add member
rs.add("mongodb-3:27017")
```

### Connection String

```bash
# Quote the value: an unquoted & would be interpreted by the shell
MONGODB_URL="mongodb://user:pass@mongodb-0:27017,mongodb-1:27017,mongodb-2:27017/?replicaSet=rs0&w=majority"
```

## 📚 References

- [MongoDB Manual](https://docs.mongodb.com/manual/)
- [Motor Documentation](https://motor.readthedocs.io/)
- [Beanie ODM](https://beanie-odm.dev/)
- [MongoDB Best Practices](https://docs.mongodb.com/manual/administration/production-notes/)

---

**MongoDB Version**: 7.0
**Driver**: Motor (Async)
**ODM**: Beanie
**Python**: 3.10+
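
### Appendix: Assembling a Connection String

The connection strings in this guide embed simple credentials such as `admin:password`. Real passwords often contain characters like `@`, `:` or `/`, which need to be percent-encoded before they go into a MongoDB URI. The following is a minimal sketch of one way to build such a URL in Python using only the standard library. The helper name `build_mongodb_url` and the sample password are hypothetical, chosen for illustration; they are not part of the project's code.

```python
from urllib.parse import quote_plus, urlencode


def build_mongodb_url(user, password, hosts, database="", **options):
    """Assemble a mongodb:// URL, percent-encoding the credentials.

    Hypothetical helper for illustration only.
    """
    # '@', ':' and '/' inside credentials would otherwise be confused
    # with the URI's own delimiters, so escape both parts.
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"
    host_list = ",".join(hosts)
    # Options such as replicaSet or w become the query string.
    query = f"?{urlencode(options)}" if options else ""
    return f"mongodb://{credentials}@{host_list}/{database}{query}"


url = build_mongodb_url(
    "docs_app",
    "p@ss:word/1",  # sample password containing reserved characters
    ["mongodb-0:27017", "mongodb-1:27017", "mongodb-2:27017"],
    database="datacenter_docs",
    replicaSet="rs0",
    w="majority",
)
print(url)
```

The resulting string could then be exported as `MONGODB_URL`. Building it in code rather than by hand keeps the escaping in one place if the credentials change.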