Chapter 21: Document and Content Management

Overview

Document and content management systems are natural applications for Vektagraf's combined vector search and graph capabilities. This chapter demonstrates how to build sophisticated content management systems that support semantic search, automatic classification, version control, and fine-grained access control.

Learning Objectives

  • Build semantic search systems with embeddings and full-text search
  • Implement automatic content classification and tagging
  • Design version control and document lifecycle management
  • Create secure content access control patterns
  • Optimize content discovery and recommendation

Prerequisites

  • Completed Chapters 1-6 (Foundations and Core Features)
  • Understanding of vector embeddings and similarity search
  • Basic knowledge of content management concepts

Core Concepts

Semantic Search Architecture

Vektagraf excels at semantic search by treating document embeddings as first-class object properties:

class Document extends VektaObject {
  late String title;
  late String content;
  late List<double> contentEmbedding;    // Semantic vector
  late List<double> titleEmbedding;     // Title-specific vector
  late Map<String, dynamic> metadata;
  late List<String> tags;
  late String contentType;
}

Content Classification

Automatic classification uses both rule-based and ML-based approaches:

  1. Vector-Based Classification: Use embeddings to find similar content
  2. Hierarchical Tagging: Build tag hierarchies with graph relationships
  3. Content Analysis: Extract entities, topics, and sentiment
  4. User Behavior: Learn from user interactions and feedback
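Approach 1 reduces to a similarity comparison against per-category reference vectors. The chapter's runnable examples are in Dart; the sketch below uses Python purely so the arithmetic is easy to verify, and the 2-d vectors and category names are illustrative (real embeddings are 384-dimensional):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    mag_a = math.sqrt(sum(x * x for x in a))
    mag_b = math.sqrt(sum(x * x for x in b))
    if mag_a == 0 or mag_b == 0:
        return 0.0
    return dot / (mag_a * mag_b)

def classify_by_vector(doc_embedding, category_embeddings, threshold=0.6):
    # Score the document against each category's reference embedding
    # and keep only strong matches, best first.
    scores = [
        (name, cosine_similarity(doc_embedding, emb))
        for name, emb in category_embeddings.items()
    ]
    return sorted(
        [(n, s) for n, s in scores if s > threshold],
        key=lambda pair: pair[1],
        reverse=True,
    )
```

A document whose embedding points mostly along the "finance" axis matches that category and falls below the threshold for the others.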

Practical Examples

Complete Document Management System

Let's build a comprehensive enterprise document management system:

1. Schema Definition

{
  "name": "DocumentManagementSystem",
  "version": "1.0.0",
  "objects": {
    "Document": {
      "properties": {
        "title": {"type": "string", "required": true},
        "content": {"type": "text", "searchable": true},
        "summary": {"type": "string"},
        "contentEmbedding": {
          "type": "vector",
          "dimensions": 384,
          "algorithm": "hnsw",
          "distance": "cosine"
        },
        "titleEmbedding": {
          "type": "vector", 
          "dimensions": 384,
          "algorithm": "hnsw",
          "distance": "cosine"
        },
        "metadata": {
          "type": "object",
          "properties": {
            "author": {"type": "string"},
            "department": {"type": "string"},
            "contentType": {"type": "string"},
            "language": {"type": "string", "default": "en"},
            "wordCount": {"type": "integer"},
            "readingTime": {"type": "integer"},
            "confidentialityLevel": {"type": "string", "enum": ["public", "internal", "confidential", "restricted"]}
          }
        },
        "tags": {"type": "array", "items": {"type": "string"}},
        "categories": {"type": "array", "items": {"type": "string"}},
        "status": {"type": "string", "enum": ["draft", "review", "published", "archived"], "default": "draft"},
        "version": {"type": "integer", "default": 1},
        "parentDocumentId": {"type": "string"},
        "createdAt": {"type": "datetime", "required": true},
        "updatedAt": {"type": "datetime", "required": true},
        "publishedAt": {"type": "datetime"},
        "expiresAt": {"type": "datetime"}
      }
    },
    "DocumentVersion": {
      "properties": {
        "documentId": {"type": "string", "required": true},
        "version": {"type": "integer", "required": true},
        "title": {"type": "string", "required": true},
        "content": {"type": "text"},
        "changes": {"type": "text"},
        "changeType": {"type": "string", "enum": ["major", "minor", "patch"]},
        "author": {"type": "string", "required": true},
        "createdAt": {"type": "datetime", "required": true}
      }
    },
    "Tag": {
      "properties": {
        "name": {"type": "string", "required": true, "unique": true},
        "description": {"type": "string"},
        "color": {"type": "string"},
        "category": {"type": "string"},
        "parentTagId": {"type": "string"},
        "usageCount": {"type": "integer", "default": 0},
        "createdAt": {"type": "datetime", "required": true}
      }
    },
    "Category": {
      "properties": {
        "name": {"type": "string", "required": true, "unique": true},
        "description": {"type": "string"},
        "parentCategoryId": {"type": "string"},
        "path": {"type": "string"},
        "level": {"type": "integer", "default": 0},
        "documentCount": {"type": "integer", "default": 0},
        "createdAt": {"type": "datetime", "required": true}
      }
    },
    "ContentAccess": {
      "properties": {
        "documentId": {"type": "string", "required": true},
        "userId": {"type": "string", "required": true},
        "accessType": {"type": "string", "enum": ["read", "write", "admin"], "required": true},
        "grantedBy": {"type": "string", "required": true},
        "grantedAt": {"type": "datetime", "required": true},
        "expiresAt": {"type": "datetime"},
        "conditions": {"type": "object"}
      }
    },
    "SearchQuery": {
      "properties": {
        "userId": {"type": "string", "required": true},
        "query": {"type": "string", "required": true},
        "queryEmbedding": {
          "type": "vector",
          "dimensions": 384,
          "algorithm": "hnsw",
          "distance": "cosine"
        },
        "filters": {"type": "object"},
        "results": {"type": "array", "items": {"type": "string"}},
        "resultCount": {"type": "integer"},
        "executionTime": {"type": "number"},
        "timestamp": {"type": "datetime", "required": true}
      }
    }
  },
  "relationships": {
    "DocumentVersions": {
      "from": "Document",
      "to": "DocumentVersion", 
      "type": "one_to_many",
      "foreignKey": "documentId"
    },
    "DocumentAccess": {
      "from": "Document",
      "to": "ContentAccess",
      "type": "one_to_many",
      "foreignKey": "documentId"
    },
    "TagHierarchy": {
      "from": "Tag",
      "to": "Tag",
      "type": "one_to_many",
      "foreignKey": "parentTagId"
    },
    "CategoryHierarchy": {
      "from": "Category",
      "to": "Category",
      "type": "one_to_many", 
      "foreignKey": "parentCategoryId"
    }
  }
}
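The `wordCount` and `readingTime` metadata fields are derived rather than user-supplied. A hedged sketch of how they might be computed before a document is saved (Python for brevity; the 200 words-per-minute figure is an assumption, not something the schema prescribes):

```python
import math

WORDS_PER_MINUTE = 200  # assumed average reading speed

def derive_metadata(content):
    # Split on whitespace to count words; ceil so any non-empty
    # document reads as at least one minute.
    word_count = len(content.split())
    reading_time = max(1, math.ceil(word_count / WORDS_PER_MINUTE)) if word_count else 0
    return {"wordCount": word_count, "readingTime": reading_time}
```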

2. Semantic Search Implementation

class SemanticSearchEngine {
  final VektaDatabase db;
  final EmbeddingService embeddingService;
  
  SemanticSearchEngine(this.db, this.embeddingService);
  
  /// Perform hybrid semantic and full-text search
  Future<SearchResults> search(
    String query, {
    String? userId,
    SearchFilters? filters,
    int limit = 20,
    double semanticWeight = 0.7,
  }) async {
    final startTime = DateTime.now();
    
    // Generate query embedding
    final queryEmbedding = await embeddingService.generateEmbedding(query);
    
    // Store search query for analytics
    await _storeSearchQuery(query, queryEmbedding, userId, filters);
    
    // Perform semantic search
    final semanticResults = await _performSemanticSearch(
      queryEmbedding, 
      filters, 
      limit * 2,
    );
    
    // Perform full-text search
    final textResults = await _performFullTextSearch(query, filters, limit * 2);
    
    // Combine and rank results
    final combinedResults = await _combineSearchResults(
      semanticResults,
      textResults,
      semanticWeight: semanticWeight,
    );
    
    // Apply access control
    final accessibleResults = await _filterByAccess(combinedResults, userId);
    
    // Apply business rules and boost factors
    final rankedResults = await _applyRankingRules(accessibleResults, query, userId);
    
    return SearchResults(
      query: query,
      results: rankedResults.take(limit).toList(),
      totalCount: rankedResults.length,
      executionTime: DateTime.now().difference(startTime).inMilliseconds,
      facets: await _generateFacets(rankedResults),
    );
  }
  
  /// Semantic search using vector similarity
  Future<List<SearchResult>> _performSemanticSearch(
    List<double> queryEmbedding,
    SearchFilters? filters,
    int limit,
  ) async {
    var query = db.documents.vectorSearch(
      'contentEmbedding',
      queryEmbedding,
      limit: limit,
      threshold: 0.5,
    );
    
    // Apply filters
    if (filters != null) {
      query = _applyFilters(query, filters);
    }
    
    final documents = await query.find();
    
    return documents.map((doc) => SearchResult(
      document: doc,
      score: _calculateSemanticScore(queryEmbedding, doc.contentEmbedding),
      matchType: 'semantic',
      highlights: [],
    )).toList();
  }
  
  /// Full-text search with highlighting
  Future<List<SearchResult>> _performFullTextSearch(
    String query,
    SearchFilters? filters,
    int limit,
  ) async {
    var searchQuery = db.documents.search(query, fields: ['title', 'content']);
    
    if (filters != null) {
      searchQuery = _applyFilters(searchQuery, filters);
    }
    
    final documents = await searchQuery.limit(limit).find();
    
    return documents.map((doc) => SearchResult(
      document: doc,
      score: _calculateTextScore(query, doc),
      matchType: 'full_text',
      highlights: _generateHighlights(query, doc),
    )).toList();
  }
  
  /// Combine semantic and text search results
  Future<List<SearchResult>> _combineSearchResults(
    List<SearchResult> semanticResults,
    List<SearchResult> textResults, {
    double semanticWeight = 0.7,
  }) async {
    final combined = <String, SearchResult>{};
    
    // Add semantic results
    for (final result in semanticResults) {
      combined[result.document.id] = result.copyWith(
        score: result.score * semanticWeight,
      );
    }
    
    // Add or merge text results
    for (final result in textResults) {
      final existing = combined[result.document.id];
      if (existing != null) {
        // Combine scores
        combined[result.document.id] = existing.copyWith(
          score: existing.score + (result.score * (1 - semanticWeight)),
          matchType: 'hybrid',
          highlights: [...existing.highlights, ...result.highlights],
        );
      } else {
        combined[result.document.id] = result.copyWith(
          score: result.score * (1 - semanticWeight),
        );
      }
    }
    
    // Sort by combined score
    final results = combined.values.toList()
      ..sort((a, b) => b.score.compareTo(a.score));
    
    return results;
  }
  
  /// Apply access control filtering
  Future<List<SearchResult>> _filterByAccess(
    List<SearchResult> results,
    String? userId,
  ) async {
    if (userId == null) {
      // Return only public documents
      return results.where((r) => 
          r.document.metadata['confidentialityLevel'] == 'public').toList();
    }
    
    final accessibleDocIds = await _getUserAccessibleDocuments(userId);
    
    return results.where((r) => 
        accessibleDocIds.contains(r.document.id) ||
        r.document.metadata['confidentialityLevel'] == 'public').toList();
  }
}
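The weighted blend performed by `_combineSearchResults` reduces to a small calculation. A Python sketch with per-document scores normalized to [0, 1] (the doc IDs and scores below are made up):

```python
def combine_results(semantic, text, semantic_weight=0.7):
    # semantic/text: {doc_id: score in [0, 1]} from each search leg.
    combined = {doc_id: s * semantic_weight for doc_id, s in semantic.items()}
    for doc_id, s in text.items():
        # Documents found by both legs accumulate both weighted scores.
        combined[doc_id] = combined.get(doc_id, 0.0) + s * (1 - semantic_weight)
    # Highest combined score first.
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
```

With `semantic_weight = 0.7`, a document found by both legs ("b": 0.8 semantic, 1.0 text) scores 0.8 · 0.7 + 1.0 · 0.3 = 0.86 and outranks a stronger semantic-only hit ("a": 0.9 · 0.7 = 0.63).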

3. Automatic Content Classification

class ContentClassifier {
  final VektaDatabase db;
  final EmbeddingService embeddingService;
  final Map<String, List<double>> _categoryEmbeddings = {};
  
  ContentClassifier(this.db, this.embeddingService);
  
  /// Classify document using multiple approaches
  Future<ClassificationResult> classifyDocument(Document document) async {
    // Generate content embedding if not exists
    if (document.contentEmbedding.isEmpty) {
      document.contentEmbedding = await embeddingService.generateEmbedding(
        '${document.title}\n${document.content}',
      );
    }
    
    // Vector-based classification
    final vectorCategories = await _classifyByVector(document);
    
    // Rule-based classification
    final ruleCategories = await _classifyByRules(document);
    
    // Content analysis classification
    final analysisCategories = await _classifyByContentAnalysis(document);
    
    // Combine classifications
    final combinedCategories = _combineClassifications([
      vectorCategories,
      ruleCategories,
      analysisCategories,
    ]);
    
    // Generate tags
    final suggestedTags = await _generateTags(document, combinedCategories);
    
    return ClassificationResult(
      categories: combinedCategories,
      tags: suggestedTags,
      confidence: _calculateConfidence(combinedCategories),
      method: 'hybrid',
    );
  }
  
  /// Vector-based classification using similarity
  Future<List<CategoryScore>> _classifyByVector(Document document) async {
    await _ensureCategoryEmbeddings();
    
    final scores = <CategoryScore>[];
    
    for (final entry in _categoryEmbeddings.entries) {
      final similarity = _calculateCosineSimilarity(
        document.contentEmbedding,
        entry.value,
      );
      
      if (similarity > 0.6) {
        scores.add(CategoryScore(
          category: entry.key,
          score: similarity,
          method: 'vector',
        ));
      }
    }
    
    scores.sort((a, b) => b.score.compareTo(a.score));
    return scores.take(5).toList();
  }
  
  /// Rule-based classification using keywords and patterns
  Future<List<CategoryScore>> _classifyByRules(Document document) async {
    final rules = await _getClassificationRules();
    final scores = <CategoryScore>[];
    
    final text = '${document.title} ${document.content}'.toLowerCase();
    
    for (final rule in rules) {
      double score = 0.0;
      int matches = 0;
      
      // Check keywords
      for (final keyword in rule.keywords) {
        if (text.contains(keyword.toLowerCase())) {
          score += keyword.length > 5 ? 0.3 : 0.1; // Longer keywords get higher weight
          matches++;
        }
      }
      
      // Check patterns
      for (final pattern in rule.patterns) {
        final regex = RegExp(pattern, caseSensitive: false);
        final patternMatches = regex.allMatches(text).length;
        score += patternMatches * 0.2;
        matches += patternMatches;
      }
      
      if (score > 0.3) {
        scores.add(CategoryScore(
          category: rule.category,
          score: math.min(score, 1.0),
          method: 'rules',
          details: {'matches': matches, 'keywords': rule.keywords},
        ));
      }
    }
    
    scores.sort((a, b) => b.score.compareTo(a.score));
    return scores.take(3).toList();
  }
  
  /// Content analysis classification using NLP
  Future<List<CategoryScore>> _classifyByContentAnalysis(Document document) async {
    final analysis = await _analyzeContent(document.content);
    final scores = <CategoryScore>[];
    
    // Entity-based classification
    for (final entity in analysis.entities) {
      final category = await _mapEntityToCategory(entity);
      if (category != null) {
        scores.add(CategoryScore(
          category: category,
          score: entity.confidence * 0.8,
          method: 'entity_analysis',
          details: {'entity': entity.text, 'type': entity.type},
        ));
      }
    }
    
    // Topic-based classification
    for (final topic in analysis.topics) {
      scores.add(CategoryScore(
        category: topic.name,
        score: topic.confidence,
        method: 'topic_analysis',
        details: {'keywords': topic.keywords},
      ));
    }
    
    scores.sort((a, b) => b.score.compareTo(a.score));
    return scores.take(3).toList();
  }
  
  /// Generate relevant tags for document
  Future<List<TagSuggestion>> _generateTags(
    Document document,
    List<CategoryScore> categories,
  ) async {
    final suggestions = <TagSuggestion>[];
    
    // Extract entities as potential tags
    final analysis = await _analyzeContent(document.content);
    for (final entity in analysis.entities) {
      if (entity.confidence > 0.7) {
        suggestions.add(TagSuggestion(
          tag: entity.text.toLowerCase(),
          confidence: entity.confidence,
          source: 'entity_extraction',
        ));
      }
    }
    
    // Use category-based tags
    for (final category in categories.take(3)) {
      final categoryTags = await _getCategoryTags(category.category);
      for (final tag in categoryTags) {
        suggestions.add(TagSuggestion(
          tag: tag,
          confidence: category.score * 0.8,
          source: 'category_mapping',
        ));
      }
    }
    
    // Find similar documents and extract common tags
    final similarDocs = await db.documents
        .vectorSearch('contentEmbedding', document.contentEmbedding, limit: 10)
        .where('id', notEquals: document.id)
        .find();
    
    final tagFrequency = <String, int>{};
    for (final doc in similarDocs) {
      for (final tag in doc.tags) {
        tagFrequency[tag] = (tagFrequency[tag] ?? 0) + 1;
      }
    }
    
    for (final entry in tagFrequency.entries) {
      if (entry.value >= 3) { // Tag appears in at least 3 similar documents
        suggestions.add(TagSuggestion(
          tag: entry.key,
          confidence: (entry.value / similarDocs.length) * 0.9,
          source: 'similar_documents',
        ));
      }
    }
    
    // Remove duplicates and sort by confidence
    final uniqueTags = <String, TagSuggestion>{};
    for (final suggestion in suggestions) {
      final existing = uniqueTags[suggestion.tag];
      if (existing == null || suggestion.confidence > existing.confidence) {
        uniqueTags[suggestion.tag] = suggestion;
      }
    }
    
    final result = uniqueTags.values.toList()
      ..sort((a, b) => b.confidence.compareTo(a.confidence));
    
    return result.take(10).toList();
  }
}
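The last stage of `_generateTags` — voting on tags drawn from a document's nearest neighbours — is plain frequency counting. A Python sketch (the minimum count of 3 and the 0.9 confidence weight mirror the Dart code above):

```python
def suggest_tags_from_neighbors(similar_doc_tags, min_count=3, weight=0.9):
    # similar_doc_tags: one list of tags per similar document.
    freq = {}
    for tags in similar_doc_tags:
        for tag in tags:
            freq[tag] = freq.get(tag, 0) + 1
    n = len(similar_doc_tags)
    # Keep tags shared by at least min_count neighbours, scored by
    # the fraction of neighbours carrying them, best first.
    return sorted(
        [(tag, (count / n) * weight) for tag, count in freq.items() if count >= min_count],
        key=lambda kv: kv[1],
        reverse=True,
    )
```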

4. Version Control and Document Lifecycle

class DocumentVersionManager {
  final VektaDatabase db;
  final EmbeddingService embeddingService;
  final ContentClassifier contentClassifier;
  
  DocumentVersionManager(this.db, this.embeddingService, this.contentClassifier);
  
  /// Create new version of document
  Future<DocumentVersion> createVersion(
    String documentId,
    String newContent, {
    String? newTitle,
    String? changes,
    String changeType = 'minor',
    required String author,
  }) async {
    final document = await db.documents.findById(documentId);
    if (document == null) throw Exception('Document not found');
    
    // Capture the current state before overwriting it
    final oldTitle = document.title;
    final oldContent = document.content;
    
    // Update document with new content
    document.title = newTitle ?? document.title;
    document.content = newContent;
    document.version = document.version + 1;
    document.updatedAt = DateTime.now();
    
    // Regenerate embeddings if content changed significantly
    if (_isSignificantChange(oldContent, newContent)) {
      document.contentEmbedding = await embeddingService.generateEmbedding(newContent);
      
      if (newTitle != null && newTitle != oldTitle) {
        document.titleEmbedding = await embeddingService.generateEmbedding(newTitle);
      }
      
      // Reclassify document
      final classification = await contentClassifier.classifyDocument(document);
      document.categories = classification.categories.map((c) => c.category).toList();
      document.tags = classification.tags.map((t) => t.tag).toList();
    }
    
    await db.documents.save(document);
    
    // Record the new state; each call creates exactly one version record,
    // so the history holds one entry per version number.
    final newVersion = DocumentVersion()
      ..documentId = documentId
      ..version = document.version
      ..title = document.title
      ..content = document.content
      ..changes = changes ?? 'Updated to version ${document.version}'
      ..changeType = changeType
      ..author = author
      ..createdAt = DateTime.now();
    
    await db.documentVersions.save(newVersion);
    
    return newVersion;
  }
  
  /// Get version history for document
  Future<List<DocumentVersion>> getVersionHistory(String documentId) async {
    return await db.documentVersions
        .where('documentId', documentId)
        .orderBy('version', descending: true)
        .find();
  }
  
  /// Restore document to specific version
  Future<Document> restoreVersion(
    String documentId,
    int version,
    String restoredBy,
  ) async {
    final targetVersion = await db.documentVersions
        .where('documentId', documentId)
        .where('version', version)
        .findFirst();
    
    if (targetVersion == null) {
      throw Exception('Version $version not found');
    }
    
    // Create new version with restored content
    return await createVersion(
      documentId,
      targetVersion.content,
      newTitle: targetVersion.title,
      changes: 'Restored from version $version',
      changeType: 'major',
      author: restoredBy,
    );
  }
  
  /// Compare two versions
  Future<VersionComparison> compareVersions(
    String documentId,
    int version1,
    int version2,
  ) async {
    final v1 = await db.documentVersions
        .where('documentId', documentId)
        .where('version', version1)
        .findFirst();
    
    final v2 = await db.documentVersions
        .where('documentId', documentId)
        .where('version', version2)
        .findFirst();
    
    if (v1 == null || v2 == null) {
      throw Exception('One or both versions not found');
    }
    
    return VersionComparison(
      version1: v1,
      version2: v2,
      titleDiff: _generateDiff(v1.title, v2.title),
      contentDiff: _generateDiff(v1.content, v2.content),
      similarity: _calculateSimilarity(v1.content, v2.content),
    );
  }
  
  bool _isSignificantChange(String oldContent, String newContent) {
    final similarity = _calculateSimilarity(oldContent, newContent);
    return similarity < 0.8; // Less than 80% similar
  }
}
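`_calculateSimilarity` is referenced but never defined above. One cheap stand-in — an assumption, not the chapter's canonical implementation — is Jaccard similarity over word sets; comparing embeddings would track meaning more closely but costs an extra model call:

```python
def jaccard_similarity(a, b):
    # Ratio of shared words to total distinct words across both texts.
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)

def is_significant_change(old, new, threshold=0.8):
    # Mirrors _isSignificantChange: less than 80% similar means
    # embeddings should be regenerated.
    return jaccard_similarity(old, new) < threshold
```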

5. Content Security and Access Control

class ContentSecurityManager {
  final VektaDatabase db;
  
  ContentSecurityManager(this.db);
  
  /// Check if user has access to document
  Future<bool> hasAccess(
    String userId,
    String documentId,
    String accessType,
  ) async {
    final document = await db.documents.findById(documentId);
    if (document == null) return false;
    
    // Check public access
    if (document.metadata['confidentialityLevel'] == 'public' && 
        accessType == 'read') {
      return true;
    }
    
    // Check explicit access grants (a null expiresAt means the grant never expires)
    final grants = await db.contentAccess
        .where('documentId', documentId)
        .where('userId', userId)
        .where('accessType', accessType)
        .find();
    
    for (final access in grants) {
      final expiry = access.expiresAt;
      if (expiry == null || expiry.isAfter(DateTime.now())) {
        return _checkAccessConditions(access, userId);
      }
    }
    
    // Check role-based access
    return await _checkRoleBasedAccess(userId, document, accessType);
  }
  
  /// Grant access to document
  Future<void> grantAccess(
    String documentId,
    String userId,
    String accessType,
    String grantedBy, {
    DateTime? expiresAt,
    Map<String, dynamic>? conditions,
  }) async {
    final access = ContentAccess()
      ..documentId = documentId
      ..userId = userId
      ..accessType = accessType
      ..grantedBy = grantedBy
      ..grantedAt = DateTime.now()
      ..expiresAt = expiresAt
      ..conditions = conditions ?? {};
    
    await db.contentAccess.save(access);
    
    // Log access grant
    await _logSecurityEvent('access_granted', {
      'documentId': documentId,
      'userId': userId,
      'accessType': accessType,
      'grantedBy': grantedBy,
    });
  }
  
  /// Revoke access to document
  Future<void> revokeAccess(
    String documentId,
    String userId,
    String accessType,
    String revokedBy,
  ) async {
    await db.contentAccess
        .where('documentId', documentId)
        .where('userId', userId)
        .where('accessType', accessType)
        .delete();
    
    // Log access revocation
    await _logSecurityEvent('access_revoked', {
      'documentId': documentId,
      'userId': userId,
      'accessType': accessType,
      'revokedBy': revokedBy,
    });
  }
  
  /// Apply content redaction based on user access level
  Future<Document> applyRedaction(Document document, String userId) async {
    final userClearanceLevel = await _getUserClearanceLevel(userId);
    final docConfidentiality = document.metadata['confidentialityLevel'];
    
    if (_canAccessFullContent(userClearanceLevel, docConfidentiality)) {
      return document;
    }
    
    // Create redacted copy
    final redacted = document.copy();
    
    // Redact sensitive content
    redacted.content = await _redactSensitiveContent(
      document.content,
      userClearanceLevel,
    );
    
    // Remove sensitive metadata
    redacted.metadata = _filterMetadata(document.metadata, userClearanceLevel);
    
    return redacted;
  }
  
  Future<String> _redactSensitiveContent(
    String content,
    String clearanceLevel,
  ) async {
    // Use pattern matching to identify and redact sensitive information
    final patterns = await _getSensitivityPatterns(clearanceLevel);
    
    String redactedContent = content;
    for (final pattern in patterns) {
      redactedContent = redactedContent.replaceAll(
        RegExp(pattern.regex),
        pattern.replacement,
      );
    }
    
    return redactedContent;
  }
}
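The `_canAccessFullContent` check used by `applyRedaction` can be as simple as comparing ranks in an ordered list of levels. A Python sketch using the four levels from the schema's `confidentialityLevel` enum:

```python
# Ordered from least to most sensitive; list index serves as the rank.
LEVELS = ["public", "internal", "confidential", "restricted"]

def can_access_full_content(user_level, doc_level):
    # A user sees full content when their clearance ranks at or
    # above the document's confidentiality level.
    return LEVELS.index(user_level) >= LEVELS.index(doc_level)
```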

Best Practices

1. Embedding Quality and Maintenance

class EmbeddingQualityManager {
  final VektaDatabase db;
  final EmbeddingService embeddingService;
  
  EmbeddingQualityManager(this.db, this.embeddingService);
  
  /// Ensure embedding quality through validation and updates
  Future<void> maintainEmbeddingQuality() async {
    // Find recently updated documents that are still missing embeddings
    final outdatedDocs = await db.documents
        .where('updatedAt', greaterThan: 
               DateTime.now().subtract(Duration(days: 30)))
        .where('contentEmbedding', isEmpty: true)
        .find();
    
    for (final doc in outdatedDocs) {
      // Regenerate embeddings
      doc.contentEmbedding = await embeddingService.generateEmbedding(
        '${doc.title}\n${doc.content}',
      );
      doc.titleEmbedding = await embeddingService.generateEmbedding(doc.title);
      
      await db.documents.save(doc);
    }
    
    // Update category embeddings
    await _updateCategoryEmbeddings();
  }
  
  /// Validate embedding consistency
  Future<bool> validateEmbeddings() async {
    final documents = await db.documents.limit(100).find();
    
    for (final doc in documents) {
      // Check embedding dimensions
      if (doc.contentEmbedding.length != 384) {
        print('Invalid embedding dimensions for document ${doc.id}');
        return false;
      }
      
      // Check for NaN values
      if (doc.contentEmbedding.any((v) => v.isNaN)) {
        print('NaN values in embedding for document ${doc.id}');
        return false;
      }
      
      // Check embedding magnitude
      final magnitude = math.sqrt(
        doc.contentEmbedding.map((v) => v * v).reduce((a, b) => a + b),
      );
      
      if (magnitude < 0.1 || magnitude > 10.0) {
        print('Unusual embedding magnitude for document ${doc.id}: $magnitude');
      }
    }
    
    return true;
  }
}

2. Search Performance Optimization

class SearchOptimizer {
  final VektaDatabase db;
  final SemanticSearchEngine searchEngine;
  final Map<String, SearchResults> _searchCache = {};
  final Map<String, DateTime> _cacheTimes = {};
  final Duration _cacheExpiry = Duration(minutes: 15);
  
  SearchOptimizer(this.db, this.searchEngine);
  
  /// Cache frequently used search results
  Future<SearchResults> getCachedSearch(
    String query,
    SearchFilters? filters,
    String? userId,
  ) async {
    final cacheKey = _generateCacheKey(query, filters, userId);
    
    final cachedAt = _cacheTimes[cacheKey];
    if (cachedAt != null &&
        DateTime.now().difference(cachedAt) < _cacheExpiry) {
      return _searchCache[cacheKey]!;
    }
    
    // Perform fresh search
    final results = await searchEngine.search(query, 
        userId: userId, filters: filters);
    
    _searchCache[cacheKey] = results;
    _cacheTimes[cacheKey] = DateTime.now();
    return results;
  }
  
  /// Optimize search indexes
  Future<void> optimizeIndexes() async {
    // Rebuild vector indexes for better performance
    await db.documents.rebuildVectorIndex('contentEmbedding');
    await db.documents.rebuildVectorIndex('titleEmbedding');
    
    // Update full-text search indexes
    await db.documents.rebuildTextIndex(['title', 'content']);
    
    // Clean up old search queries
    await db.searchQueries
        .where('timestamp', lessThan: 
               DateTime.now().subtract(Duration(days: 30)))
        .delete();
  }
}
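`_generateCacheKey` must be deterministic across equivalent inputs, or the cache never hits. A Python sketch that serializes the inputs with sorted keys before hashing (the field names are arbitrary):

```python
import hashlib
import json

def cache_key(query, filters, user_id):
    # Sorted keys make the serialization order-independent, so logically
    # equal filter dicts always produce the same key.
    payload = json.dumps(
        {"q": query, "f": filters or {}, "u": user_id or ""},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```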

Advanced Topics

Content Recommendation Engine

class ContentRecommendationEngine {
  final VektaDatabase db;
  final ContentSecurityManager securityManager;
  
  ContentRecommendationEngine(this.db, this.securityManager);
  
  /// Recommend related documents
  Future<List<Document>> getRelatedDocuments(
    String documentId, {
    int limit = 10,
    String? userId,
  }) async {
    final document = await db.documents.findById(documentId);
    if (document == null) return [];
    
    // Find similar documents using content embedding
    final similar = await db.documents
        .vectorSearch('contentEmbedding', document.contentEmbedding, limit: limit * 2)
        .where('id', notEquals: documentId)
        .find();
    
    // Apply access control; anonymous callers only see public documents
    final accessible = <Document>[];
    for (final doc in similar) {
      final canRead = userId != null
          ? await securityManager.hasAccess(userId, doc.id, 'read')
          : doc.metadata['confidentialityLevel'] == 'public';
      if (canRead) {
        accessible.add(doc);
      }
    }
    
    return accessible.take(limit).toList();
  }
  
  /// Recommend documents based on user reading history
  Future<List<Document>> getPersonalizedRecommendations(
    String userId, {
    int limit = 10,
  }) async {
    // Get user's reading history
    final searchHistory = await db.searchQueries
        .where('userId', userId)
        .orderBy('timestamp', descending: true)
        .limit(50)
        .find();
    
    if (searchHistory.isEmpty) {
      return _getPopularDocuments(limit);
    }
    
    // Calculate user interest vector
    final interestVector = _calculateUserInterestVector(searchHistory);
    
    // Find documents matching user interests
    final recommendations = await db.documents
        .vectorSearch('contentEmbedding', interestVector, limit: limit * 2)
        .find();
    
    // Filter by access and remove already seen
    final seenDocIds = searchHistory
        .expand((q) => q.results)
        .toSet();
    
    final filtered = <Document>[];
    for (final doc in recommendations) {
      if (!seenDocIds.contains(doc.id) &&
          await securityManager.hasAccess(userId, doc.id, 'read')) {
        filtered.add(doc);
      }
    }
    
    return filtered.take(limit).toList();
  }
}
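`_calculateUserInterestVector` is left abstract above. One plausible implementation — an assumption, not a Vektagraf API — averages the stored query embeddings with a recency decay, newest first:

```python
def interest_vector(query_embeddings, decay=0.9):
    # query_embeddings: newest-first list of equal-length vectors;
    # each older query contributes geometrically less weight.
    if not query_embeddings:
        return []
    dims = len(query_embeddings[0])
    total = [0.0] * dims
    weight_sum = 0.0
    w = 1.0
    for emb in query_embeddings:
        for i in range(dims):
            total[i] += w * emb[i]
        weight_sum += w
        w *= decay
    # Normalize by total weight so the result stays in embedding scale.
    return [v / weight_sum for v in total]
```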

Summary

This chapter demonstrated how to build comprehensive document and content management systems using Vektagraf's semantic search and security capabilities. Key takeaways include:

  • Semantic Search: Combine vector similarity with full-text search for better results
  • Automatic Classification: Use multiple approaches for accurate content categorization
  • Version Control: Implement comprehensive document lifecycle management
  • Security: Apply fine-grained access control and content redaction
  • Performance: Optimize search with caching and index management
  • Recommendations: Provide personalized content discovery

Vektagraf's vector-first approach makes it particularly well-suited for content management, as document embeddings are treated as first-class properties alongside traditional metadata.

Next Steps

  • Chapter 22: Social Networks and Graph Applications - Learn graph-based content relationships
  • Chapter 23: AI/ML Integration Patterns - Explore advanced ML integration for content analysis
  • Part VII: Reference documentation for complete API coverage
  • Vector Search Documentation (Chapter 5)
  • Security and Access Control (Chapters 8-10)
  • Performance Optimization (Chapter 7)
  • Multi-Tenant Architecture (Chapter 11)