Chapter 20: Scaling and High Availability
Overview
This chapter focuses on scaling strategies and high availability patterns for Vektagraf applications. We'll explore horizontal and vertical scaling approaches, clustering and replication configurations, load balancing mechanisms, and comprehensive disaster recovery strategies to ensure your Vektagraf deployment can handle growth and maintain uptime.
Learning Objectives
- Implement horizontal and vertical scaling strategies
- Configure clustering and replication for high availability
- Set up load balancing and failover mechanisms
- Design disaster recovery and backup strategies
Prerequisites
- Understanding of Vektagraf architecture and deployment patterns
- Familiarity with Kubernetes and container orchestration
- Knowledge of database clustering concepts
- Basic understanding of network load balancing
Core Concepts
Scaling Fundamentals
Vektagraf supports multiple scaling approaches depending on your architecture:
Embedded Mode Scaling
For applications using embedded Vektagraf, scaling happens at the application level:
// lib/scaling/embedded_scaling.dart
class EmbeddedScalingConfig {
  final int maxConnections;
  final int workerThreads;
  final int vectorIndexShards;
  final Duration connectionTimeout;

  const EmbeddedScalingConfig({
    this.maxConnections = 100,
    this.workerThreads = 4,
    this.vectorIndexShards = 8,
    // Default values in a const constructor must themselves be const.
    this.connectionTimeout = const Duration(seconds: 30),
  });

  Map<String, dynamic> toJson() => {
        'database': {
          'maxConnections': maxConnections,
          'workerThreads': workerThreads,
          'connectionTimeout': '${connectionTimeout.inSeconds}s',
        },
        'vector': {
          'indexShards': vectorIndexShards,
          'parallelIndexing': true,
        },
      };
}
Hosted Mode Scaling
For hosted deployments, scaling involves both client and server components:
// lib/scaling/hosted_scaling.dart
class HostedScalingConfig {
  final List<String> serverEndpoints;
  final int connectionPoolSize;
  final LoadBalancingStrategy loadBalancing;
  final bool enableFailover;

  const HostedScalingConfig({
    required this.serverEndpoints,
    this.connectionPoolSize = 50,
    this.loadBalancing = LoadBalancingStrategy.roundRobin,
    this.enableFailover = true,
  });
}

enum LoadBalancingStrategy {
  roundRobin,
  leastConnections,
  weighted,
  consistentHash,
}
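Of these strategies, consistentHash is the least obvious: it keeps a given key routed to the same endpoint even as endpoints join or leave, so only a fraction of keys remap when the endpoint set changes. A minimal hash-ring sketch (shown in Python for brevity; ConsistentHashRing and the md5-based hash are illustrative, not Vektagraf APIs):

```python
import hashlib
from bisect import bisect

def _hash(key: str) -> int:
    # Stable hash, identical across processes (unlike Python's built-in hash()).
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Maps keys to endpoints; changing one endpoint only remaps ~1/N of keys."""

    def __init__(self, endpoints, vnodes=100):
        # Each endpoint gets `vnodes` positions on the ring for smoother balance.
        self._ring = sorted((_hash(f"{e}#{i}"), e)
                            for e in endpoints for i in range(vnodes))
        self._keys = [h for h, _ in self._ring]

    def endpoint_for(self, key: str) -> str:
        # Walk clockwise to the first virtual node at or after the key's hash.
        idx = bisect(self._keys, _hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

Removing an endpoint only moves the keys that previously landed on its virtual nodes; all other keys keep their assignments.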
Horizontal Scaling Strategies
Application-Level Horizontal Scaling
Kubernetes Horizontal Pod Autoscaler (HPA)
# hpa-config.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: vektagraf-hpa
  namespace: vektagraf-production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: vektagraf-app
  minReplicas: 3
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
    - type: Pods
      pods:
        metric:
          name: vektagraf_active_connections
        target:
          type: AverageValue
          averageValue: "100"
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60
        - type: Pods
          value: 5
          periodSeconds: 60
      selectPolicy: Max
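The controller's core decision follows Kubernetes' documented formula: desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), clamped to the min/max bounds above. A Python sketch of just that arithmetic (simplified; the real controller additionally applies the behavior policies and stabilization windows):

```python
from math import ceil

def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 3,
                     max_replicas: int = 50) -> int:
    # Scale proportionally to how far the observed metric is from its
    # target, then clamp the result to the configured replica bounds.
    desired = ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, desired))
```

For example, 10 pods at 90% average CPU against a 70% target yields ceil(10 × 90/70) = 13 replicas; when several metrics are configured, the HPA takes the largest proposal.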
Custom Metrics for Scaling
// lib/metrics/custom_metrics.dart
class VektagrafMetrics {
  static final _activeConnections = Gauge(
    'vektagraf_active_connections',
    'Number of active database connections',
  );

  static final _queryLatency = Histogram(
    'vektagraf_query_duration_seconds',
    'Query execution time in seconds',
    buckets: [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0, 5.0],
  );

  static final _vectorSearchLatency = Histogram(
    'vektagraf_vector_search_duration_seconds',
    'Vector search execution time in seconds',
    buckets: [0.01, 0.05, 0.1, 0.5, 1.0, 2.0, 5.0],
  );

  static void recordActiveConnections(int count) {
    _activeConnections.set(count.toDouble());
  }

  static void recordQueryLatency(Duration duration) {
    _queryLatency.observe(duration.inMicroseconds / 1000000.0);
  }

  static void recordVectorSearchLatency(Duration duration) {
    _vectorSearchLatency.observe(duration.inMicroseconds / 1000000.0);
  }
}
Prometheus Metrics Configuration
# prometheus-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitoring
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    rule_files:
      - "vektagraf_rules.yml"
    scrape_configs:
      - job_name: 'vektagraf'
        kubernetes_sd_configs:
          - role: pod
            namespaces:
              names:
                - vektagraf-production
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
            action: replace
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
            target_label: __address__
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: vektagraf-rules
  namespace: monitoring
data:
  vektagraf_rules.yml: |
    groups:
      - name: vektagraf.rules
        rules:
          - alert: VektagrafHighLatency
            # histogram_quantile needs the per-bucket counter series, rated
            # over a window, not the bare histogram metric name.
            expr: histogram_quantile(0.95, rate(vektagraf_query_duration_seconds_bucket[5m])) > 1.0
            for: 5m
            labels:
              severity: warning
            annotations:
              summary: "Vektagraf query latency is high"
              description: "95th percentile latency is {{ $value }}s"
          - alert: VektagrafHighConnectionCount
            expr: vektagraf_active_connections > 80
            for: 2m
            labels:
              severity: warning
            annotations:
              summary: "Vektagraf connection count is high"
              description: "Active connections: {{ $value }}"
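The VektagrafHighLatency rule relies on histogram_quantile, which estimates a quantile from cumulative bucket counts by linear interpolation inside the bucket where the target rank falls. A simplified Python sketch of that interpolation (the real Prometheus function also rates the `_bucket` time series over a window and handles the `+Inf` bucket specially):

```python
def histogram_quantile(q: float, buckets) -> float:
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count) sorted by bound,
    e.g. derived from the latency buckets registered above.
    """
    total = buckets[-1][1]
    rank = q * total  # how many observations lie at or below the quantile
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            # Linearly interpolate the quantile's position in this bucket.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return prev_bound
```

With 100 queries where 90 finished under 0.5s and all under 1.0s, the 95th percentile interpolates to 0.75s, which is what the alert compares against its 1.0s threshold.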
Database-Level Horizontal Scaling
Vektagraf Cluster Configuration
// lib/cluster/cluster_config.dart
class VektagrafClusterConfig {
  final String nodeId;
  final List<String> clusterNodes;
  final ReplicationStrategy replication;
  final ShardingStrategy sharding;
  final ConsistencyLevel consistencyLevel;

  const VektagrafClusterConfig({
    required this.nodeId,
    required this.clusterNodes,
    this.replication = const ReplicationStrategy.async(replicas: 2),
    this.sharding = const ShardingStrategy.hash(),
    this.consistencyLevel = ConsistencyLevel.eventual,
  });

  Map<String, dynamic> toJson() => {
        'cluster': {
          'nodeId': nodeId,
          'nodes': clusterNodes,
          'replication': replication.toJson(),
          'sharding': sharding.toJson(),
          'consistencyLevel': consistencyLevel.name,
        },
      };
}

class ReplicationStrategy {
  final ReplicationType type;
  final int replicas;
  final Duration syncTimeout;

  const ReplicationStrategy.async({
    this.replicas = 2,
    this.syncTimeout = const Duration(seconds: 5),
  }) : type = ReplicationType.async;

  const ReplicationStrategy.sync({
    this.replicas = 2,
    this.syncTimeout = const Duration(seconds: 10),
  }) : type = ReplicationType.sync;

  Map<String, dynamic> toJson() => {
        'type': type.name,
        'replicas': replicas,
        'syncTimeout': '${syncTimeout.inSeconds}s',
      };
}

enum ReplicationType { async, sync }
enum ConsistencyLevel { strong, eventual, session }
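The hash sharding strategy referenced above deterministically assigns each key to a shard, so every node in the cluster agrees on data placement without any coordination. In Python terms (a sketch; the hash function Vektagraf actually uses is not specified here):

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    # Use a stable (non-process-seeded) hash so that every node,
    # in every process, computes the same shard for the same key.
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

The trade-off of simple modulo placement is that changing num_shards remaps most keys, which is why resharding a live cluster is an expensive operation (and why consistent hashing is often preferred for endpoint routing).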
Cluster Deployment Configuration
# cluster-deployment.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: vektagraf-cluster
  namespace: vektagraf-production
spec:
  serviceName: vektagraf-cluster-headless
  replicas: 3
  selector:
    matchLabels:
      app: vektagraf-cluster
  template:
    metadata:
      labels:
        app: vektagraf-cluster
    spec:
      containers:
        - name: vektagraf
          image: vektagraf/server:latest
          ports:
            - containerPort: 9090
              name: grpc
            - containerPort: 8080
              name: http
          env:
            - name: VEKTAGRAF_NODE_ID
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: VEKTAGRAF_CLUSTER_NODES
              value: "vektagraf-cluster-0.vektagraf-cluster-headless:9090,vektagraf-cluster-1.vektagraf-cluster-headless:9090,vektagraf-cluster-2.vektagraf-cluster-headless:9090"
            - name: VEKTAGRAF_CLUSTER_MODE
              value: "distributed"
          volumeMounts:
            - name: data
              mountPath: /data
          resources:
            requests:
              memory: "2Gi"
              cpu: "500m"
            limits:
              memory: "8Gi"
              cpu: "2000m"
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: fast-ssd
        resources:
          requests:
            storage: 100Gi
---
apiVersion: v1
kind: Service
metadata:
  name: vektagraf-cluster-headless
  namespace: vektagraf-production
spec:
  clusterIP: None
  selector:
    app: vektagraf-cluster
  ports:
    - port: 9090
      name: grpc
    - port: 8080
      name: http
Vertical Scaling Strategies
Vertical Pod Autoscaler (VPA)
# vpa-config.yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: vektagraf-vpa
  namespace: vektagraf-production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: vektagraf-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: vektagraf
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 4000m
          memory: 16Gi
        controlledResources: ["cpu", "memory"]
        controlledValues: RequestsAndLimits
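The VPA recommender produces a per-resource target, which the policy above then clamps: a recommendation outside the [minAllowed, maxAllowed] window is cut back to the nearest bound before being applied. In Python terms (values as CPU millicores and memory MiB; a sketch of the clamping step only, not VPA's actual code):

```python
def clamp_recommendation(target: dict, min_allowed: dict, max_allowed: dict) -> dict:
    # Apply the containerPolicies bounds to each controlled resource.
    return {res: max(min_allowed[res], min(max_allowed[res], value))
            for res, value in target.items()}
```

So a 6-core CPU recommendation against the 4000m cap above would be applied as 4000m, while an in-range memory recommendation passes through unchanged.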
Resource Optimization
// lib/optimization/resource_optimizer.dart
class ResourceOptimizer {
  static VektagrafConfig optimizeForWorkload(WorkloadProfile profile) {
    switch (profile.type) {
      case WorkloadType.readHeavy:
        return _optimizeForReads(profile);
      case WorkloadType.writeHeavy:
        return _optimizeForWrites(profile);
      case WorkloadType.vectorSearch:
        return _optimizeForVectorSearch(profile);
      case WorkloadType.graphTraversal:
        return _optimizeForGraphOperations(profile);
      case WorkloadType.mixed:
        return _optimizeForMixed(profile);
    }
  }

  static VektagrafConfig _optimizeForReads(WorkloadProfile profile) {
    return VektagrafConfig(
      database: DatabaseConfig(
        maxConnections: profile.expectedConcurrency * 2,
        readOnlyConnections: profile.expectedConcurrency,
        queryCache: CacheConfig(
          enabled: true,
          maxSize: '512MB',
          ttl: Duration(minutes: 30),
        ),
      ),
      vector: VectorConfig(
        algorithm: VectorAlgorithm.hnsw,
        efSearch: 200, // Higher for better recall
        cacheSize: '1GB',
      ),
    );
  }

  static VektagrafConfig _optimizeForVectorSearch(WorkloadProfile profile) {
    return VektagrafConfig(
      vector: VectorConfig(
        algorithm: VectorAlgorithm.hnsw,
        efConstruction: 400,
        maxConnections: 32,
        efSearch: 100,
        parallelSearch: true,
        indexShards: _calculateOptimalShards(profile.dataSize),
      ),
      database: DatabaseConfig(
        maxConnections: profile.expectedConcurrency,
        vectorIndexMemory: '4GB',
      ),
    );
  }

  static int _calculateOptimalShards(int dataSize) {
    // Rule of thumb: one shard per 1M vectors, capped at 16.
    // clamp() returns num, so convert back to int for the return type.
    return (dataSize / 1000000).ceil().clamp(1, 16).toInt();
  }
}
class WorkloadProfile {
  final WorkloadType type;
  final int expectedConcurrency;
  final int dataSize;
  final double readWriteRatio;
  final List<QueryPattern> queryPatterns;

  const WorkloadProfile({
    required this.type,
    required this.expectedConcurrency,
    required this.dataSize,
    this.readWriteRatio = 0.8,
    this.queryPatterns = const [],
  });
}

enum WorkloadType {
  readHeavy,
  writeHeavy,
  vectorSearch,
  graphTraversal,
  mixed,
}
Load Balancing and Failover
Application Load Balancer Configuration
# alb-config.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: vektagraf-alb
  namespace: vektagraf-production
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/load-balancer-name: vektagraf-alb
    alb.ingress.kubernetes.io/healthcheck-path: /health
    alb.ingress.kubernetes.io/healthcheck-interval-seconds: '30'
    alb.ingress.kubernetes.io/healthcheck-timeout-seconds: '5'
    alb.ingress.kubernetes.io/healthy-threshold-count: '2'
    alb.ingress.kubernetes.io/unhealthy-threshold-count: '3'
    alb.ingress.kubernetes.io/success-codes: '200'
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'
    alb.ingress.kubernetes.io/ssl-redirect: '443'
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:region:account:certificate/cert-id
spec:
  rules:
    - host: api.yourdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: vektagraf-service
                port:
                  number: 80
Client-Side Load Balancing
// lib/client/load_balancer.dart
class VektagrafLoadBalancer {
  final List<VektagrafClient> _clients;
  final LoadBalancingStrategy _strategy;
  final HealthChecker _healthChecker;
  int _currentIndex = 0;

  VektagrafLoadBalancer._(this._clients, this._strategy, this._healthChecker);

  // Instance fields cannot reference each other in an initializer list,
  // so build the client list once in a factory constructor and share it
  // with the health checker.
  factory VektagrafLoadBalancer({
    required List<String> endpoints,
    LoadBalancingStrategy strategy = LoadBalancingStrategy.roundRobin,
    Duration healthCheckInterval = const Duration(seconds: 30),
  }) {
    final clients =
        endpoints.map((e) => VektagrafClient(endpoint: e)).toList();
    return VektagrafLoadBalancer._(
      clients,
      strategy,
      HealthChecker(clients: clients, interval: healthCheckInterval),
    );
  }

  Future<void> initialize() async {
    await _healthChecker.start();
  }

  VektagrafClient getClient({String? routingKey}) {
    final healthyClients = _healthChecker.healthyClients;
    if (healthyClients.isEmpty) {
      throw Exception('No healthy clients available');
    }
    switch (_strategy) {
      case LoadBalancingStrategy.roundRobin:
        return _roundRobin(healthyClients);
      case LoadBalancingStrategy.leastConnections:
        return _leastConnections(healthyClients);
      case LoadBalancingStrategy.weighted:
        return _weighted(healthyClients);
      case LoadBalancingStrategy.consistentHash:
        return _consistentHash(healthyClients, routingKey);
    }
  }

  VektagrafClient _roundRobin(List<VektagrafClient> clients) {
    final client = clients[_currentIndex % clients.length];
    _currentIndex++;
    return client;
  }

  VektagrafClient _leastConnections(List<VektagrafClient> clients) {
    return clients.reduce(
        (a, b) => a.activeConnections < b.activeConnections ? a : b);
  }

  VektagrafClient _weighted(List<VektagrafClient> clients) {
    // Simplified: without per-endpoint weights configured, fall back to
    // round robin.
    return _roundRobin(clients);
  }

  VektagrafClient _consistentHash(
      List<VektagrafClient> clients, String? key) {
    // Route the same key to the same endpoint; without a key there is
    // nothing to hash, so fall back to round robin.
    if (key == null) return _roundRobin(clients);
    return clients[key.hashCode.abs() % clients.length];
  }
}
class HealthChecker {
  final List<VektagrafClient> _clients;
  final Duration _interval;
  final Set<VektagrafClient> _healthyClients = {};
  Timer? _timer;

  HealthChecker({
    required List<VektagrafClient> clients,
    required Duration interval,
  })  : _clients = clients,
        _interval = interval;

  List<VektagrafClient> get healthyClients => _healthyClients.toList();

  Future<void> start() async {
    await _checkHealth();
    _timer = Timer.periodic(_interval, (_) => _checkHealth());
  }

  Future<void> _checkHealth() async {
    final futures = _clients.map((client) async {
      try {
        await client.ping();
        _healthyClients.add(client);
      } catch (e) {
        _healthyClients.remove(client);
        print('Health check failed for ${client.endpoint}: $e');
      }
    });
    await Future.wait(futures);
  }

  void stop() {
    _timer?.cancel();
  }
}
Circuit Breaker Pattern
// lib/resilience/circuit_breaker.dart
class CircuitBreaker {
  final String name;
  final int failureThreshold;
  final Duration timeout;
  final Duration resetTimeout;

  CircuitBreakerState _state = CircuitBreakerState.closed;
  int _failureCount = 0;
  DateTime? _lastFailureTime;

  CircuitBreaker({
    required this.name,
    this.failureThreshold = 5,
    this.timeout = const Duration(seconds: 60),
    this.resetTimeout = const Duration(seconds: 30),
  });

  Future<T> execute<T>(Future<T> Function() operation) async {
    if (_state == CircuitBreakerState.open) {
      if (_shouldAttemptReset()) {
        _state = CircuitBreakerState.halfOpen;
      } else {
        throw CircuitBreakerOpenException('Circuit breaker $name is open');
      }
    }
    try {
      final result = await operation().timeout(timeout);
      _onSuccess();
      return result;
    } catch (e) {
      _onFailure();
      rethrow;
    }
  }

  void _onSuccess() {
    _failureCount = 0;
    _state = CircuitBreakerState.closed;
  }

  void _onFailure() {
    _failureCount++;
    _lastFailureTime = DateTime.now();
    if (_failureCount >= failureThreshold) {
      _state = CircuitBreakerState.open;
    }
  }

  bool _shouldAttemptReset() {
    return _lastFailureTime != null &&
        DateTime.now().difference(_lastFailureTime!) > resetTimeout;
  }
}

enum CircuitBreakerState { closed, open, halfOpen }

class CircuitBreakerOpenException implements Exception {
  final String message;
  CircuitBreakerOpenException(this.message);

  @override
  String toString() => 'CircuitBreakerOpenException: $message';
}
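The closed-to-open transition above is driven purely by a consecutive-failure counter, and any success resets it. That counting logic can be condensed to a few lines (a Python sketch of just the counter; the timing-based half-open recovery is elided):

```python
class BreakerCounter:
    """Tracks the consecutive-failure count behind the closed/open transition."""

    def __init__(self, threshold: int = 5):
        self.threshold = threshold
        self.failures = 0
        self.state = "closed"

    def record(self, ok: bool) -> str:
        if ok:
            # Any success resets the streak and closes the breaker.
            self.failures = 0
            self.state = "closed"
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.state = "open"
        return self.state
```

With the default threshold of 5, four failures in a row leave the breaker closed; the fifth opens it, and the next success closes it again.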
Clustering and Replication
Master-Slave Replication
// lib/replication/master_slave.dart
class MasterSlaveReplication {
  final VektagrafServer master;
  final List<VektagrafServer> slaves;
  final ReplicationConfig config;

  MasterSlaveReplication({
    required this.master,
    required this.slaves,
    required this.config,
  });

  Future<void> initialize() async {
    // Initialize the master first.
    await master.start();
    // Initialize slaves and set up replication.
    for (final slave in slaves) {
      await slave.start();
      _setupReplication(slave);
    }
  }

  void _setupReplication(VektagrafServer slave) {
    // Subscribe rather than `await for`: an await-for loop would never
    // return, so initialize() would block on the first slave forever.
    master.getReplicationStream().listen((event) async {
      try {
        await slave.applyReplicationEvent(event);
      } catch (e) {
        print('Replication failed for slave ${slave.nodeId}: $e');
        // Implement retry logic or mark the slave as unhealthy.
      }
    });
  }

  Future<void> promoteSlaveToMaster(VektagrafServer newMaster) async {
    // Stop accepting writes on the current master.
    await master.setReadOnly(true);
    // Ensure the new master has caught up.
    await _waitForReplicationSync(newMaster);
    // Promote the slave to master.
    await newMaster.promoteToMaster();
    // Point the remaining slaves at the new master.
    for (final slave in slaves) {
      if (slave != newMaster) {
        await slave.changeMaster(newMaster.endpoint);
      }
    }
  }

  Future<void> _waitForReplicationSync(VektagrafServer slave) async {
    const maxWait = Duration(seconds: 30);
    const checkInterval = Duration(milliseconds: 100);
    final stopwatch = Stopwatch()..start();
    while (stopwatch.elapsed < maxWait) {
      final masterPosition = await master.getReplicationPosition();
      final slavePosition = await slave.getReplicationPosition();
      if (masterPosition == slavePosition) {
        return;
      }
      await Future.delayed(checkInterval);
    }
    throw Exception('Replication sync timeout');
  }
}
Multi-Master Replication
// lib/replication/multi_master.dart
class MultiMasterReplication {
  final List<VektagrafServer> masters;
  final ConflictResolutionStrategy conflictResolution;

  MultiMasterReplication({
    required this.masters,
    this.conflictResolution = ConflictResolutionStrategy.lastWriteWins,
  });

  Future<void> initialize() async {
    // Start all masters.
    for (final master in masters) {
      await master.start();
    }
    // Set up replication in both directions between every pair of masters.
    for (int i = 0; i < masters.length; i++) {
      for (int j = 0; j < masters.length; j++) {
        if (i != j) {
          _setupReplication(masters[i], masters[j]);
        }
      }
    }
  }

  void _setupReplication(
    VektagrafServer source,
    VektagrafServer target,
  ) {
    // Each call wires one direction; the nested loops above make the
    // topology bidirectional. Subscribe instead of `await for`, which
    // would block initialize() on the first stream forever.
    source.getReplicationStream().listen((event) async {
      try {
        await target.applyReplicationEvent(event, conflictResolution);
      } catch (e) {
        print('Replication failed from ${source.nodeId} to ${target.nodeId}: $e');
      }
    });
  }
}

enum ConflictResolutionStrategy {
  lastWriteWins,
  firstWriteWins,
  manual,
  vectorClock,
}
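Of these, vectorClock is the only strategy that can distinguish a genuine conflict from a stale overwrite: each write carries a per-node counter map, and two writes truly conflict only when neither clock dominates the other. A Python sketch of the comparison:

```python
def compare_clocks(a: dict, b: dict) -> str:
    """Return 'before', 'after', or 'concurrent' for two vector clocks."""
    nodes = set(a) | set(b)
    # a <= b means every node's counter in a is <= the same counter in b.
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and not b_le_a:
        return "before"      # a happened before b: b can safely win
    if b_le_a and not a_le_b:
        return "after"       # b happened before a: a can safely win
    return "concurrent"      # neither dominates: a real conflict to resolve
```

Only the "concurrent" case needs an actual resolution policy (last-write-wins, manual merge, and so on); the other two cases are causally ordered and safe to apply automatically.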
Kubernetes Operator for Cluster Management
# vektagraf-operator.yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: vektagrafclusters.vektagraf.io
spec:
  group: vektagraf.io
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                replicas:
                  type: integer
                  minimum: 1
                  maximum: 100
                replication:
                  type: object
                  properties:
                    strategy:
                      type: string
                      enum: ["master-slave", "multi-master"]
                    syncMode:
                      type: string
                      enum: ["async", "sync"]
                resources:
                  type: object
                  properties:
                    cpu:
                      type: string
                    memory:
                      type: string
                    storage:
                      type: string
            status:
              type: object
              properties:
                phase:
                  type: string
                readyReplicas:
                  type: integer
                conditions:
                  type: array
                  items:
                    type: object
  scope: Namespaced
  names:
    plural: vektagrafclusters
    singular: vektagrafcluster
    kind: VektagrafCluster
---
apiVersion: vektagraf.io/v1
kind: VektagrafCluster
metadata:
  name: production-cluster
  namespace: vektagraf-production
spec:
  replicas: 3
  replication:
    strategy: "master-slave"
    syncMode: "async"
  resources:
    cpu: "2000m"
    memory: "8Gi"
    storage: "100Gi"
Disaster Recovery and Backup Strategies
Automated Backup System
// lib/backup/backup_manager.dart
class BackupManager {
  final VektagrafDatabase database;
  final BackupStorage storage;
  final BackupConfig config;

  BackupManager({
    required this.database,
    required this.storage,
    required this.config,
  });

  Future<void> startScheduledBackups() async {
    Timer.periodic(config.interval, (_) => _performBackup());
  }

  Future<BackupResult> _performBackup() async {
    final backupId = _generateBackupId();
    final startTime = DateTime.now();
    try {
      // Create a consistent snapshot.
      final snapshot = await database.createSnapshot();
      // Compress and encrypt the backup.
      final compressedData = await _compressSnapshot(snapshot);
      final encryptedData = await _encryptBackup(compressedData);
      // Upload to storage.
      await storage.upload(backupId, encryptedData);
      // Verify backup integrity.
      await _verifyBackup(backupId);
      final duration = DateTime.now().difference(startTime);
      // Clean up old backups.
      await _cleanupOldBackups();
      return BackupResult.success(
        backupId: backupId,
        size: encryptedData.length,
        duration: duration,
      );
    } catch (e) {
      return BackupResult.failure(
        backupId: backupId,
        error: e.toString(),
      );
    }
  }

  Future<void> restore(String backupId) async {
    // Download the backup.
    final encryptedData = await storage.download(backupId);
    // Decrypt and decompress.
    final compressedData = await _decryptBackup(encryptedData);
    final snapshot = await _decompressSnapshot(compressedData);
    // Restore the database.
    await database.restoreFromSnapshot(snapshot);
  }

  String _generateBackupId() {
    final timestamp = DateTime.now().toIso8601String();
    return 'backup-$timestamp';
  }

  Future<void> _cleanupOldBackups() async {
    final backups = await storage.listBackups();
    final cutoffDate = DateTime.now().subtract(config.retentionPeriod);
    for (final backup in backups) {
      if (backup.createdAt.isBefore(cutoffDate)) {
        await storage.delete(backup.id);
      }
    }
  }
}

class BackupConfig {
  final Duration interval;
  final Duration retentionPeriod;
  final CompressionLevel compression;
  final bool encryptionEnabled;

  const BackupConfig({
    this.interval = const Duration(hours: 6),
    this.retentionPeriod = const Duration(days: 30),
    this.compression = CompressionLevel.balanced,
    this.encryptionEnabled = true,
  });
}
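The defaults above imply concrete numbers worth checking before deployment: with a 6-hour interval, the worst-case data loss from backups alone (ignoring WAL replay) is one interval, and the 30-day retention keeps a fixed number of backups in storage at steady state. The arithmetic, as a small Python sketch:

```python
def backups_retained(interval_hours: float, retention_days: float) -> int:
    # Number of backups the cleanup pass leaves in storage at steady state.
    return int(retention_days * 24 // interval_hours)

def storage_needed_gib(interval_hours: float, retention_days: float,
                       avg_backup_gib: float) -> float:
    # Rough storage budget: retained count times average backup size.
    return backups_retained(interval_hours, retention_days) * avg_backup_gib
```

With the 6-hour/30-day defaults that is 120 retained backups, so a 2.5 GiB average backup implies roughly 300 GiB of backup storage per region.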
Cross-Region Backup Replication
# backup-cronjob.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: vektagraf-backup
  namespace: vektagraf-production
spec:
  schedule: "0 2 * * *"  # Daily at 2 AM
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: backup
              image: vektagraf/backup-tool:latest
              env:
                - name: SOURCE_ENDPOINT
                  value: "vektagraf-cluster:9090"
                - name: S3_BUCKET
                  value: "vektagraf-backups"
                - name: AWS_REGION
                  value: "us-east-1"
                - name: BACKUP_ENCRYPTION_KEY
                  valueFrom:
                    secretKeyRef:
                      name: backup-secrets
                      key: encryption-key
              command:
                - /bin/sh
                - -c
                - |
                  # Create backup
                  vektagraf-backup create \
                    --source $SOURCE_ENDPOINT \
                    --output /tmp/backup.vbk \
                    --compress \
                    --encrypt
                  # Upload to primary region
                  aws s3 cp /tmp/backup.vbk \
                    s3://$S3_BUCKET/$(date +%Y/%m/%d)/backup-$(date +%H%M%S).vbk
                  # Replicate to secondary region
                  aws s3 cp /tmp/backup.vbk \
                    s3://$S3_BUCKET-replica/$(date +%Y/%m/%d)/backup-$(date +%H%M%S).vbk \
                    --region us-west-2
                  # Clean up the local file
                  rm /tmp/backup.vbk
          restartPolicy: OnFailure
          serviceAccountName: backup-service-account
Point-in-Time Recovery
// lib/recovery/point_in_time_recovery.dart
class PointInTimeRecovery {
  final VektagrafDatabase database;
  final WALManager walManager;

  PointInTimeRecovery({
    required this.database,
    required this.walManager,
  });

  Future<void> recoverToPoint(DateTime targetTime) async {
    // Find the latest backup before the target time.
    final backup = await _findLatestBackupBefore(targetTime);
    if (backup == null) {
      throw Exception('No backup found before target time');
    }
    // Restore from the backup.
    await database.restoreFromBackup(backup.id);
    // Apply WAL entries from the backup time up to the target time.
    final walEntries = await walManager.getEntriesBetween(
      backup.createdAt,
      targetTime,
    );
    for (final entry in walEntries) {
      await database.applyWALEntry(entry);
    }
  }

  Future<List<RecoveryPoint>> getAvailableRecoveryPoints() async {
    final backups = await _getBackups();
    final walEntries = await walManager.getAllEntries();
    final recoveryPoints = <RecoveryPoint>[];
    // Add backup points.
    for (final backup in backups) {
      recoveryPoints.add(RecoveryPoint(
        timestamp: backup.createdAt,
        type: RecoveryPointType.backup,
        id: backup.id,
      ));
    }
    // Add WAL points.
    for (final entry in walEntries) {
      recoveryPoints.add(RecoveryPoint(
        timestamp: entry.timestamp,
        type: RecoveryPointType.wal,
        id: entry.id,
      ));
    }
    recoveryPoints.sort((a, b) => a.timestamp.compareTo(b.timestamp));
    return recoveryPoints;
  }
}

class RecoveryPoint {
  final DateTime timestamp;
  final RecoveryPointType type;
  final String id;

  const RecoveryPoint({
    required this.timestamp,
    required this.type,
    required this.id,
  });
}

enum RecoveryPointType { backup, wal }
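The first step of recoverToPoint, finding the newest backup at or before the target time, is a search over the sorted recovery-point list. A sketch with plain numeric timestamps (Python; the tuple layout mirrors RecoveryPoint and is illustrative):

```python
from bisect import bisect_right

def latest_backup_before(points, target_time):
    """points: list of (timestamp, kind, id) sorted by timestamp."""
    # Discard everything after the target, then walk back to the
    # nearest point that is a full backup (WAL entries alone cannot
    # seed a restore).
    idx = bisect_right([t for t, _, _ in points], target_time)
    for i in range(idx - 1, -1, -1):
        if points[i][1] == "backup":
            return points[i]
    return None  # no backup precedes the target: recovery is impossible
```

The WAL entries between that backup's timestamp and the target are then replayed on top, exactly as the Dart code above does.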
Best Practices
Scaling Best Practices
1. Gradual Scaling
- Implement gradual scaling policies
- Monitor performance during scaling events
- Use predictive scaling when possible
- Test scaling scenarios regularly
2. Resource Planning
- Monitor resource utilization trends
- Plan for peak load scenarios
- Implement cost optimization strategies
- Use spot instances for non-critical workloads
3. Performance Monitoring
- Set up comprehensive monitoring
- Use custom metrics for scaling decisions
- Implement alerting for scaling events
- Run performance tests regularly
High Availability Best Practices
1. Redundancy
- Deploy across multiple availability zones
- Implement database replication
- Use multiple load balancers
- Plan for component failures
2. Health Checks
- Implement comprehensive health checks
- Use the different probe types appropriately
- Monitor health check performance
- Implement graceful degradation
3. Disaster Recovery
- Test backups regularly
- Document recovery procedures
- Practice disaster recovery scenarios
- Implement cross-region replication
Advanced Topics
Multi-Region Active-Active Setup
# multi-region-setup.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: multi-region-config
data:
  regions.json: |
    {
      "regions": [
        {
          "name": "us-east-1",
          "endpoint": "https://api-east.yourdomain.com",
          "weight": 50,
          "primary": true
        },
        {
          "name": "us-west-2",
          "endpoint": "https://api-west.yourdomain.com",
          "weight": 30,
          "primary": false
        },
        {
          "name": "eu-west-1",
          "endpoint": "https://api-eu.yourdomain.com",
          "weight": 20,
          "primary": false
        }
      ],
      "replication": {
        "mode": "async",
        "conflictResolution": "lastWriteWins"
      }
    }
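The 50/30/20 weights above determine how traffic splits across regions. One deterministic way to honor such weights without randomness is smooth weighted round robin (the scheme popularized by nginx): each pick boosts every region's running score by its weight, takes the highest score, then subtracts the weight total from the winner, which spreads the heavier region's picks evenly instead of bunching them. A Python sketch (the name-based tie-break is an assumption for determinism):

```python
def smooth_weighted_rr(weights: dict, n: int) -> list:
    """Produce n picks interleaving endpoints in proportion to their weights."""
    current = {name: 0 for name in weights}
    total = sum(weights.values())
    order = []
    for _ in range(n):
        for name in current:
            current[name] += weights[name]      # everyone earns their weight
        best = max(current, key=lambda name: (current[name], name))
        current[best] -= total                  # winner pays the full total
        order.append(best)
    return order
```

Over any 10 picks with weights 50/30/20, the regions receive exactly 5, 3, and 2 picks respectively, interleaved rather than consecutive.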
Chaos Engineering
// lib/chaos/chaos_monkey.dart
class ChaosMonkey {
  final List<ChaosExperiment> experiments;
  final ChaosConfig config;

  ChaosMonkey({
    required this.experiments,
    required this.config,
  });

  Future<void> start() async {
    Timer.periodic(config.interval, (_) => _runRandomExperiment());
  }

  Future<void> _runRandomExperiment() async {
    if (!config.enabled || Random().nextDouble() > config.probability) {
      return;
    }
    final experiment = experiments[Random().nextInt(experiments.length)];
    try {
      print('Running chaos experiment: ${experiment.name}');
      await experiment.execute();
      // Monitor system behavior while the fault is active.
      await _monitorSystemHealth(experiment.duration);
      // Clean up.
      await experiment.cleanup();
    } catch (e) {
      print('Chaos experiment failed: $e');
    }
  }

  Future<void> _monitorSystemHealth(Duration duration) async {
    final endTime = DateTime.now().add(duration);
    while (DateTime.now().isBefore(endTime)) {
      // Check system metrics.
      final metrics = await _collectMetrics();
      if (metrics.errorRate > 0.1) {
        throw Exception('High error rate detected during chaos experiment');
      }
      await Future.delayed(Duration(seconds: 10));
    }
  }
}

abstract class ChaosExperiment {
  String get name;
  Duration get duration;
  Future<void> execute();
  Future<void> cleanup();
}

class PodKillerExperiment implements ChaosExperiment {
  @override
  String get name => 'Pod Killer';

  @override
  Duration get duration => Duration(minutes: 5);

  @override
  Future<void> execute() async {
    // Kill a random Vektagraf pod.
    final pods = await _getVektagrafPods();
    final targetPod = pods[Random().nextInt(pods.length)];
    await _killPod(targetPod);
  }

  @override
  Future<void> cleanup() async {
    // Kubernetes automatically restarts the killed pod.
  }
}
Summary
This chapter covered comprehensive scaling and high availability strategies for Vektagraf applications, including:
- Scaling Strategies: Horizontal and vertical scaling approaches
- Load Balancing: Client-side and server-side load balancing
- Clustering: Master-slave and multi-master replication
- Disaster Recovery: Backup strategies and point-in-time recovery
- Best Practices: Operational excellence and reliability patterns
Key Takeaways
- Choose scaling strategies based on workload characteristics
- Implement comprehensive health checking and monitoring
- Design for failure with proper redundancy and failover
- Regular testing of disaster recovery procedures
- Use automation for scaling and recovery operations
Next Steps
- Chapter 18: Learn about DevOps and CI/CD integration
- Chapter 19: Master troubleshooting and maintenance procedures