
Cache Multi-Shard Read Implementation Summary

✅ What Was Done

1. Analysis Document Created

File: docs/architecture/cache-multi-shard-analysis.md

Comprehensive analysis covering:

  • Which caches need multi-shard reads (internal logs ✅, maintenance ⚠️, auth ❌, analytics ⚠️)
  • Sharding strategies (month-based vs ID-based)
  • Timestamp-based indexing requirements
  • Pagination/cursor support needs

2. Central Multi-Shard Reader Implemented

File: apps/worker/src/lib/services/cache/multi-shard-reader/index.ts

Features:

  • ✅ Central entry point for multi-shard reads (same pattern as the central shard registry)
  • ✅ Reads from multiple shards via registry
  • ✅ Timestamp-based filtering (fromTimestamp/toTimestamp)
  • ✅ Cursor-based pagination (cursor: timestamp:id)
  • ✅ Custom filtering (level, category, service, etc.)
  • ✅ Result merging and sorting (timestamp DESC)
  • ✅ Returns paginated results with nextCursor and hasMore
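The `timestamp:id` cursor format can be encoded and decoded with a pair of small helpers. This is a minimal sketch, not the module's actual exports: `encodeCursor`, `decodeCursor`, and `compareDesc` are hypothetical names. The sketch assumes IDs need no escaping. Splitting at the first `:` keeps IDs that contain colons intact, because the timestamp is always numeric. `compareDesc` orders entries newest first and breaks timestamp ties by ID, and that tiebreak is what makes the ID half of the cursor useful.

```typescript
// Hypothetical helpers for the `timestamp:id` cursor format.
interface IndexEntry {
  id: string;
  timestamp: number;
}

function encodeCursor(entry: IndexEntry): string {
  return `${entry.timestamp}:${entry.id}`;
}

function decodeCursor(cursor: string): IndexEntry {
  // Split at the FIRST colon only: the timestamp is numeric,
  // so any later colons belong to the ID.
  const sep = cursor.indexOf(":");
  const timestamp = Number(cursor.slice(0, sep));
  if (sep <= 0 || isNaN(timestamp)) {
    throw new Error(`Malformed cursor: ${cursor}`);
  }
  return { timestamp, id: cursor.slice(sep + 1) };
}

// Listing order: timestamp DESC, then id DESC as a tiebreaker.
// A negative result means `a` comes before `b`.
function compareDesc(a: IndexEntry, b: IndexEntry): number {
  if (a.timestamp !== b.timestamp) return b.timestamp - a.timestamp;
  return a.id < b.id ? 1 : a.id > b.id ? -1 : 0;
}
```

Use the same comparator to sort the index and to apply the cursor. That way, pages neither skip nor repeat entries that share a timestamp.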

API:

```typescript
const reader = createMultiShardReader(env);
const result = await reader.read<InternalLogEntry>({
  cacheType: 'internal',
  monthly: true,
  indexKeyPrefix: 'internal:index',
  entryKeyPrefix: 'internal:log',
  filters: { fromTimestamp: 1234567890, toTimestamp: 1234567999 },
  cursor: '1234567890:log-id-123',
  limit: 100,
  customFilter: (entry) => entry.level === 'error',
});
```

3. Internal Logs Cache Updated

File: apps/worker/src/lib/services/cache/internal/index.ts

Changes:

  • ✅ Added an index key per shard (internal:index:${shardId})
  • ✅ Index stores Array<{id: string, timestamp: number}> sorted DESC
  • ✅ updateIndexKey() updates the index on each log write
  • ✅ getLogs() now uses MultiShardReader instead of manual shard iteration
  • ✅ Supports timestamp filtering, custom filters (level, category, service), and pagination
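The level, category, and service filters that `getLogs()` accepts map naturally onto the reader's `customFilter` callback. This is a minimal sketch under two assumptions. First, `InternalLogEntry` is taken to have optional `level`, `category`, and `service` string fields; the real type may differ. Second, an omitted option is taken to mean "don't filter on this field". `LogQuery` and `buildLogFilter` are hypothetical names.

```typescript
// Assumed shape; the real InternalLogEntry may carry more fields.
interface InternalLogEntry {
  id: string;
  timestamp: number;
  level?: string;
  category?: string;
  service?: string;
  message?: string;
}

// Hypothetical subset of the options getLogs() accepts.
interface LogQuery {
  level?: string;
  category?: string;
  service?: string;
}

// Builds a predicate suitable for the reader's `customFilter` option.
// Only fields that were actually provided are checked.
function buildLogFilter(query: LogQuery): (entry: InternalLogEntry) => boolean {
  const checks: Array<(e: InternalLogEntry) => boolean> = [];
  if (query.level !== undefined) checks.push((e) => e.level === query.level);
  if (query.category !== undefined) checks.push((e) => e.category === query.category);
  if (query.service !== undefined) checks.push((e) => e.service === query.service);
  // With no checks, every() returns true, so an empty query matches all entries.
  return (entry) => checks.every((check) => check(entry));
}
```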

Index Key Structure:

```typescript
// Key: internal:index:${shardId}
// Value: Array<{id: string, timestamp: number}>
[
  {id: 'log-uuid-1', timestamp: 1737000000000},
  {id: 'log-uuid-2', timestamp: 1736999999999},
  // ... sorted DESC
]
```
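Each index is kept sorted by timestamp descending, so a timestamp-range filter doesn't need to scan the whole array. Two binary searches find the matching slice. This is a sketch, and it assumes both bounds are inclusive, which the summary doesn't specify. `firstWhere` and `sliceByTimestampRange` are hypothetical helpers.

```typescript
interface IndexEntry { id: string; timestamp: number; }

// First position i where pred(index[i]) is true, assuming pred is
// false...false, true...true along the array (monotone).
function firstWhere(index: IndexEntry[], pred: (e: IndexEntry) => boolean): number {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (pred(index[mid])) hi = mid;
    else lo = mid + 1;
  }
  return lo;
}

// Entries with from <= timestamp <= to, for an index sorted by timestamp DESC.
function sliceByTimestampRange(index: IndexEntry[], from = -Infinity, to = Infinity): IndexEntry[] {
  // Newest entries come first, so skip those newer than `to`...
  const start = firstWhere(index, (e) => e.timestamp <= to);
  // ...and stop at the first entry older than `from`.
  const end = firstWhere(index, (e) => e.timestamp < from);
  return index.slice(start, Math.max(start, end));
}
```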

📊 Cache Type Status

| Cache Type | Multi-Shard? | Indexing? | Pagination? | Status |
|---|---|---|---|---|
| Internal Logs | ✅ YES | ✅ YES | ✅ YES | IMPLEMENTED |
| Maintenance | ⚠️ MAYBE | ❌ NO | ❌ NO | Single-shard (caches single requests only) |
| Auth Rate Limits | ❌ NO | ❌ NO | ❌ NO | Single-shard (per-identifier only) |
| Analytics | ⚠️ MAYBE | ❌ NO | ❌ NO | Single-shard (query-based caching) |

🎯 Key Decisions

1. Sharding Strategy

  • Internal logs: Month-based (already implemented)
  • Maintenance: ID-based via registry (only if caching list queries)
  • Auth: Single shard (not needed)
  • Analytics: Month-based (only if storing raw events)

2. Indexing Strategy

  • Index keys are stored in the same shard as their entries (no separate index shard)
  • Format: {cacheType}:index:${shardId} → Array<{id, timestamp}>
  • Update: append on write, then sort DESC by timestamp
  • Purpose: fast listing without storage.list(), which is only available inside the Durable Object and is not exposed to the calling Worker
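"Append on write, then sort" can also be done as a single sorted insertion. A binary search finds the position, which avoids re-sorting the whole index on every write, although `splice` is still linear in the index size. This is a minimal sketch: `comesBefore` and `insertSortedDesc` are hypothetical helpers. Ties are broken by ID so that the order agrees with a `timestamp:id` cursor. New log entries usually carry the newest timestamp, so the insertion point is typically at or near index 0.

```typescript
interface IndexEntry { id: string; timestamp: number; }

// True when `a` belongs before `b` in timestamp-DESC, id-DESC order.
function comesBefore(a: IndexEntry, b: IndexEntry): boolean {
  return a.timestamp !== b.timestamp ? a.timestamp > b.timestamp : a.id > b.id;
}

// Inserts `entry` into an index already sorted DESC, keeping it sorted.
// Mutates and returns `index`; re-inserting an identical entry is a no-op.
function insertSortedDesc(index: IndexEntry[], entry: IndexEntry): IndexEntry[] {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (comesBefore(index[mid], entry)) lo = mid + 1;
    else hi = mid;
  }
  const existing = index[lo];
  if (existing && existing.id === entry.id && existing.timestamp === entry.timestamp) {
    return index;
  }
  index.splice(lo, 0, entry);
  return index;
}
```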

3. Pagination Strategy

  • Cursor format: timestamp:id (not just timestamp; the ID disambiguates entries that share a timestamp)
  • Query: Filter index by timestamp range, apply cursor, limit
  • Result: Returns entries + nextCursor + hasMore
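The query steps above (range filter, cursor, limit) can be sketched for a single sorted index. The sketch makes three assumptions:

  • Entries are ordered by timestamp DESC, with ID DESC as the tiebreaker.
  • The cursor names the last entry of the previous page.
  • `hasMore` means at least one match remains after the page.

`paginateIndex`, `parseCursor`, and the option names are hypothetical. The sketch filters linearly for clarity.

```typescript
interface IndexEntry { id: string; timestamp: number; }

interface PageOptions {
  fromTimestamp?: number;
  toTimestamp?: number;
  cursor?: string; // `${timestamp}:${id}` of the last entry already returned
  limit: number;
}

interface Page { entries: IndexEntry[]; nextCursor: string | null; hasMore: boolean; }

function parseCursor(cursor: string): IndexEntry {
  const sep = cursor.indexOf(":");
  return { timestamp: Number(cursor.slice(0, sep)), id: cursor.slice(sep + 1) };
}

// `index` must be sorted by timestamp DESC, then id DESC.
function paginateIndex(index: IndexEntry[], opts: PageOptions): Page {
  const from = opts.fromTimestamp ?? -Infinity;
  const to = opts.toTimestamp ?? Infinity;
  const after = opts.cursor ? parseCursor(opts.cursor) : null;

  const matches = index.filter((e) => {
    if (e.timestamp < from || e.timestamp > to) return false; // timestamp range
    if (!after) return true;
    // Keep only entries strictly after the cursor in DESC order.
    return e.timestamp < after.timestamp || (e.timestamp === after.timestamp && e.id < after.id);
  });

  const entries = matches.slice(0, opts.limit);
  const hasMore = matches.length > opts.limit; // at least one match left over
  const last = entries[entries.length - 1];
  return { entries, hasMore, nextCursor: hasMore && last ? `${last.timestamp}:${last.id}` : null };
}
```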

4. Central vs Per-Cache

  • Central reader (same pattern as the registry): DRY, consistent behavior across caches
  • Reusable for any cache that needs multi-shard reads
  • Customizable via customFilter and transform functions
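The customization hooks can be typed generically so that each cache supplies its own entry type. This sketch assumes `customFilter` runs on the raw entry and that an optional `transform` maps the surviving entries to the output type. The option names come from this summary, but the exact signatures and the `applyHooks` helper are assumptions.

```typescript
interface ReadHooks<TEntry, TOut = TEntry> {
  // Return false to drop an entry (e.g. wrong level or service).
  customFilter?: (entry: TEntry) => boolean;
  // Optional projection applied after filtering (e.g. to strip internal fields).
  transform?: (entry: TEntry) => TOut;
}

function applyHooks<TEntry, TOut = TEntry>(
  entries: TEntry[],
  hooks: ReadHooks<TEntry, TOut>,
): TOut[] {
  const kept = hooks.customFilter ? entries.filter(hooks.customFilter) : entries;
  // Without a transform, entries pass through unchanged; the cast reflects
  // that TOut defaults to TEntry in that case.
  return hooks.transform ? kept.map(hooks.transform) : (kept as unknown as TOut[]);
}
```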

🔧 How It Works

Writing Logs (Internal Cache Example)

```typescript
// 1. Get active shard from registry
const shard = await registry.getActiveShard('internal', true); // monthly=true

// 2. Store log entry
await cache.set(shard, `internal:log:${logId}`, logEntry, ttl);

// 3. Update index key
// Note: this is a read-modify-write. If two writes to the same shard interleave
// between get() and set(), the second set() can drop the first write's entry.
const indexKey = `internal:index:${shard}`;
const index = (await cache.get(shard, indexKey)) || [];
index.push({id: logId, timestamp: logEntry.timestamp});
index.sort((a, b) => b.timestamp - a.timestamp); // DESC
await cache.set(shard, indexKey, index, ttl);
```
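The index update in step 3 is a read-modify-write. If two writes to the same shard interleave between `cache.get` and `cache.set`, each can write back an index that lacks the other's entry. When both writes pass through the same isolate, one mitigation is to chain index updates per shard. This is a sketch that assumes a `cache` with async `get`/`set`, as in the calls above. `CacheLike`, `FakeCache`, `withShardLock`, and `appendToIndex` are hypothetical. It does not protect against writers in other isolates. Closing that gap would likely require running the whole update as one operation inside the Durable Object.

```typescript
interface IndexEntry { id: string; timestamp: number; }

// Hypothetical async cache interface matching the calls used above.
interface CacheLike {
  get<T>(shard: string, key: string): Promise<T | null>;
  set<T>(shard: string, key: string, value: T, ttl: number): Promise<void>;
}

// In-memory stand-in for tests; ignores TTL.
class FakeCache implements CacheLike {
  private store = new Map<string, unknown>();
  async get<T>(shard: string, key: string): Promise<T | null> {
    return (this.store.get(`${shard}|${key}`) as T | undefined) ?? null;
  }
  async set<T>(shard: string, key: string, value: T, _ttl: number): Promise<void> {
    this.store.set(`${shard}|${key}`, value);
  }
}

// Per-shard promise chain: each update starts only after the previous one settles.
const shardQueues = new Map<string, Promise<void>>();
function withShardLock(shard: string, task: () => Promise<void>): Promise<void> {
  const prev = shardQueues.get(shard) ?? Promise.resolve();
  const next = prev.then(task, task); // run even if the previous task failed
  shardQueues.set(shard, next.catch(() => undefined));
  return next;
}

async function appendToIndex(cache: CacheLike, shard: string, entry: IndexEntry, ttl: number): Promise<void> {
  await withShardLock(shard, async () => {
    const indexKey = `internal:index:${shard}`;
    const index = (await cache.get<IndexEntry[]>(shard, indexKey)) ?? [];
    index.push(entry);
    index.sort((a, b) => b.timestamp - a.timestamp); // DESC
    await cache.set(shard, indexKey, index, ttl);
  });
}
```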

Reading Logs (Multi-Shard)

```typescript
// 1. Multi-shard reader gets all shards from registry
const shards = await registry.getRegistryShards('internal', {monthly: true});

// 2. For each shard:
//    - Read index key
//    - Filter by timestamp range
//    - Apply cursor
//    - Fetch entries by IDs
//    - Apply custom filters

// 3. Merge all results, sort DESC, apply limit

// 4. Return entries + nextCursor + hasMore
```
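Steps 2 to 4 can be sketched over in-memory data. The sketch makes three assumptions:

  • Each shard's index is sorted by timestamp DESC, then ID DESC.
  • Entries are looked up by ID.
  • Index IDs whose entry has expired or been deleted are skipped.

Collecting at most `limit + 1` matches per shard is sufficient. Every entry in the global top `limit` is also within the top `limit` of its own shard, and one extra match is enough to set `hasMore`. `ShardData`, `readShards`, and the option names are hypothetical. The `Map` stands in for per-ID cache reads.

```typescript
interface IndexEntry { id: string; timestamp: number; }

// Stand-in for one shard: its index plus a per-ID entry lookup.
interface ShardData<T> {
  index: IndexEntry[]; // sorted timestamp DESC, then id DESC
  entries: Map<string, T>;
}

interface ReadOptions<T> {
  fromTimestamp?: number;
  toTimestamp?: number;
  cursor?: string; // `${timestamp}:${id}`
  limit: number;
  customFilter?: (entry: T) => boolean;
}

interface ReadResult<T> { entries: T[]; nextCursor: string | null; hasMore: boolean; }

const byTimeThenIdDesc = (a: IndexEntry, b: IndexEntry): number =>
  b.timestamp - a.timestamp || (a.id < b.id ? 1 : a.id > b.id ? -1 : 0);

function readShards<T extends IndexEntry>(shards: ShardData<T>[], opts: ReadOptions<T>): ReadResult<T> {
  const from = opts.fromTimestamp ?? -Infinity;
  const to = opts.toTimestamp ?? Infinity;
  const sep = opts.cursor ? opts.cursor.indexOf(":") : -1;
  const cursor: IndexEntry | null = opts.cursor
    ? { timestamp: Number(opts.cursor.slice(0, sep)), id: opts.cursor.slice(sep + 1) }
    : null;

  const candidates: T[] = [];
  for (const shard of shards) {
    let taken = 0;
    for (const ref of shard.index) {
      if (taken > opts.limit) break; // limit + 1 matches per shard is enough
      if (ref.timestamp > to) continue; // newer than the range
      if (ref.timestamp < from) break; // index is DESC: nothing older can match
      if (cursor && byTimeThenIdDesc(ref, cursor) <= 0) continue; // at or before the cursor
      const entry = shard.entries.get(ref.id);
      if (!entry) continue; // expired or deleted: stale index entry
      if (opts.customFilter && !opts.customFilter(entry)) continue;
      candidates.push(entry);
      taken++;
    }
  }

  // Merge: sort all candidates in listing order and cut the page.
  candidates.sort(byTimeThenIdDesc);
  const entries = candidates.slice(0, opts.limit);
  const hasMore = candidates.length > opts.limit;
  const last = entries[entries.length - 1];
  return { entries, hasMore, nextCursor: hasMore && last ? `${last.timestamp}:${last.id}` : null };
}
```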

🚀 Next Steps (If Needed)

Maintenance Cache Multi-Shard (Future)

Only if:

  1. We start caching list query results (not just single requests)
  2. Single shard hits size limit

Implementation:

  • Use MultiShardReader with indexKeyPrefix: 'maintenance:index'
  • Index by userId, departmentId, status, etc. (not just timestamp)
  • Or: Cache list query results with TTL (simpler)
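Indexing by fields other than timestamp would mean one index key per field value. The sketch below derives those keys for a single record. It assumes a hypothetical `maintenance:index:{field}:{value}:{shardId}` layout that extends the documented `{cacheType}:index:${shardId}` format. The field names come from this summary; the record shape and `secondaryIndexKeys` are illustrative. Values are URI-encoded so that a colon inside a value cannot break the key layout.

```typescript
// Hypothetical shape of a cached maintenance request.
interface MaintenanceRequest {
  id: string;
  userId: string;
  departmentId: string;
  status: string;
  timestamp: number;
}

const INDEXED_FIELDS = ["userId", "departmentId", "status"] as const;

// One index key per indexed field value; each key would hold the same
// Array<{id, timestamp}> structure used for internal logs.
function secondaryIndexKeys(req: MaintenanceRequest, shardId: string): string[] {
  return INDEXED_FIELDS.map(
    (field) => `maintenance:index:${field}:${encodeURIComponent(req[field])}:${shardId}`,
  );
}
```

The cost is that every write must update one index per field. That cost is part of why caching list query results with a TTL may be the simpler option.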

Analytics Monthly Sharding (Future)

Only if:

  1. We store raw analytics events (not just aggregated results)

Implementation:

  • Use monthly sharding (monthly: true)
  • Use MultiShardReader for cross-month aggregation
  • Index by timestamp for fast queries
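Cross-month aggregation over raw events could combine per-shard results into one answer. This sketch assumes each monthly shard yields raw events with a `type` field and that the aggregate is a count per type. `AnalyticsEvent` and `countByType` are hypothetical. In practice the events would come from multi-shard reads rather than from in-memory arrays.

```typescript
interface AnalyticsEvent { id: string; timestamp: number; type: string; }

// Combines raw events from several monthly shards into one count per event type,
// keeping only events inside [from, to]. A Map avoids prototype-key collisions.
function countByType(shards: AnalyticsEvent[][], from: number, to: number): Map<string, number> {
  const counts = new Map<string, number>();
  for (const events of shards) {
    for (const e of events) {
      if (e.timestamp < from || e.timestamp > to) continue;
      counts.set(e.type, (counts.get(e.type) ?? 0) + 1);
    }
  }
  return counts;
}
```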

📝 Usage Examples

Internal Logs - Get Recent Errors

```typescript
const logs = await internalCache.getLogs({
  level: 'error',
  fromTimestamp: Date.now() - 24 * 60 * 60 * 1000, // Last 24h
  limit: 50,
});
```

Internal Logs - Paginated Query

```typescript
// First page
const options = {
  cacheType: 'internal',
  monthly: true,
  indexKeyPrefix: 'internal:index',
  entryKeyPrefix: 'internal:log',
  filters: { fromTimestamp: startTime, toTimestamp: endTime },
  limit: 100,
};
const result1 = await multiShardReader.read(options);

// Next page: same options, continuing from the cursor
if (result1.hasMore) {
  const result2 = await multiShardReader.read({
    ...options,
    cursor: result1.nextCursor,
  });
}
```
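Repeating the call with `nextCursor` until `hasMore` is false can be wrapped in an async generator. This sketch assumes the `{ entries, nextCursor, hasMore }` result shape described above. `PageResult`, `ReadPage`, and `readAll` are hypothetical. The reader is modeled as any function that reads one page for a given set of options.

```typescript
interface PageResult<T> { entries: T[]; nextCursor: string | null; hasMore: boolean; }

// Minimal view of the reader: reads one page for the given options.
type ReadPage<T, O> = (options: O & { cursor?: string }) => Promise<PageResult<T>>;

// Yields every entry across pages, following nextCursor until hasMore is false.
async function* readAll<T, O>(readPage: ReadPage<T, O>, options: O): AsyncGenerator<T> {
  let cursor: string | undefined;
  for (;;) {
    const page = await readPage({ ...options, cursor });
    yield* page.entries;
    if (!page.hasMore || !page.nextCursor) return;
    cursor = page.nextCursor;
  }
}
```

A caller can then write `for await (const entry of readAll(...))` without handling cursors directly.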

✅ Testing Checklist

  • [ ] Internal logs: Write logs, verify index key updates
  • [ ] Internal logs: Read logs with filters (level, category, service)
  • [ ] Internal logs: Read logs with timestamp range
  • [ ] Internal logs: Pagination with cursor
  • [ ] Multi-shard: Read from multiple monthly shards
  • [ ] Multi-shard: Merge and sort results correctly
  • [ ] Edge cases: Empty shards, deleted entries, cursor boundaries
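The pagination, merge, and cursor-boundary items share one invariant. Concatenated pages should equal the full sorted listing exactly, with no duplicates, gaps, or out-of-order entries at page boundaries. Below is a hypothetical `checkPages` helper that tests could use to assert it. It assumes the timestamp DESC, ID DESC order used by the cursor.

```typescript
interface IndexEntry { id: string; timestamp: number; }

// Returns a description of the first problem found, or null if the pages are consistent.
function checkPages(pages: IndexEntry[][], expected: IndexEntry[]): string | null {
  const seen = new Set<string>();
  const flat: IndexEntry[] = [];
  for (const page of pages) for (const e of page) flat.push(e);

  for (let i = 0; i < flat.length; i++) {
    const e = flat[i];
    if (seen.has(e.id)) return `duplicate id ${e.id} at position ${i}`;
    seen.add(e.id);
    const prev = flat[i - 1];
    // Timestamp DESC, then id DESC.
    if (prev && (prev.timestamp < e.timestamp || (prev.timestamp === e.timestamp && prev.id < e.id))) {
      return `out of order at position ${i}`;
    }
  }
  if (flat.length !== expected.length) {
    return `expected ${expected.length} entries, got ${flat.length}`;
  }
  for (let i = 0; i < flat.length; i++) {
    if (flat[i].id !== expected[i].id) return `mismatch at position ${i}`;
  }
  return null;
}
```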

🎉 Summary

  • ✅ Central multi-shard reader implemented (following the registry pattern)
  • ✅ Internal logs now use index keys and the multi-shard reader
  • ✅ Timestamp-based indexing for fast pagination
  • ✅ Cursor pagination support
  • ✅ Reusable for future caches (maintenance, analytics) if needed

Ready for testing! 🚀