Cache Multi-Shard Read Analysis & Design

Executive Summary

This document analyzes which cache types need multi-shard read support and how they should be sharded, then designs a central multi-shard read system with timestamp-based indexing for pagination.

1. Cache Type Analysis

1.1 Internal Logs Cache ✅ NEEDS MULTI-SHARD

Current State:

✅ Already uses monthly sharding (monthly: true)
✅ Already implements multi-shard reads via getRegistryShards()
✅ Has timestamp field in entries
❌ Missing: Index key per shard for fast key listing (listLogKeys() is a stub)

Sharding Strategy: Month-based (already implemented)

Registry: registry:cache:internal:2025-02
Shard: cache:internal:shard:${random16}
Key: internal:log:${uuid}
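
The key layout above can be sketched as a few small builder functions. This is a minimal illustration, not the existing implementation: the function names are hypothetical, and it assumes month boundaries are taken in UTC, which the document does not specify.

```typescript
// Hypothetical helpers illustrating the key layout; names are not from the codebase.

// "YYYY-MM" for a millisecond timestamp (assumes UTC month boundaries).
function monthKey(timestampMs: number): string {
  const d = new Date(timestampMs);
  const month = ("0" + (d.getUTCMonth() + 1)).slice(-2);
  return `${d.getUTCFullYear()}-${month}`;
}

// Registry key for one cache type and month, e.g. registry:cache:internal:2025-02
function registryKey(cacheType: string, timestampMs: number): string {
  return `registry:cache:${cacheType}:${monthKey(timestampMs)}`;
}

// Key of a single log entry inside a shard.
function logEntryKey(id: string): string {
  return `internal:log:${id}`;
}

// Key of the per-shard index proposed under "Indexing Needs".
function shardIndexKey(shardId: string): string {
  return `internal:index:${shardId}`;
}
```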

Why Multi-Shard:

Logs accumulate over time (monthly rotation)
Admin queries need to read across months
Each month can have multiple shards if volume grows
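
To read across months, a query first has to decide which monthly registries to consult. The sketch below enumerates the month keys covered by a timestamp range, newest first. The function name is hypothetical and UTC month boundaries are an assumption.

```typescript
// Hypothetical helper: list the "YYYY-MM" keys touched by [fromMs, toMs],
// newest month first. Assumes UTC month boundaries.
function monthsInRange(fromMs: number, toMs: number): string[] {
  if (fromMs > toMs) return [];
  const start = new Date(fromMs);
  const end = new Date(toMs);
  const startYear = start.getUTCFullYear();
  const startMonth = start.getUTCMonth(); // 0-based
  let year = end.getUTCFullYear();
  let month = end.getUTCMonth(); // 0-based

  const months: string[] = [];
  // Walk backwards one month at a time until we pass the start month.
  while (year > startYear || (year === startYear && month >= startMonth)) {
    months.push(`${year}-${("0" + (month + 1)).slice(-2)}`);
    month -= 1;
    if (month < 0) {
      month = 11;
      year -= 1;
    }
  }
  return months;
}
```

Each returned key would then be combined with the registry prefix (for example registry:cache:internal:2025-02) to look up that month's shards.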

Indexing Needs:

Index key per shard: internal:index:${shardId} → Array of {id, timestamp} sorted by timestamp DESC
Purpose: Fast listing without storage.list(), which the DO doesn't expose to Workers
Update: Append to index on each log write

Pagination:

Current: limit only (no cursor)
Needed: Timestamp-based cursor (cursor: ${timestamp}:${id})
Query: Filter by fromTimestamp/toTimestamp, sort DESC, apply cursor
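
A minimal sketch of encoding and decoding the ${timestamp}:${id} cursor follows. The type and function names are hypothetical. It splits on the first colon only, so an id that itself contains a colon still round-trips.

```typescript
// Hypothetical cursor helpers for the "${timestamp}:${id}" format.
interface Cursor {
  timestamp: number;
  id: string;
}

function encodeCursor(c: Cursor): string {
  return `${c.timestamp}:${c.id}`;
}

// Returns null for malformed input instead of throwing.
function decodeCursor(raw: string): Cursor | null {
  const sep = raw.indexOf(":"); // split on the FIRST colon only
  if (sep <= 0) return null; // no separator, or empty timestamp part
  const timestamp = Number(raw.slice(0, sep));
  const id = raw.slice(sep + 1);
  if (!isFinite(timestamp) || id.length === 0) return null;
  return { timestamp, id };
}
```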

1.2 Maintenance Cache ⚠️ MAYBE MULTI-SHARD

Current State:

✅ Single shard (active shard only)
✅ Keys: maintenance:${requestId}
✅ Data has createdAt, updatedAt, dueDate timestamps
❌ Cache is only for single-request lookups (not list queries)
❌ List queries go to DB, not cache

Sharding Strategy: ID-based (via registry, not hash)

Registry manages shards automatically when size limit reached
No monthly sharding needed (requests don't accumulate like logs)

Why Multi-Shard (if implemented):

If we want to cache list query results (e.g., "all requests for user X")
If maintenance cache grows beyond one shard limit
Currently: Single-request cache hits don't need multi-shard

Indexing Needs:

Only if caching list queries: Index by userId, departmentId, status, etc.
Current: No indexing needed (single-key lookups only)

Pagination:

Current: DB handles pagination (offset/limit)
Cache: No pagination (single requests only)

Recommendation: Keep single-shard for now. Add multi-shard only if:

We start caching list query results
Single shard hits size limit

1.3 Auth Rate Limit Cache ❌ NO MULTI-SHARD

Current State:

✅ Single shard
✅ Keys: auth:attempts:${identifier} (email/userId)
✅ Data: {count: number, lastAttempt: number}
✅ Per-identifier lookups only

Sharding Strategy: N/A (single shard sufficient)

Why No Multi-Shard:

Per-identifier lookups (no list queries)
TTL-based expiration (5 min on success, 1 hr on failure)
No historical queries needed

Indexing Needs: None

Pagination: None needed

1.4 Analytics Cache ⚠️ MAYBE MULTI-SHARD

Current State:

✅ Single shard
✅ Keys: Query-based (analytics:${fromDate}-${toDate})
✅ Caches aggregated analytics results

Sharding Strategy: Month-based (if implemented)

Could shard by month for historical analytics
Current: Single shard caches query results

Why Multi-Shard (if implemented):

If we want to aggregate analytics across months
If we store raw analytics events (not just aggregated results)

Indexing Needs:

Only if storing raw events: Timestamp-based index
Current: Query-based caching (no indexing needed)

Pagination:

Current: Query-based (no pagination)
If raw events: Timestamp-based cursor

Recommendation: Keep single-shard for now. Add monthly sharding only if we store raw analytics events.

2. Sharding Strategy Summary

| Cache Type | Multi-Shard? | Strategy | Reason |
| --- | --- | --- | --- |
| Internal Logs | ✅ YES | Month-based | Logs accumulate, admin queries across months |
| Maintenance | ⚠️ MAYBE | ID-based (registry) | Only if caching list queries or size limit |
| Auth Rate Limits | ❌ NO | Single shard | Per-identifier only, TTL-based |
| Analytics | ⚠️ MAYBE | Month-based | Only if storing raw events |

3. Timestamp-Based Indexing Analysis

3.1 Internal Logs ✅ NEEDS INDEXING

Current: listLogKeys() returns [] (stub)

Required:

Index key per shard: internal:index:${shardId}
Structure: Array<{id: string, timestamp: number}> sorted DESC
Update: Append on each log write
Read: Use index to get log IDs, then fetch entries

Implementation:

typescript

// On write: add the new entry to this shard's index and keep it sorted.
const indexKey = `internal:index:${shardId}`;
const writeIndex = (await cache.get(shardId, indexKey)) || [];
writeIndex.push({id: logEntry.id, timestamp: logEntry.timestamp});
writeIndex.sort((a, b) => b.timestamp - a.timestamp); // newest first (DESC)
await cache.set(shardId, indexKey, writeIndex, 0);

// On read: load the index, then resolve IDs to entries.
const readIndex = (await cache.get(shardId, indexKey)) || [];
const logIds = readIndex.map(e => e.id);
// Fetch entries by IDs, filter by timestamp range
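
Re-sorting the whole index on every write does more work than necessary once the index is already ordered. As one possible alternative, the sketch below inserts the new entry at its sorted position using binary search. It assumes the stored index is already sorted newest first, and it breaks timestamp ties by ascending id, which is an assumption rather than something the design specifies. The names are hypothetical.

```typescript
interface IndexEntry {
  id: string;
  timestamp: number;
}

// Negative when a sorts before b: newest first, ties broken by id ascending.
function compareDesc(a: IndexEntry, b: IndexEntry): number {
  if (a.timestamp !== b.timestamp) return b.timestamp - a.timestamp;
  return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
}

// Insert `entry` into an index that is already sorted by compareDesc.
// Mutates the array in place.
function insertSorted(index: IndexEntry[], entry: IndexEntry): void {
  let lo = 0;
  let hi = index.length;
  // Find the first position whose element does not sort before `entry`.
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (compareDesc(index[mid], entry) < 0) lo = mid + 1;
    else hi = mid;
  }
  index.splice(lo, 0, entry);
}
```

Because logs usually arrive in roughly increasing timestamp order, most inserts land at or near the front of the array.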

Pagination:

Cursor format: ${timestamp}:${id}
Query: Filter index by fromTimestamp/toTimestamp, apply cursor, limit
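
The query steps can be sketched for a single shard index as follows. This is an illustration under stated assumptions: the index is sorted newest first with timestamp ties ordered by ascending id, both range bounds are inclusive, and the cursor string is well-formed. The type and function names are hypothetical.

```typescript
interface IndexEntry {
  id: string;
  timestamp: number;
}

interface PageQuery {
  fromTimestamp?: number; // inclusive lower bound (assumption)
  toTimestamp?: number; // inclusive upper bound (assumption)
  cursor?: string; // "timestamp:id" of the last entry already returned
  limit: number;
}

interface Page {
  entries: IndexEntry[];
  hasMore: boolean;
  nextCursor?: string;
}

function pageIndex(index: IndexEntry[], q: PageQuery): Page {
  // Parse the cursor by splitting on the first colon.
  const sep = q.cursor !== undefined ? q.cursor.indexOf(":") : -1;
  const after =
    q.cursor !== undefined && sep > 0
      ? { timestamp: Number(q.cursor.slice(0, sep)), id: q.cursor.slice(sep + 1) }
      : undefined;

  const matches = index.filter(e => {
    if (q.fromTimestamp !== undefined && e.timestamp < q.fromTimestamp) return false;
    if (q.toTimestamp !== undefined && e.timestamp > q.toTimestamp) return false;
    if (after === undefined) return true;
    // Keep only entries strictly after the cursor in (timestamp DESC, id ASC) order.
    return (
      e.timestamp < after.timestamp ||
      (e.timestamp === after.timestamp && e.id > after.id)
    );
  });

  const entries = matches.slice(0, q.limit);
  const hasMore = matches.length > q.limit;
  const page: Page = { entries, hasMore };
  const last = entries[entries.length - 1];
  if (hasMore && last !== undefined) {
    page.nextCursor = `${last.timestamp}:${last.id}`;
  }
  return page;
}
```

Including the id in the cursor matters when two entries share a timestamp: a timestamp-only cursor would either skip the second entry or return the first one twice.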

3.2 Maintenance Cache ❌ NO INDEXING NEEDED

Reason: Cache is for single-request lookups only. List queries go to DB.

If we add list caching later:

Index by userId, departmentId, status, etc.
Timestamp-based index for date range queries

3.3 Auth Rate Limits ❌ NO INDEXING NEEDED

Reason: Per-identifier lookups only, no list queries.

3.4 Analytics Cache ❌ NO INDEXING NEEDED (CURRENT)

Reason: Query-based caching (aggregated results), not raw events.

If we add raw event storage:

Timestamp-based index per month shard
Similar to internal logs

4. Central Multi-Shard Read Function Design

4.1 Requirements

A single central function, analogous to how the registry centralizes shard management
Handles multi-shard reads for caches that need it
Timestamp-based filtering (fromTimestamp/toTimestamp)
Cursor-based pagination (timestamp:id cursor)
Filtering (level, category, service, etc.)
Sorting (timestamp DESC default)
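
As an illustration of the filtering requirement, the sketch below checks one log entry against a set of optional filters. The entry shape is an assumption: the document names level, category, and service as filter fields but does not define the log entry type, and exact-match comparison is only one possible semantics.

```typescript
// Assumed entry shape; only id and timestamp are confirmed by the document.
interface LogEntry {
  id: string;
  timestamp: number;
  level?: string;
  category?: string;
  service?: string;
}

interface LogFilters {
  fromTimestamp?: number;
  toTimestamp?: number;
  level?: string;
  category?: string;
  service?: string;
}

// True when the entry satisfies every filter that is set.
// Unset filters are ignored; string filters use exact matching.
function matchesFilters(entry: LogEntry, f: LogFilters): boolean {
  if (f.fromTimestamp !== undefined && entry.timestamp < f.fromTimestamp) return false;
  if (f.toTimestamp !== undefined && entry.timestamp > f.toTimestamp) return false;
  if (f.level !== undefined && entry.level !== f.level) return false;
  if (f.category !== undefined && entry.category !== f.category) return false;
  if (f.service !== undefined && entry.service !== f.service) return false;
  return true;
}
```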

4.2 Design: `MultiShardReader` Class

Location: lib/services/cache/multi-shard-reader/index.ts

Responsibilities:

Get shards from registry (getRegistryShards())
Read from each shard (using index keys)
Filter by timestamp range
Apply cursor pagination
Merge and sort results
Return paginated results with next cursor
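
The merge step can avoid concatenating and re-sorting everything when each shard's index is already sorted newest first. The sketch below repeatedly takes the newest head across shards until the limit is reached. It is a simple linear scan over shard heads, which should be adequate for a small number of shards. The names and the id tie-break are assumptions.

```typescript
interface IndexEntry {
  id: string;
  timestamp: number;
}

// True when a should be returned before b: newest first, ties by id ascending.
function isBefore(a: IndexEntry, b: IndexEntry): boolean {
  if (a.timestamp !== b.timestamp) return a.timestamp > b.timestamp;
  return a.id < b.id;
}

// Merge per-shard indexes (each already sorted newest first) and return
// at most `limit` entries in global order.
function mergeShardIndexes(shards: IndexEntry[][], limit: number): IndexEntry[] {
  const pos = shards.map(() => 0); // read position per shard
  const out: IndexEntry[] = [];
  while (out.length < limit) {
    let best = -1;
    for (let s = 0; s < shards.length; s++) {
      const candidate = shards[s][pos[s]];
      if (candidate === undefined) continue; // this shard is exhausted
      if (best === -1 || isBefore(candidate, shards[best][pos[best]])) best = s;
    }
    if (best === -1) break; // all shards exhausted
    out.push(shards[best][pos[best]]);
    pos[best] += 1;
  }
  return out;
}
```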

API:

typescript

interface MultiShardReadOptions<T> {
  cacheType: string;
  monthly?: boolean;
  indexKeyPrefix: string; // e.g., "internal:index"
  entryKeyPrefix: string; // e.g., "internal:log"
  filters?: {
    fromTimestamp?: number;
    toTimestamp?: number;
    // ... other filters
  };
  cursor?: string; // "timestamp:id"
  limit?: number;
  transform?: (entry: unknown) => T; // Transform cache entry to T
}

interface MultiShardReadResult<T> {
  entries: T[];
  nextCursor?: string;
  hasMore: boolean;
  shardsRead: number;
}

class MultiShardReader {
  async read<T>(options: MultiShardReadOptions<T>): Promise<MultiShardReadResult<T>>;
}

5. Implementation Plan

Phase 1: Index Key Support for Internal Logs

✅ Update InternalCacheService.log() to update index key
✅ Implement listLogKeys() using index key
✅ Update getLogsFromShard() to use index

Phase 2: Central Multi-Shard Reader

✅ Create MultiShardReader class
✅ Implement timestamp-based filtering
✅ Implement cursor pagination
✅ Update InternalCacheService.getLogs() to use MultiShardReader

Phase 3: Future (if needed)

Add multi-shard support for maintenance (if caching list queries)
Add monthly sharding for analytics (if storing raw events)

6. Recommendations

✅ DO NOW:

Internal logs indexing: Implement index key per shard
Central multi-shard reader: Create reusable function
Cursor pagination: Add to internal logs

⚠️ DEFER:

Maintenance multi-shard: Only if caching list queries
Analytics monthly sharding: Only if storing raw events

❌ DON'T:

Auth rate limits multi-shard: Not needed (per-identifier only)

7. Key Decisions

Index keys stored in same shard (not separate index shard) - simpler, co-located
Cursor format: timestamp:id (not just timestamp) - handles duplicates
Central reader (not per-cache) - DRY, consistent behavior
Timestamp-based only (not ID-based) - aligns with time-series data
