
Cache Multi-Shard Read Analysis & Design

Executive Summary

This document analyzes which cache types need multi-shard read support, how they should be sharded, and designs a central multi-shard read system with timestamp-based indexing for pagination.


1. Cache Type Analysis

1.1 Internal Logs Cache ✅ NEEDS MULTI-SHARD

Current State:

  • ✅ Already uses monthly sharding (monthly: true)
  • ✅ Already implements multi-shard reads via getRegistryShards()
  • ✅ Has timestamp field in entries
  • ❌ Missing: Index key per shard for fast key listing (listLogKeys() is currently a stub)

Sharding Strategy: Month-based (already implemented)

  • Registry: registry:cache:internal:2025-02
  • Shard: cache:internal:shard:${random16}
  • Key: internal:log:${uuid}
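The monthly registry key above can be derived from a timestamp. The following is a minimal sketch, not the existing implementation: the helper names `registryKeyForMonth` and `registryKeysInRange` are hypothetical, timestamps are assumed to be epoch milliseconds, and months are assumed to roll over in UTC.

```typescript
// Hypothetical helper: builds a monthly registry key such as
// "registry:cache:internal:2025-02". UTC is an assumption.
function registryKeyForMonth(cacheType: string, date: Date): string {
  const year = date.getUTCFullYear();
  const month = date.getUTCMonth() + 1;
  const mm = month < 10 ? `0${month}` : String(month);
  return `registry:cache:${cacheType}:${year}-${mm}`;
}

// Hypothetical helper: lists the registry key of every month touched by
// [fromTimestamp, toTimestamp] (epoch ms, assumed), newest month first.
function registryKeysInRange(
  cacheType: string,
  fromTimestamp: number,
  toTimestamp: number
): string[] {
  const from = new Date(fromTimestamp);
  const to = new Date(toTimestamp);
  const fromYear = from.getUTCFullYear();
  const fromMonth = from.getUTCMonth();
  let year = to.getUTCFullYear();
  let month = to.getUTCMonth();

  const keys: string[] = [];
  // Walk backwards one month at a time until we pass the start month.
  while (year > fromYear || (year === fromYear && month >= fromMonth)) {
    keys.push(registryKeyForMonth(cacheType, new Date(Date.UTC(year, month, 1))));
    month -= 1;
    if (month < 0) {
      month = 11;
      year -= 1;
    }
  }
  return keys;
}
```

A query spanning December 2024 to February 2025 would therefore consult three registries, which is the cross-month read that the admin queries require.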

Why Multi-Shard:

  • Logs accumulate over time (monthly rotation)
  • Admin queries need to read across months
  • Each month can have multiple shards if volume grows

Indexing Needs:

  • Index key per shard: internal:index:${shardId} → Array of {id, timestamp} sorted by timestamp DESC
  • Purpose: Fast key listing without storage.list(), which the DO does not expose to Workers
  • Update: Append to index on each log write

Pagination:

  • Current: limit only (no cursor)
  • Needed: Timestamp-based cursor (cursor: ${timestamp}:${id})
  • Query: Filter by fromTimestamp/toTimestamp, sort DESC, apply cursor
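A minimal sketch of how the `${timestamp}:${id}` cursor could be encoded, decoded, and compared. The names `LogCursor`, `encodeCursor`, `decodeCursor`, and `isAfterCursor` are hypothetical, and using the ID in descending order as the tie-breaker for equal timestamps is an assumption rather than a decision recorded in this document.

```typescript
// Hypothetical cursor shape: the last entry returned on the previous page.
interface LogCursor {
  timestamp: number;
  id: string;
}

function encodeCursor(cursor: LogCursor): string {
  return `${cursor.timestamp}:${cursor.id}`;
}

// Splits on the FIRST colon only, so IDs that contain colons survive intact.
// Returns null for malformed input instead of throwing.
function decodeCursor(raw: string): LogCursor | null {
  const sep = raw.indexOf(":");
  if (sep <= 0) return null;
  const timestamp = Number(raw.slice(0, sep));
  const id = raw.slice(sep + 1);
  if (!isFinite(timestamp) || id.length === 0) return null;
  return { timestamp, id };
}

// True when `entry` sorts strictly after `cursor` in
// (timestamp DESC, id DESC) order, i.e. it belongs on a later page.
function isAfterCursor(entry: LogCursor, cursor: LogCursor): boolean {
  if (entry.timestamp !== cursor.timestamp) {
    return entry.timestamp < cursor.timestamp;
  }
  return entry.id < cursor.id;
}
```

Including the ID means two entries written in the same millisecond are still totally ordered, so a page boundary that falls between them neither skips nor repeats an entry.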

1.2 Maintenance Cache ⚠️ MAYBE MULTI-SHARD

Current State:

  • ✅ Single shard (active shard only)
  • ✅ Keys: maintenance:${requestId}
  • ✅ Data has createdAt, updatedAt, dueDate timestamps
  • ❌ Cache is only for single-request lookups (not list queries)
  • ❌ List queries go to DB, not cache

Sharding Strategy: ID-based (via registry, not hash)

  • Registry manages shards automatically when size limit reached
  • No monthly sharding needed (requests don't accumulate like logs)

Why Multi-Shard (if implemented):

  • If we want to cache list query results (e.g., "all requests for user X")
  • If maintenance cache grows beyond one shard limit
  • Currently: Single-request cache hits don't need multi-shard

Indexing Needs:

  • Only if caching list queries: Index by userId, departmentId, status, etc.
  • Current: No indexing needed (single-key lookups only)

Pagination:

  • Current: DB handles pagination (offset/limit)
  • Cache: No pagination (single requests only)

Recommendation: Keep single-shard for now. Add multi-shard only if:

  1. We start caching list query results
  2. Single shard hits size limit

1.3 Auth Rate Limit Cache ❌ NO MULTI-SHARD

Current State:

  • ✅ Single shard
  • ✅ Keys: auth:attempts:${identifier} (email/userId)
  • ✅ Data: {count: number, lastAttempt: number}
  • ✅ Per-identifier lookups only

Sharding Strategy: N/A (single shard sufficient)

Why No Multi-Shard:

  • Per-identifier lookups (no list queries)
  • TTL-based expiration (5 min success, 1 hr failure)
  • No historical queries needed

Indexing Needs: None

Pagination: None needed


1.4 Analytics Cache ⚠️ MAYBE MULTI-SHARD

Current State:

  • ✅ Single shard
  • ✅ Keys: Query-based (analytics:${fromDate}-${toDate})
  • ✅ Caches aggregated analytics results

Sharding Strategy: Month-based (if implemented)

  • Could shard by month for historical analytics
  • Current: Single shard caches query results

Why Multi-Shard (if implemented):

  • If we want to aggregate analytics across months
  • If we store raw analytics events (not just aggregated results)

Indexing Needs:

  • Only if storing raw events: Timestamp-based index
  • Current: Query-based caching (no indexing needed)

Pagination:

  • Current: Query-based (no pagination)
  • If raw events: Timestamp-based cursor

Recommendation: Keep single-shard for now. Add monthly sharding only if we store raw analytics events.


2. Sharding Strategy Summary

| Cache Type | Multi-Shard? | Strategy | Reason |
|---|---|---|---|
| Internal Logs | ✅ YES | Month-based | Logs accumulate, admin queries across months |
| Maintenance | ⚠️ MAYBE | ID-based (registry) | Only if caching list queries or size limit |
| Auth Rate Limits | ❌ NO | Single shard | Per-identifier only, TTL-based |
| Analytics | ⚠️ MAYBE | Month-based | Only if storing raw events |

3. Timestamp-Based Indexing Analysis

3.1 Internal Logs ✅ NEEDS INDEXING

Current: listLogKeys() returns [] (stub)

Required:

  • Index key per shard: internal:index:${shardId}
  • Structure: Array<{id: string, timestamp: number}> sorted DESC
  • Update: Append on each log write
  • Read: Use index to get log IDs, then fetch entries

Implementation:

typescript
// On write: add the new entry to the shard's index and keep it sorted.
const indexKey = `internal:index:${shardId}`;
const index = (await cache.get(shardId, indexKey)) || [];
index.push({ id: logEntry.id, timestamp: logEntry.timestamp });
index.sort((a, b) => b.timestamp - a.timestamp); // newest first (DESC)
await cache.set(shardId, indexKey, index, 0);

// On read: resolve log IDs from the same index key.
const storedIndex = (await cache.get(shardId, indexKey)) || [];
const logIds = storedIndex.map((e) => e.id);
// Fetch entries by ID, then filter by timestamp range
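Because the stored index is already sorted, re-sorting the whole array on every write does more work than necessary. As one possible alternative, and not what the snippet above does, a binary search can find the insertion point directly. `IndexEntry` and `insertSortedDesc` are hypothetical names introduced for this sketch.

```typescript
interface IndexEntry {
  id: string;
  timestamp: number;
}

// Returns a new array with `entry` inserted so that the result stays sorted
// by timestamp DESC. Assumes `index` is already sorted that way.
// An entry whose timestamp equals existing ones is placed after them.
function insertSortedDesc(
  index: readonly IndexEntry[],
  entry: IndexEntry
): IndexEntry[] {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (index[mid].timestamp >= entry.timestamp) {
      lo = mid + 1; // everything up to mid stays in front of the new entry
    } else {
      hi = mid;
    }
  }
  return [...index.slice(0, lo), entry, ...index.slice(lo)];
}
```

Since log writes usually carry the newest timestamp, the search normally ends at position 0 after a few comparisons. The copy is still linear in the index size, so this reduces comparison cost rather than changing the overall shape of the write path.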

Pagination:

  • Cursor format: ${timestamp}:${id}
  • Query: Filter index by fromTimestamp/toTimestamp, apply cursor, limit
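The query steps above (filter by range, apply cursor, limit) could look like the following for a single shard's index. This is a sketch under stated assumptions: `pageIndex`, `PageQuery`, and `IndexPage` are hypothetical names, both timestamp bounds are treated as inclusive, and the index is assumed to be sorted by timestamp DESC already.

```typescript
interface IndexEntry {
  id: string;
  timestamp: number;
}

interface PageQuery {
  fromTimestamp?: number; // inclusive lower bound (assumed)
  toTimestamp?: number;   // inclusive upper bound (assumed)
  cursor?: string;        // "timestamp:id" of the last entry already returned
  limit: number;
}

interface IndexPage {
  entries: IndexEntry[];
  nextCursor?: string;
}

function pageIndex(index: readonly IndexEntry[], query: PageQuery): IndexPage {
  let start = 0;
  const sep = query.cursor === undefined ? -1 : query.cursor.indexOf(":");
  if (query.cursor !== undefined && sep > 0) {
    const cursorTs = Number(query.cursor.slice(0, sep));
    const cursorId = query.cursor.slice(sep + 1);
    const pos = index.findIndex(
      (e) => e.timestamp === cursorTs && e.id === cursorId
    );
    // If the cursor entry is no longer indexed, resume at the first
    // entry that is strictly older than the cursor timestamp.
    start = pos >= 0 ? pos + 1 : index.findIndex((e) => e.timestamp < cursorTs);
    if (start < 0) start = index.length;
  }

  const inRange = index.slice(start).filter(
    (e) =>
      (query.fromTimestamp === undefined || e.timestamp >= query.fromTimestamp) &&
      (query.toTimestamp === undefined || e.timestamp <= query.toTimestamp)
  );
  const entries = inRange.slice(0, query.limit);
  const last = entries[entries.length - 1];
  // Only hand out a cursor when more matching entries remain.
  const nextCursor =
    inRange.length > query.limit && last !== undefined
      ? `${last.timestamp}:${last.id}`
      : undefined;
  return { entries, nextCursor };
}
```

Resuming by position inside one shard works because the cursor names a concrete index entry; across shards the same cursor has to be compared by value instead, since the entry may live in a different shard's index.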

3.2 Maintenance Cache ❌ NO INDEXING NEEDED

Reason: Cache is for single-request lookups only. List queries go to DB.

If we add list caching later:

  • Index by userId, departmentId, status, etc.
  • Timestamp-based index for date range queries

3.3 Auth Rate Limits ❌ NO INDEXING NEEDED

Reason: Per-identifier lookups only, no list queries.


3.4 Analytics Cache ❌ NO INDEXING NEEDED (CURRENT)

Reason: Query-based caching (aggregated results), not raw events.

If we add raw event storage:

  • Timestamp-based index per month shard
  • Similar to internal logs

4. Central Multi-Shard Read Function Design

4.1 Requirements

  1. Central function, analogous to how the registry centralizes shard management
  2. Handles multi-shard reads for caches that need it
  3. Timestamp-based filtering (fromTimestamp/toTimestamp)
  4. Cursor-based pagination (timestamp:id cursor)
  5. Filtering (level, category, service, etc.)
  6. Sorting (timestamp DESC default)

4.2 Design: MultiShardReader Class

Location: lib/services/cache/multi-shard-reader/index.ts

Responsibilities:

  • Get shards from registry (getRegistryShards())
  • Read from each shard (using index keys)
  • Filter by timestamp range
  • Apply cursor pagination
  • Merge and sort results
  • Return paginated results with next cursor
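For the "merge and sort results" step, one option is a k-way merge that stops as soon as a page is full instead of concatenating and sorting every shard's entries. The sketch below assumes each shard's list is already sorted by timestamp DESC with ID DESC as the tie-breaker; `Stamped`, `compareDesc`, and `mergeShards` are hypothetical names.

```typescript
interface Stamped {
  id: string;
  timestamp: number;
}

// Orders by timestamp DESC, then by id DESC as a tie-breaker (assumption).
function compareDesc(a: Stamped, b: Stamped): number {
  if (a.timestamp !== b.timestamp) return b.timestamp - a.timestamp;
  return a.id < b.id ? 1 : a.id > b.id ? -1 : 0;
}

// Merges per-shard lists that are each sorted by compareDesc and returns at
// most `limit` entries in global order. Inputs are not modified.
function mergeShards<T extends Stamped>(
  shards: readonly (readonly T[])[],
  limit: number
): T[] {
  const positions = shards.map(() => 0); // next unread position per shard
  const merged: T[] = [];
  while (merged.length < limit) {
    let best = -1;
    for (let s = 0; s < shards.length; s++) {
      if (positions[s] >= shards[s].length) continue; // shard exhausted
      if (
        best < 0 ||
        compareDesc(shards[s][positions[s]], shards[best][positions[best]]) < 0
      ) {
        best = s;
      }
    }
    if (best < 0) break; // every shard exhausted
    merged.push(shards[best][positions[best]]);
    positions[best] += 1;
  }
  return merged;
}
```

With only a handful of shards per month, the linear scan for the best head is likely cheap enough; a heap would only start to pay off with many more shards.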

API:

typescript
interface MultiShardReadOptions<T> {
  cacheType: string;
  monthly?: boolean;
  indexKeyPrefix: string; // e.g., "internal:index"
  entryKeyPrefix: string; // e.g., "internal:log"
  filters?: {
    fromTimestamp?: number;
    toTimestamp?: number;
    // ... other filters
  };
  cursor?: string; // "timestamp:id"
  limit?: number;
  transform?: (entry: unknown) => T; // Transform cache entry to T
}

interface MultiShardReadResult<T> {
  entries: T[];
  nextCursor?: string;
  hasMore: boolean;
  shardsRead: number;
}

declare class MultiShardReader {
  read<T>(options: MultiShardReadOptions<T>): Promise<MultiShardReadResult<T>>;
}
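To make the contract concrete, here is a minimal sketch of a reader with the same result shape, written against an injected store so it can run without the real cache. Everything about `ShardStore` is an assumption (only the names `getRegistryShards` and `get(shardId, key)` appear in this document, not their signatures), the default limit of 50 is arbitrary, the `monthly` option is omitted, and the simple collect-then-sort approach is illustrative rather than the intended implementation.

```typescript
interface IndexEntry { id: string; timestamp: number; }

// Hypothetical storage surface; the signatures are assumptions.
interface ShardStore {
  getRegistryShards(cacheType: string): Promise<string[]>;
  get(shardId: string, key: string): Promise<unknown>;
}

interface ReadOptions<T> {
  cacheType: string;
  indexKeyPrefix: string;
  entryKeyPrefix: string;
  filters?: { fromTimestamp?: number; toTimestamp?: number };
  cursor?: string; // "timestamp:id"
  limit?: number;
  transform?: (entry: unknown) => T;
}

interface ReadResult<T> {
  entries: T[];
  nextCursor?: string;
  hasMore: boolean;
  shardsRead: number;
}

class SketchMultiShardReader {
  private readonly store: ShardStore;
  constructor(store: ShardStore) { this.store = store; }

  async read<T>(options: ReadOptions<T>): Promise<ReadResult<T>> {
    const limit = options.limit ?? 50; // arbitrary default for this sketch
    const from = options.filters?.fromTimestamp ?? -Infinity;
    const to = options.filters?.toTimestamp ?? Infinity;
    const shardIds = await this.store.getRegistryShards(options.cacheType);

    // Collect in-range index entries from every shard, remembering the shard.
    const candidates: Array<IndexEntry & { shardId: string }> = [];
    for (const shardId of shardIds) {
      const key = `${options.indexKeyPrefix}:${shardId}`;
      const index = ((await this.store.get(shardId, key)) ?? []) as IndexEntry[];
      for (const e of index) {
        if (e.timestamp >= from && e.timestamp <= to) candidates.push({ ...e, shardId });
      }
    }
    // Global order: timestamp DESC, then id DESC as a tie-breaker (assumed).
    candidates.sort((a, b) =>
      b.timestamp - a.timestamp || (a.id < b.id ? 1 : a.id > b.id ? -1 : 0));

    // Resume at the first candidate that sorts strictly after the cursor.
    let start = 0;
    const sep = options.cursor ? options.cursor.indexOf(":") : -1;
    if (options.cursor && sep > 0) {
      const ts = Number(options.cursor.slice(0, sep));
      const id = options.cursor.slice(sep + 1);
      start = candidates.findIndex(
        (c) => c.timestamp < ts || (c.timestamp === ts && c.id < id));
      if (start < 0) start = candidates.length;
    }

    const page = candidates.slice(start, start + limit);
    const transform = options.transform ?? ((entry: unknown) => entry as T);
    const entries: T[] = [];
    for (const c of page) {
      const raw = await this.store.get(c.shardId, `${options.entryKeyPrefix}:${c.id}`);
      if (raw !== undefined && raw !== null) entries.push(transform(raw));
    }
    const hasMore = start + limit < candidates.length;
    const last = page[page.length - 1];
    const nextCursor = hasMore && last ? `${last.timestamp}:${last.id}` : undefined;
    return { entries, nextCursor, hasMore, shardsRead: shardIds.length };
  }
}
```

The cursor is compared by value rather than by position, so it remains valid when the entry it names lives in a different shard than the next page's first entry.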

5. Implementation Plan

Phase 1: Index Key Support for Internal Logs

  1. ✅ Update InternalCacheService.log() to update index key
  2. ✅ Implement listLogKeys() using index key
  3. ✅ Update getLogsFromShard() to use index

Phase 2: Central Multi-Shard Reader

  1. ✅ Create MultiShardReader class
  2. ✅ Implement timestamp-based filtering
  3. ✅ Implement cursor pagination
  4. ✅ Update InternalCacheService.getLogs() to use MultiShardReader

Phase 3: Future (if needed)

  1. Add multi-shard support for maintenance (if caching list queries)
  2. Add monthly sharding for analytics (if storing raw events)

6. Recommendations

DO NOW:

  1. Internal logs indexing: Implement index key per shard
  2. Central multi-shard reader: Create reusable function
  3. Cursor pagination: Add to internal logs

⚠️ DEFER:

  1. Maintenance multi-shard: Only if caching list queries
  2. Analytics monthly sharding: Only if storing raw events

DON'T:

  1. Auth rate limits multi-shard: Not needed (per-identifier only)

7. Key Decisions

  1. Index keys are stored in the same shard as their entries (not in a separate index shard) - simpler, and the index stays co-located with its data
  2. Cursor format is timestamp:id (not just timestamp) - the ID disambiguates entries that share a timestamp
  3. Central reader (not one per cache) - DRY, consistent behavior
  4. Timestamp-based ordering only (not ID-based) - aligns with time-series data