
Logs Monthly Sharding Spec (DO) — Registry + Unlimited Shards

This spec describes a scalable Durable Objects (DO) design for usage/auth/system logs where:

  • Each log entry is stored under its own timestamp-indexed key, and reads are paginated.
  • Logs are sharded by month.
  • Each month has a main registry instance that holds only metadata and the active shard pointer.
  • Shards are created dynamically with random IDs to scale without limit.

This is intended to be the concrete “sharding rule for logs.”


1. Definitions

Main instance (registry) per month

Name format:

<mainInstanceName>:<YYYY-MM>

Examples:

  • usage-registry:2026-02
  • auth-registry:2026-02

Responsibility:

  • Keep minimal metadata only:
    • shard list + per-shard metadata
    • active shard id
    • last rotation time
    • retention configuration (optional)

Shard instance (data) per month

Name format:

<mainInstanceName>:<YYYY-MM>:shard:<random-16-chars>

Example:

  • usage-registry:2026-02:shard:9f1c0a7b2d1e4c8a
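A minimal sketch of the two name formats. It assumes the month is already known as a `YYYY-MM` string and reads "random-16-chars" as 16 lowercase hex characters. `registryName` and `newShardName` are hypothetical helper names, not existing APIs.

```typescript
// Hypothetical helpers for the DO naming scheme described above.
const HEX = "0123456789abcdef";

// <mainInstanceName>:<YYYY-MM>, e.g. "usage-registry:2026-02"
function registryName(mainInstanceName: string, yyyyMm: string): string {
  if (!/^\d{4}-\d{2}$/.test(yyyyMm)) throw new Error(`bad month: ${yyyyMm}`);
  return `${mainInstanceName}:${yyyyMm}`;
}

// <registryName>:shard:<random-16-chars>
// Math.random keeps the sketch dependency-free; inside a Worker,
// crypto.getRandomValues is the more robust source of randomness.
function newShardName(registry: string, rand: () => number = Math.random): string {
  let suffix = "";
  for (let i = 0; i < 16; i++) suffix += HEX.charAt(Math.floor(rand() * 16));
  return `${registry}:shard:${suffix}`;
}

const reg = registryName("usage-registry", "2026-02");
console.log(reg);               // usage-registry:2026-02
console.log(newShardName(reg)); // e.g. usage-registry:2026-02:shard:9f1c0a7b2d1e4c8a
```

Sixteen hex characters give 64 random bits, so collisions between shards of the same month are very unlikely.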

Responsibility:

  • Store actual log entries using per-key timestamp indexing.
  • Optionally store user-index keys to support “delete by user” without full scans.

2. Storage Key Design (Inside Shard)

Primary log entry key (recommended)

usage:log:<timestampMs>:<id>

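Why: Keys compare lexicographically, so with fixed-width timestamps (13 digits covers millisecond values until the year 2286) lexicographic order equals chronological order, enabling efficient storage.list({ start, end, limit }).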

Optional user index key (for delete by user)

usage:user:<userId>:<timestampMs>:<id> → value: primaryKey

Why: Without this, deleting “all logs for user X” requires scanning every log in the shard/month (too expensive).
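A sketch of the key builders, assuming timestamps are non-negative integer milliseconds. Zero-padding to 13 digits is an added safeguard (current `Date.now()` values already have 13 digits) so lexicographic and chronological order always agree. `padTs`, `logKey` and `userIndexKey` are illustrative names; the rule that `userId` contains no `:` is also an assumption.

```typescript
// Illustrative key builders for the shard's storage layout.
const TS_WIDTH = 13; // millisecond timestamps stay 13 digits until the year 2286

function padTs(timestampMs: number): string {
  if (!Number.isInteger(timestampMs) || timestampMs < 0) {
    throw new Error(`invalid timestamp: ${timestampMs}`);
  }
  return String(timestampMs).padStart(TS_WIDTH, "0");
}

// Primary key: usage:log:<timestampMs>:<id>
function logKey(timestampMs: number, id: string): string {
  return `usage:log:${padTs(timestampMs)}:${id}`;
}

// Optional user index key: usage:user:<userId>:<timestampMs>:<id>
// Its stored value is the primary key, so a delete can remove both.
function userIndexKey(userId: string, timestampMs: number, id: string): string {
  // Assumed rule: a ':' in userId would let one user's prefix scan match another's keys.
  if (userId.includes(":")) throw new Error(`userId must not contain ':': ${userId}`);
  return `usage:user:${userId}:${padTs(timestampMs)}:${id}`;
}
```

For example, `logKey(1770000000000, "abc")` returns `"usage:log:1770000000000:abc"`.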

Payload shape (example)

  • Value: JSON object { id, timestamp, type, endpoint, duration, userId?, ... }
  • Keep payload small (avoid large bodies in logs).

3. Registry State Model (Inside Main Monthly Instance)

The registry instance stores only metadata.

Required fields

  • activeShardId: string (the shard DO name)
  • shards: array of shard metadata objects

Each shard entry:

  • shardId: string (DO name, e.g. usage-registry:2026-02:shard:...)
  • createdAt: number (ms)
  • lastWriteAt: number (ms)
  • minTimestampMs: number
  • maxTimestampMs: number
  • approxCount: number (increment on write; decrement on delete if possible)
  • approxBytes: number (optional; estimate via JSON length on writes)
  • status: active | sealed (sealed means no new writes)

Note: DO storage does not provide exact size cheaply; track approximations.
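The registry state can be typed roughly as follows. This is a sketch: field names follow the lists above, while `activeShardId: null` (before the first shard exists), `lastRotationAt`, `retentionDays` and the `estimateBytes` helper are assumptions for illustration.

```typescript
type ShardStatus = "active" | "sealed";

interface ShardMeta {
  shardId: string;      // DO name, e.g. usage-registry:2026-02:shard:...
  createdAt: number;    // ms
  lastWriteAt: number;  // ms
  minTimestampMs: number;
  maxTimestampMs: number;
  approxCount: number;  // incremented on write, decremented on delete when known
  approxBytes?: number; // optional running estimate
  status: ShardStatus;  // sealed shards accept no new writes
}

interface RegistryState {
  activeShardId: string | null; // null until the first shard is created
  shards: ShardMeta[];
  lastRotationAt?: number;      // ms
  retentionDays?: number;       // optional retention configuration
}

// Rough size estimate: length of the serialized entry. This counts UTF-16
// code units rather than bytes, which is acceptable for an approximation.
function estimateBytes(entry: unknown): number {
  return (JSON.stringify(entry) ?? "").length;
}
```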


4. Write Path (Append Log)

Step A: Resolve registry for month

Compute YYYY-MM from the log timestamp (or Date.now() if none is given), using UTC so every Worker resolves the same month.

Worker calls registry DO:

  • DO(<mainInstanceName>:<YYYY-MM>)
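Computing the registry name for a timestamp, as a sketch that assumes UTC months. `monthKey` and `registryNameFor` are hypothetical helpers. In a Worker, the resulting name would then be turned into a stub through the namespace binding's `idFromName(...)` and `get(id)`; the binding name is deployment-specific and not shown.

```typescript
// YYYY-MM in UTC for a millisecond timestamp (hypothetical helper).
function monthKey(timestampMs: number = Date.now()): string {
  const d = new Date(timestampMs);
  const yyyy = String(d.getUTCFullYear()).padStart(4, "0");
  const mm = String(d.getUTCMonth() + 1).padStart(2, "0"); // getUTCMonth is 0-based
  return `${yyyy}-${mm}`;
}

// <mainInstanceName>:<YYYY-MM>; falls back to the current time when no timestamp is given.
function registryNameFor(mainInstanceName: string, timestampMs?: number): string {
  return `${mainInstanceName}:${monthKey(timestampMs)}`;
}
```

For example, `registryNameFor("usage-registry", Date.UTC(2026, 1, 3))` returns `"usage-registry:2026-02"`.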

Step B: Registry returns active shard

Registry logic:

  • If no shard exists → create one and set as active.
  • If the active shard exceeds a threshold → create a new shard, seal the old one, and switch the active pointer.

Threshold options (pick one):

  • approxCount >= N (e.g. 50k logs)
  • approxBytes >= X (e.g. 50–200 MB)
  • maxTimestampMs - minTimestampMs >= window (time-window per shard)
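A sketch of the registry decision from Step B, using the count threshold (the byte or time-window options would only change `needsRotation`). The 50k value, `resolveActiveShard`, and the injected `makeShardId` function are assumptions; persisting the state afterwards is left to the caller. Because a single registry DO handles its requests one at a time, one active pointer per month stays consistent.

```typescript
interface ShardMeta {
  shardId: string;
  createdAt: number;
  lastWriteAt: number;
  minTimestampMs: number;
  maxTimestampMs: number;
  approxCount: number;
  approxBytes: number;
  status: "active" | "sealed";
}
interface RegistryState { activeShardId: string | null; shards: ShardMeta[]; lastRotationAt?: number }

const MAX_COUNT = 50_000; // threshold option: approxCount >= N

function needsRotation(s: ShardMeta): boolean {
  return s.approxCount >= MAX_COUNT;
}

// Returns the shard id to write to, creating or rotating shards as needed.
// Mutates `state`; the caller is expected to persist it afterwards.
function resolveActiveShard(state: RegistryState, now: number, makeShardId: () => string): string {
  const active = state.shards.find((s) => s.shardId === state.activeShardId);
  if (active && !needsRotation(active)) return active.shardId;
  if (active) active.status = "sealed"; // no new writes to the full shard
  const fresh: ShardMeta = {
    shardId: makeShardId(),
    createdAt: now,
    lastWriteAt: now,
    // Empty range: widened on first write, and never matches an overlap filter.
    // DO storage serializes values with structured clone, which keeps Infinity.
    minTimestampMs: Number.POSITIVE_INFINITY,
    maxTimestampMs: Number.NEGATIVE_INFINITY,
    approxCount: 0,
    approxBytes: 0,
    status: "active",
  };
  state.shards.push(fresh);
  state.activeShardId = fresh.shardId;
  if (active) state.lastRotationAt = now;
  return fresh.shardId;
}
```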

Step C: Worker writes to shard

Worker calls shard DO:

  • DO(activeShardId)

Shard stores:

  • primary key usage:log:<ts>:<id> → log object
  • (optional) user index key usage:user:<userId>:<ts>:<id> → primaryKey
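Inside the shard, both keys can be written with one multi-entry `put` call. The `LogEntry` shape, the `appendLog` name and the minimal `PutStorage` interface (a subset of the DO storage API, so the sketch runs outside Workers) are assumptions.

```typescript
interface LogEntry {
  id: string;
  timestamp: number; // ms
  type: string;
  endpoint: string;
  duration: number;
  userId?: string;
}

// Subset of the Durable Object storage API used here.
interface PutStorage {
  put(entries: Record<string, unknown>): Promise<void>;
}

function padTs(ts: number): string {
  return String(ts).padStart(13, "0");
}

// Writes the primary key and, when userId is present, the user index key.
async function appendLog(storage: PutStorage, entry: LogEntry): Promise<string> {
  const primaryKey = `usage:log:${padTs(entry.timestamp)}:${entry.id}`;
  const entries: Record<string, unknown> = { [primaryKey]: entry };
  if (entry.userId !== undefined) {
    const indexKey = `usage:user:${entry.userId}:${padTs(entry.timestamp)}:${entry.id}`;
    entries[indexKey] = primaryKey; // index value points at the primary key
  }
  await storage.put(entries); // one call for both keys
  return primaryKey;
}
```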

Registry updates shard metadata:

  • increment approxCount
  • update lastWriteAt, and widen minTimestampMs/maxTimestampMs to include the entry's timestamp
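On the registry side, recording a completed write might look like this. `recordWrite` is a hypothetical method name, and the caller is assumed to pass the serialized length of the entry as the byte estimate. Widening both bounds keeps the overlap filter correct even for late-arriving (backdated) entries.

```typescript
interface ShardMeta {
  shardId: string;
  lastWriteAt: number;
  minTimestampMs: number;
  maxTimestampMs: number;
  approxCount: number;
  approxBytes: number;
}

// Updates approximate metadata after a successful shard write.
function recordWrite(meta: ShardMeta, entryTimestampMs: number, serializedLength: number, now: number): void {
  meta.approxCount += 1;
  meta.approxBytes += serializedLength;
  meta.lastWriteAt = now;
  meta.minTimestampMs = Math.min(meta.minTimestampMs, entryTimestampMs);
  meta.maxTimestampMs = Math.max(meta.maxTimestampMs, entryTimestampMs);
}
```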

5. Read Path (Query Logs with Pagination)

Request parameters:

  • fromTimestampMs, toTimestampMs
  • limit (page size, capped at 500–1000)
  • cursor (or startKey)

Step A: Resolve month registries involved

  • If range is within one month → query one registry.
  • If range spans months → query each month registry, then merge results.
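Enumerating the month registries for a query, a sketch that assumes UTC months and an inclusive `[from, to]` range. `monthsInRange` is an illustrative name; it returns months newest-first so callers can stop early when a page fills up.

```typescript
// Lists YYYY-MM keys (UTC) overlapping [fromMs, toMs], newest month first.
function monthsInRange(fromMs: number, toMs: number): string[] {
  if (fromMs > toMs) return [];
  const from = new Date(fromMs);
  const to = new Date(toMs);
  let y = to.getUTCFullYear();
  let m = to.getUTCMonth(); // 0-based
  const fromY = from.getUTCFullYear();
  const fromM = from.getUTCMonth();
  const months: string[] = [];
  while (y > fromY || (y === fromY && m >= fromM)) {
    months.push(`${y}-${String(m + 1).padStart(2, "0")}`);
    m -= 1;
    if (m < 0) {
      m = 11;
      y -= 1;
    }
  }
  return months;
}
```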

Step B: Registry returns shards overlapping time range

Registry filters shards by overlap:

  • shard.maxTimestampMs >= from && shard.minTimestampMs <= to

Returns shard ids ordered by time.
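The overlap filter in code, as a sketch assuming the `ShardMeta` fields from section 3. Ordering by `maxTimestampMs` descending (newest data first) is one choice of "ordered by time"; `shardsForRange` is a hypothetical name. An empty shard whose bounds are still `+Infinity`/`-Infinity` never matches.

```typescript
interface ShardMeta {
  shardId: string;
  minTimestampMs: number;
  maxTimestampMs: number;
}

// Shards whose [min, max] range intersects [fromMs, toMs], newest data first.
function shardsForRange(shards: ShardMeta[], fromMs: number, toMs: number): string[] {
  return shards
    .filter((s) => s.maxTimestampMs >= fromMs && s.minTimestampMs <= toMs)
    .sort((a, b) => b.maxTimestampMs - a.maxTimestampMs)
    .map((s) => s.shardId);
}
```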

Step C: Fetch pages from shard(s)

Inside shard:

  • use storage.list over usage:log: keys with:
    • start = usage:log:<from>
    • end = usage:log:<to>:\uffff (end is exclusive; the \uffff suffix keeps entries whose timestamp equals <to>)
    • limit = 500..1000
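A sketch of one page read inside a shard. It assumes the timestamp part of each key is zero-padded to 13 digits and uses the last returned key as the cursor, passed to the DO `list()` option `startAfter` on the next call. `readPage`, `MAX_LIMIT` and the `ListStorage` interface (a subset of the DO storage API) are illustrative.

```typescript
// Subset of the Durable Object storage list() options used here.
interface ListOptions { start?: string; startAfter?: string; end?: string; limit?: number }
interface ListStorage {
  list(options: ListOptions): Promise<Map<string, unknown>>;
}

const pad = (ts: number) => String(ts).padStart(13, "0");
const MAX_LIMIT = 1000;

// Reads one ascending page of usage:log: keys in [fromMs, toMs].
// `cursor` is the last key of the previous page (exclusive restart point).
async function readPage(storage: ListStorage, fromMs: number, toMs: number, limit: number, cursor?: string) {
  const end = `usage:log:${pad(toMs)}:\uffff`; // exclusive; \uffff sorts after any ASCII id
  const pageSize = Math.min(Math.max(limit, 1), MAX_LIMIT);
  const opts: ListOptions = cursor
    ? { startAfter: cursor, end, limit: pageSize }
    : { start: `usage:log:${pad(fromMs)}`, end, limit: pageSize };
  const rows = await storage.list(opts);
  const keys = Array.from(rows.keys());
  const items = Array.from(rows.values());
  // A full page may be followed by more data (the next page can be empty);
  // a short page means the range is exhausted.
  const nextCursor = keys.length === pageSize ? keys[keys.length - 1] : undefined;
  return { items, nextCursor };
}
```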

If multiple shards are involved:

  • fetch from newest shards first for “recent logs” UX, or
  • fetch from all shards and merge-sort by timestamp.

Pragmatic recommendation: return newest-first and keep per-request work bounded (don’t fan out to 20 shards in one API call).
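A sketch of the merge step under this recommendation. It assumes each shard has already returned its own newest rows for the range (for example by listing with the DO `list()` option `reverse: true`); `mergeNewestFirst`, the `LogRow` shape and the `maxShards` cap of 4 are hypothetical.

```typescript
interface LogRow { id: string; timestamp: number }

// Merges per-shard pages into one newest-first page of at most `limit` rows.
// Only the first `maxShards` pages are used, keeping per-request fan-out bounded.
function mergeNewestFirst(pages: LogRow[][], limit: number, maxShards = 4): LogRow[] {
  const rows = ([] as LogRow[]).concat(...pages.slice(0, maxShards));
  return rows
    .sort((a, b) => b.timestamp - a.timestamp || (a.id < b.id ? 1 : a.id > b.id ? -1 : 0))
    .slice(0, limit);
}
```

Breaking timestamp ties by descending id mirrors the reverse key order, so pages stay deterministic.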


6. Delete Path (By Time Range / By User)

Delete by time range

  • Registry returns shards overlapping the range.
  • The Worker calls the shard's delete operation in batches.
  • Shard deletes via storage.list({ start, end, limit }) + batch delete.
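A batched time-range delete inside one shard, as a sketch. It relies on the DO storage limit of 128 keys per `delete([...])` call, so it lists 64 primaries at a time and also removes each entry's user index key (derived from the stored `userId`, `timestamp` and `id`) so no index entries are left dangling. `deleteRange`, `maxBatches` and the `RangeStorage` interface are assumptions.

```typescript
interface ListOptions { start?: string; end?: string; limit?: number }
interface RangeStorage {
  list(options: ListOptions): Promise<Map<string, unknown>>;
  delete(keys: string[]): Promise<number>;
}

const pad = (ts: number) => String(ts).padStart(13, "0");
const PAGE = 64; // 64 primaries + up to 64 index keys stays within 128 keys per delete

// Deletes usage:log: entries in [fromMs, toMs] plus their user index keys.
// `deleted` counts storage keys removed; when `done` is false, call again.
async function deleteRange(storage: RangeStorage, fromMs: number, toMs: number, maxBatches = 10) {
  const start = `usage:log:${pad(fromMs)}`;
  const end = `usage:log:${pad(toMs)}:\uffff`;
  let deleted = 0;
  for (let i = 0; i < maxBatches; i++) {
    const rows = await storage.list({ start, end, limit: PAGE });
    if (rows.size === 0) return { deleted, done: true };
    const keys: string[] = [];
    rows.forEach((value, key) => {
      keys.push(key);
      const v = value as { userId?: string; timestamp: number; id: string };
      if (v.userId !== undefined) keys.push(`usage:user:${v.userId}:${pad(v.timestamp)}:${v.id}`);
    });
    deleted += await storage.delete(keys);
    if (rows.size < PAGE) return { deleted, done: true }; // short batch: range exhausted
  }
  return { deleted, done: false };
}
```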

Delete by user (with index keys)

  • Registry returns relevant shards (by time range; or all shards in month).
  • Shard deletes using user index prefix:
    • list usage:user:<userId>: in range
    • delete index keys + primary keys
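Delete by user through the index, as a sketch. It assumes index keys of the form `usage:user:<userId>:<13-digit ts>:<id>` whose value is the primary key, a `userId` without `:`, and 64 index keys per batch so each `delete` call stays within 128 keys. `deleteUserLogs` and its optional time bounds are hypothetical.

```typescript
interface ListOptions { start?: string; end?: string; limit?: number }
interface UserStorage {
  list(options: ListOptions): Promise<Map<string, unknown>>;
  delete(keys: string[]): Promise<number>;
}

const pad = (ts: number) => String(ts).padStart(13, "0");
const BATCH = 64; // 64 index keys + 64 primary keys = 128 keys per delete call

// Deletes one user's logs (index keys and primaries) in [fromMs, toMs] within a shard.
async function deleteUserLogs(storage: UserStorage, userId: string, fromMs = 0, toMs = 9999999999999, maxBatches = 10) {
  const start = `usage:user:${userId}:${pad(fromMs)}`;
  const end = `usage:user:${userId}:${pad(toMs)}:\uffff`;
  let deleted = 0;
  for (let i = 0; i < maxBatches; i++) {
    const index = await storage.list({ start, end, limit: BATCH });
    const keys: string[] = [];
    index.forEach((primaryKey, indexKey) => {
      keys.push(indexKey);
      if (typeof primaryKey === "string") keys.push(primaryKey); // index value = primary key
    });
    if (keys.length > 0) deleted += await storage.delete(keys);
    if (index.size < BATCH) return { deleted, done: true };
  }
  return { deleted, done: false };
}
```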

Helper reference: apps/worker/src/lib/services/cache/usage/delete.ts implements the planned batch deletion algorithm (time range + user index).


7. Why This Counts as “Sharding Rule for Logs”

In summary, the sharding rule is:

  • Month routing is deterministic by registry name: <mainInstanceName>:<YYYY-MM>.
  • Shard selection within a month is controlled by the registry’s active shard pointer and rotation thresholds.
  • Reads/deletes use registry metadata to target only shards overlapping the time range, keeping work bounded.

8. Security and Privacy Notes (University Target)

  • Treat IP/location as personal data; keep fields minimal.
  • Prefer coarse location (country/region/city) over precise location.
  • Retention: keep detailed logs for a short period (e.g. 30–90 days), then delete/anonymize.
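Because data is partitioned by month, retention can largely be enforced per month. The sketch below, which assumes UTC months and uses a hypothetical `expiredMonths` helper, finds months that ended before the retention cutoff; their registries and shards can be deleted or anonymized wholesale. A month that straddles the cutoff would still go through the time-range delete path.

```typescript
// Returns the YYYY-MM months (UTC) that ended before nowMs - retentionDays.
function expiredMonths(months: string[], nowMs: number, retentionDays: number): string[] {
  const cutoff = nowMs - retentionDays * 24 * 60 * 60 * 1000;
  return months.filter((mk) => {
    const year = Number(mk.slice(0, 4));
    const month = Number(mk.slice(5, 7)); // 1-based
    // Date.UTC takes a 0-based month, so this is the first ms of the next month.
    const monthEnd = Date.UTC(year, month, 1) - 1; // last ms of the month
    return monthEnd < cutoff;
  });
}
```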