# LLM Cache
Convex component that caches LLM API request/response pairs with tiered TTL, time travel, and built-in observability.

Stop paying for duplicate LLM calls: identical prompts resolve to the same cache entry and return instantly.
# Features

- Deterministic cache keys: SHA-256 hash of normalized params. Same prompt = same entry.
- Tiered TTL: Entries start at 24h, promote to 7 days on first hit, and pin permanently with `pin: true`.
- Time travel: Old responses are archived when a cached request gets a new one. Query the full history.
- Request normalization: Sorts keys, trims whitespace, lowercases model names, rounds floats (see the sketch after this list).
- Flexible invalidation: Delete by `cacheKey`, `model`, `modelVersion`, `tag`, or time range.
- OpenAI-compatible: Works with any request format that has `messages`, `model`, and optional params.
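As an illustration of normalization (a minimal sketch, assuming it runs inside a Convex action handler where `ctx`, the `cache` client from the setup below, and a provider `response` are already in scope; the request shapes are hypothetical):

```ts
// Two superficially different requests: different key order and model casing.
// Normalization sorts keys and lowercases the model name, so both hash to the
// same SHA-256 cache key.
const a = { model: "GPT-4o", temperature: 0.7, messages: [{ role: "user", content: "Hi" }] };
const b = { messages: [{ role: "user", content: "Hi" }], temperature: 0.7, model: "gpt-4o" };

await cache.store(ctx, { request: a, response });
const hit = await cache.peek(ctx, { request: b }); // returns the entry stored for `a`
```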
# Demo

prompt-workbench-tau.vercel.app: a Prompt Workbench demo app built with this component. Try prompts, watch cache hits and misses, and explore time travel.
# Installation

```sh
npm install @mzedstudio/llm-cache convex
```

# Setup
## 1. Register the component

```ts
// convex/convex.config.ts
import { defineApp } from "convex/server";
import llmCache from "@mzedstudio/llm-cache/convex.config.js";

const app = defineApp();
app.use(llmCache, { name: "llmCache" });

export default app;
```

## 2. Initialize the client
```ts
// convex/llm.ts
import { LLMCache } from "@mzedstudio/llm-cache";
import { components } from "./_generated/api";

const cache = new LLMCache(components.llmCache);
```

# Usage
## Cache an LLM call

```ts
// convex/llm.ts (continued)
import { v } from "convex/values";
import { action } from "./_generated/server";

export const chat = action({
  args: { messages: v.array(v.object({ role: v.string(), content: v.string() })) },
  handler: async (ctx, args) => {
    const request = { messages: args.messages, model: "gpt-4o", temperature: 0.7 };

    // Return the cached response if this exact request was seen before.
    const cached = await cache.lookup(ctx, { request });
    if (cached) return cached.response;

    // Cache miss: call the provider, then store the response for next time.
    const response = await yourLLMProvider.create(request); // e.g. openai.chat.completions.create
    await cache.store(ctx, { request, response, tags: ["chat"] });
    return response;
  },
});
```

## Pin important responses
Pinned entries never expire:
```ts
await cache.store(ctx, {
  request,
  response,
  pin: true,
  tags: ["system-prompt"],
});
```

## Read-only peek (no hit counting)
Use `peek` from queries:

```ts
const cached = await cache.peek(ctx, { request: args.request });
```

## Filter and query
```ts
// Entries for gpt-4o stored within the last hour.
const entries = await cache.query(ctx, {
  model: "gpt-4o",
  after: Date.now() - 3600000,
});

const byTag = await cache.query(ctx, { tag: "summarize", limit: 20 });
```

## Time travel
```ts
const timeline = await cache.history(ctx, { request });
// [{ response, storedAt, isCurrent: false }, ..., { response, storedAt, isCurrent: true }]
```

## Invalidate and cleanup
```ts
await cache.invalidate(ctx, { model: "gpt-4o" });
await cache.invalidate(ctx, { modelVersion: "gpt-4o-2024-05-13" });

const result = await cache.cleanup(ctx, { batchSize: 200 });
// { deletedCount, keys, hasMore }
```

## Stats
```ts
const stats = await cache.getStats(ctx);
// { totalEntries, totalHits, entriesByModel, hitsByModel, oldestEntry, newestEntry }
```

# TTL Tiers
| Tier | Duration | Trigger |
|---|---|---|
| 0 | 24h | First stored |
| 1 | 7 days | First lookup hit |
| 2 | Never | `store` with `pin: true` |
On each lookup hit, a Tier 0 entry promotes to Tier 1, and a Tier 1 entry refreshes its expiry.
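As a sketch of the lifecycle (assuming an action handler with `ctx`, `cache`, and hypothetical `request`/`response` values in scope):

```ts
await cache.store(ctx, { request, response });             // Tier 0: expires in 24h
await cache.lookup(ctx, { request });                      // first hit: promoted to Tier 1 (7 days)
await cache.lookup(ctx, { request });                      // later hits: Tier 1 expiry refreshed
await cache.store(ctx, { request, response, pin: true });  // Tier 2: pinned, never expires
```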
# Config

```ts
await cache.setConfig(ctx, {
  config: {
    defaultTtlMs: 12 * 60 * 60 * 1000, // 12h base TTL
    promotionTtlMs: 14 * 24 * 60 * 60 * 1000, // 14 days after first hit
    ttlByModel: { "gpt-4o-mini": 3600000, "gpt-4o": 172800000 }, // 1h and 48h
    ttlByTag: { embedding: 30 * 24 * 60 * 60 * 1000 }, // 30 days
    normalizeRequests: true,
  },
});
```

TTL priority: tag > model > default.
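For example, with the config above (a sketch; it assumes TTLs are resolved from the stored entry's tags and model, and that `ctx`, `cache`, `messages`, and `response` are in scope):

```ts
// Tag TTL wins over model TTL: the 30-day ttlByTag.embedding applies,
// not the 48h ttlByModel["gpt-4o"].
await cache.store(ctx, {
  request: { messages, model: "gpt-4o" },
  response,
  tags: ["embedding"],
});

// No matching tag or model TTL: defaultTtlMs (12h here) applies.
await cache.store(ctx, {
  request: { messages, model: "claude-3-5-haiku" },
  response,
});
```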
# API Summary

| Method | Context | Notes |
|---|---|---|
| `lookup` | mutation/action | Hit count + TTL promote |
| `peek` | any | Read-only, no side effects |
| `store` | mutation/action | Archives old response if different |
| `query` | any | Filter by model, tag, time |
| `history` | any | Full response timeline |
| `invalidate` | mutation/action | Delete by key/model/tag/time |
| `cleanup` | action | Remove expired entries |
| `getStats` | any | Hit counts, storage metrics |
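Since `cleanup` is an action that removes expired entries in batches, one way to run it regularly is a Convex cron job. A sketch, assuming a hypothetical internal action `cleanupExpired` added to `convex/llm.ts` next to the `cache` client:

```ts
// convex/llm.ts (continued): hypothetical internal action that drains expired entries.
import { internalAction } from "./_generated/server";

export const cleanupExpired = internalAction({
  args: {},
  handler: async (ctx) => {
    // Delete expired entries in batches until none remain.
    let hasMore = true;
    while (hasMore) {
      const result = await cache.cleanup(ctx, { batchSize: 200 });
      hasMore = result.hasMore;
    }
  },
});
```

```ts
// convex/crons.ts: run the cleanup every 6 hours.
import { cronJobs } from "convex/server";
import { internal } from "./_generated/api";

const crons = cronJobs();
crons.interval("clean up expired LLM cache entries", { hours: 6 }, internal.llm.cleanupExpired);

export default crons;
```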