# LLM Cache
Convex component that caches LLM API request/response pairs with tiered TTL, time travel, and built-in observability.

Stop paying for duplicate LLM calls: identical prompts resolve to the same cache entry and return instantly.
# Features

- Deterministic cache keys: SHA-256 hash of normalized params. Same prompt = same entry.
- Tiered TTL: Entries start at 24h, promote to 7 days on first hit, and pin permanently with `pin: true`.
- Time travel: Old responses are archived when a cached request gets a new one. Query the full history.
- Request normalization: Sorts keys, trims whitespace, lowercases model names, rounds floats (see the sketch after this list).
- Flexible invalidation: Delete by `cacheKey`, `model`, `modelVersion`, `tag`, or time range.
- OpenAI-compatible: Works with any request format that has `messages`, `model`, and optional params.
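As an illustration of normalization (a minimal sketch, assuming it runs inside a Convex action handler where `ctx`, the `cache` client from the setup below, and a provider `response` are already in scope; the request shapes are hypothetical):

```ts
// Two superficially different requests: different key order and model casing.
// Normalization sorts keys and lowercases the model name, so both hash to the
// same SHA-256 cache key.
const a = { model: "GPT-4o", temperature: 0.7, messages: [{ role: "user", content: "Hi" }] };
const b = { messages: [{ role: "user", content: "Hi" }], temperature: 0.7, model: "gpt-4o" };

await cache.store(ctx, { request: a, response });
const hit = await cache.peek(ctx, { request: b }); // returns the entry stored for `a`
```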
# Demo

prompt-workbench-tau.vercel.app: a Prompt Workbench demo app built with this component. Try prompts, watch cache hits and misses, and explore time travel.
# Installation

```sh
npm install @mzedstudio/llm-cache convex
```

# Setup
## 1. Register the component

```ts
// convex/convex.config.ts
import { defineApp } from "convex/server";
import llmCache from "@mzedstudio/llm-cache/convex.config.js";

const app = defineApp();
app.use(llmCache, { name: "llmCache" });

export default app;
```

## 2. Initialize the client
```ts
// convex/llm.ts
import { LLMCache } from "@mzedstudio/llm-cache";
import { components } from "./_generated/api";

const cache = new LLMCache(components.llmCache);
```

# Usage
## Cache an LLM call

```ts
// convex/llm.ts (continued)
import { v } from "convex/values";
import { action } from "./_generated/server";

export const chat = action({
  args: { messages: v.array(v.object({ role: v.string(), content: v.string() })) },
  handler: async (ctx, args) => {
    const request = { messages: args.messages, model: "gpt-4o", temperature: 0.7 };

    // Return the cached response if this exact request was seen before.
    const cached = await cache.lookup(ctx, { request });
    if (cached) return cached.response;

    // Cache miss: call the provider, then store the response for next time.
    const response = await yourLLMProvider.create(request); // e.g. openai.chat.completions.create
    await cache.store(ctx, { request, response, tags: ["chat"] });
    return response;
  },
});
```

## Pin important responses
Pinned entries never expire:
```ts
await cache.store(ctx, {
  request,
  response,
  pin: true,
  tags: ["system-prompt"],
});
```

## Read-only peek (no hit counting)
Use `peek` from queries:

```ts
const cached = await cache.peek(ctx, { request: args.request });
```

## Filter and query
```ts
// Entries for gpt-4o stored within the last hour.
const entries = await cache.query(ctx, {
  model: "gpt-4o",
  after: Date.now() - 3600000,
});

const byTag = await cache.query(ctx, { tag: "summarize", limit: 20 });
```

## Time travel
```ts
const timeline = await cache.history(ctx, { request });
// [{ response, storedAt, isCurrent: false }, ..., { response, storedAt, isCurrent: true }]
```

## Invalidate and cleanup
```ts
await cache.invalidate(ctx, { model: "gpt-4o" });
await cache.invalidate(ctx, { modelVersion: "gpt-4o-2024-05-13" });

const result = await cache.cleanup(ctx, { batchSize: 200 });
// { deletedCount, keys, hasMore }
```

## Stats
```ts
const stats = await cache.getStats(ctx);
// { totalEntries, totalHits, entriesByModel, hitsByModel, oldestEntry, newestEntry }
```

# TTL Tiers
| Tier | Duration | Trigger |
|---|---|---|
| 0 | 24h | First stored |
| 1 | 7 days | First lookup hit |
| 2 | Never | `store` with `pin: true` |
On each lookup hit, a Tier 0 entry promotes to Tier 1, and a Tier 1 entry refreshes its expiry.
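As a sketch of the lifecycle (assuming an action handler with `ctx`, `cache`, and hypothetical `request`/`response` values in scope):

```ts
await cache.store(ctx, { request, response });             // Tier 0: expires in 24h
await cache.lookup(ctx, { request });                      // first hit: promoted to Tier 1 (7 days)
await cache.lookup(ctx, { request });                      // later hits: Tier 1 expiry refreshed
await cache.store(ctx, { request, response, pin: true });  // Tier 2: pinned, never expires
```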
# Config

```ts
await cache.setConfig(ctx, {
  config: {
    defaultTtlMs: 12 * 60 * 60 * 1000, // 12h base TTL
    promotionTtlMs: 14 * 24 * 60 * 60 * 1000, // 14 days after first hit
    ttlByModel: { "gpt-4o-mini": 3600000, "gpt-4o": 172800000 }, // 1h and 48h
    ttlByTag: { embedding: 30 * 24 * 60 * 60 * 1000 }, // 30 days
    normalizeRequests: true,
  },
});
```

TTL priority: tag > model > default.
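For example, with the config above (a sketch; it assumes TTLs are resolved from the stored entry's tags and model, and that `ctx`, `cache`, `messages`, and `response` are in scope):

```ts
// Tag TTL wins over model TTL: the 30-day ttlByTag.embedding applies,
// not the 48h ttlByModel["gpt-4o"].
await cache.store(ctx, {
  request: { messages, model: "gpt-4o" },
  response,
  tags: ["embedding"],
});

// No matching tag or model TTL: defaultTtlMs (12h here) applies.
await cache.store(ctx, {
  request: { messages, model: "claude-3-5-haiku" },
  response,
});
```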
# API Summary

| Method | Context | Notes |
|---|---|---|
| `lookup` | mutation/action | Hit count + TTL promote |
| `peek` | any | Read-only, no side effects |
| `store` | mutation/action | Archives old response if different |
| `query` | any | Filter by model, tag, time |
| `history` | any | Full response timeline |
| `invalidate` | mutation/action | Delete by key/model/tag/time |
| `cleanup` | action | Remove expired entries |
| `getStats` | any | Hit counts, storage metrics |
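Since `cleanup` is an action that removes expired entries in batches, one way to run it regularly is a Convex cron job. A sketch, assuming a hypothetical internal action `cleanupExpired` added to `convex/llm.ts` next to the `cache` client:

```ts
// convex/llm.ts (continued): hypothetical internal action that drains expired entries.
import { internalAction } from "./_generated/server";

export const cleanupExpired = internalAction({
  args: {},
  handler: async (ctx) => {
    // Delete expired entries in batches until none remain.
    let hasMore = true;
    while (hasMore) {
      const result = await cache.cleanup(ctx, { batchSize: 200 });
      hasMore = result.hasMore;
    }
  },
});
```

```ts
// convex/crons.ts: run the cleanup every 6 hours.
import { cronJobs } from "convex/server";
import { internal } from "./_generated/api";

const crons = cronJobs();
crons.interval("clean up expired LLM cache entries", { hours: 6 }, internal.llm.cleanupExpired);

export default crons;
```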