LLM Cache

Tags: Convex, LLM, Caching, TypeScript

Convex component that caches LLM API request/response pairs with tiered TTL, time travel, and built-in observability.

Stop paying for duplicate LLM calls: the component caches request/response pairs, so identical prompts hit the same cache entry and return instantly.

#Features

  • Deterministic cache keys — SHA-256 hash of normalized params. Same prompt = same entry.
  • Tiered TTL — Entries start at 24h, promote to 7 days on first hit, or pin permanently with pin: true
  • Time travel — Old responses archived when a cached request gets a new one. Query full history.
  • Request normalization — Sorts keys, trims whitespace, lowercases model names, rounds floats (see the sketch after this list)
  • Flexible invalidation — Delete by cacheKey, model, modelVersion, tag, or time range
  • OpenAI-compatible — Works with any format that has messages, model, and optional params
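
As a rough sketch of how the deterministic keys and request normalization above play out (the request values here are made up), two superficially different requests are expected to land on the same entry:

ts
// Illustrative only: keys are sorted, the model name is lowercased, and
// whitespace is trimmed, so both requests normalize to the same cache key.
const first = { model: "GPT-4o", temperature: 0.7, messages: [{ role: "user", content: "Summarize this doc " }] };
const second = { temperature: 0.7, messages: [{ role: "user", content: "Summarize this doc" }], model: "gpt-4o" };

// Storing under `first` and calling cache.lookup with `second` should be a hit.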

#Demo

prompt-workbench-tau.vercel.app — Prompt Workbench demo app built with this component. Try prompts, see cache hit/miss, explore time travel.

#Installation

npm install @mzedstudio/llm-cache convex

#Setup

#1. Register the component

ts
// convex/convex.config.ts
import { defineApp } from "convex/server";
import llmCache from "@mzedstudio/llm-cache/convex.config.js";
 
const app = defineApp();
app.use(llmCache, { name: "llmCache" });
export default app;

#2. Initialize the client

ts
// convex/llm.ts
import { LLMCache } from "@mzedstudio/llm-cache";
import { components } from "./_generated/api";
 
// Export the client so other Convex functions can reuse it
export const cache = new LLMCache(components.llmCache);

#Usage

#Cache an LLM call

ts
import { v } from "convex/values";
import { action } from "./_generated/server";
import { cache } from "./llm"; // the client exported in convex/llm.ts above
 
export const chat = action({
  args: { messages: v.array(v.object({ role: v.string(), content: v.string() })) },
  handler: async (ctx, args) => {
    const request = { messages: args.messages, model: "gpt-4o", temperature: 0.7 };
 
    const cached = await cache.lookup(ctx, { request });
    if (cached) return cached.response;
 
    const response = await yourLLMProvider.create(request); // e.g. openai.chat.completions.create
    await cache.store(ctx, { request, response, tags: ["chat"] });
    return response;
  },
});

#Pin important responses

Pinned entries never expire:

ts
await cache.store(ctx, {
  request,
  response,
  pin: true,
  tags: ["system-prompt"],
});

#Read-only peek (no hit counting)

Use peek from queries:

ts
const cached = await cache.peek(ctx, { request: args.request });
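
Because peek has no side effects, it can back a read-only Convex query. A minimal sketch, assuming the exported cache client from convex/llm.ts and that peek returns the same shape as lookup (the query name and the v.any() validator are illustrative):

ts
import { v } from "convex/values";
import { query } from "./_generated/server";
import { cache } from "./llm";

export const checkCache = query({
  args: { request: v.any() },
  handler: async (ctx, args) => {
    // No hit counting or TTL promotion happens here.
    const cached = await cache.peek(ctx, { request: args.request });
    return cached ? cached.response : null;
  },
});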

#Filter and query

ts
const entries = await cache.query(ctx, {
  model: "gpt-4o",
  after: Date.now() - 3600000,
});
 
const byTag = await cache.query(ctx, { tag: "summarize", limit: 20 });

#Time travel

ts
const timeline = await cache.history(ctx, { request });
// [{ response, storedAt, isCurrent: false }, ..., { response, storedAt, isCurrent: true }]

#Invalidate and cleanup

ts
await cache.invalidate(ctx, { model: "gpt-4o" });
await cache.invalidate(ctx, { modelVersion: "gpt-4o-2024-05-13" });
 
const result = await cache.cleanup(ctx, { batchSize: 200 });
// { deletedCount, keys, hasMore }
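
Because cleanup reports hasMore, expired entries can be drained in batches from a scheduled internal action. A minimal sketch, assuming a standard Convex cron and the exported cache client from above (the cron and function names are hypothetical):

ts
// convex/crons.ts
import { cronJobs } from "convex/server";
import { internal } from "./_generated/api";

const crons = cronJobs();
crons.interval("expire llm cache entries", { hours: 6 }, internal.llm.cleanupExpired);
export default crons;

// convex/llm.ts (hypothetical action alongside the cache client)
import { internalAction } from "./_generated/server";

export const cleanupExpired = internalAction({
  args: {},
  handler: async (ctx) => {
    // Keep deleting batches until the component reports nothing left to remove.
    let hasMore = true;
    while (hasMore) {
      const result = await cache.cleanup(ctx, { batchSize: 200 });
      hasMore = result.hasMore;
    }
  },
});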

#Stats

ts
const stats = await cache.getStats(ctx);
// { totalEntries, totalHits, entriesByModel, hitsByModel, oldestEntry, newestEntry }

#TTL Tiers

| Tier | Duration | Trigger               |
| ---- | -------- | --------------------- |
| 0    | 24h      | First stored          |
| 1    | 7 days   | First lookup hit      |
| 2    | Never    | store with pin: true  |

On each lookup hit: Tier 0 promotes to 1; Tier 1 refreshes expiry.
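
In code, the promotion path described above looks roughly like this (an illustration with default TTLs, not additional API surface):

ts
await cache.store(ctx, { request, response });             // Tier 0: expires ~24h from now
await cache.lookup(ctx, { request });                      // hit: promoted to Tier 1, ~7-day expiry
await cache.lookup(ctx, { request });                      // hit: stays Tier 1, 7-day expiry refreshed
await cache.store(ctx, { request, response, pin: true });  // Tier 2: pinned, never expires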

#Config

ts
await cache.setConfig(ctx, {
  config: {
    defaultTtlMs: 12 * 60 * 60 * 1000,
    promotionTtlMs: 14 * 24 * 60 * 60 * 1000,
    ttlByModel: { "gpt-4o-mini": 3600000, "gpt-4o": 172800000 },
    ttlByTag: { embedding: 30 * 24 * 60 * 60 * 1000 },
    normalizeRequests: true,
  },
});

TTL priority: tag > model > default.
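
With the config above, the priority order plays out like this (an illustrative sketch; embedReq and miniReq are placeholder requests):

ts
// Tag TTL beats model TTL: stored for 30 days via ttlByTag.embedding,
// even if embedReq.model is "gpt-4o-mini".
await cache.store(ctx, { request: embedReq, response, tags: ["embedding"] });

// Model TTL beats the default: stored for 1 hour via ttlByModel["gpt-4o-mini"],
// not the 12-hour defaultTtlMs.
await cache.store(ctx, { request: miniReq, response });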

#API Summary

| Method     | Context         | Notes                              |
| ---------- | --------------- | ---------------------------------- |
| lookup     | mutation/action | Hit count + TTL promote            |
| peek       | any             | Read-only, no side effects         |
| store      | mutation/action | Archives old response if different |
| query      | any             | Filter by model, tag, time         |
| history    | any             | Full response timeline             |
| invalidate | mutation/action | Delete by key/model/tag/time       |
| cleanup    | action          | Remove expired entries             |
| getStats   | any             | Hit counts, storage metrics        |
View on GitHub: raymond-UI/llm-cache