Architecture & Transparency
A direct description of how CommonBench is built. Not a marketing page. If something here is unclear or you want a deeper view of a specific component, email cases@commonbench.ai.
What the system is
CommonBench is a server-rendered chat application that runs your legal question through a structured prompt pipeline against a frontier large language model, augmented with a curated database of authorities across five common law jurisdictions. The output is post-processed for citation verification, structural quality, and disclosure compliance before it reaches you.
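For orientation, here is a minimal sketch of that flow in Python. Every name is hypothetical and each stage is a stub; the real components are described section by section below.

```python
# Hypothetical orchestration sketch of the pipeline described above.
# The stage functions are stubs standing in for the real components.

def classify_complexity(message: str) -> str:
    return "L1"  # stub: see "The retrieval"

def retrieve_authorities(message: str, tier: str) -> list[str]:
    return []  # stub: FTS5 lookup plus re-ranking, see "The retrieval"

def call_model(system_prompt: str, message: str, tier: str) -> str:
    return ""  # stub: frontier model call, see "The model"

def verify_citations(draft: str) -> str:
    return draft  # stub: post-processing pass, see "Citation verification"

def answer(message: str) -> str:
    tier = classify_complexity(message)
    authorities = retrieve_authorities(message, tier)
    system_prompt = "Authorities:\n" + "\n".join(authorities)
    draft = call_model(system_prompt, message, tier)
    return verify_citations(draft)
```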
The corpus
- ~2,000 hand-curated and scraper-augmented case authorities across UK, US, HK, SG, AU.
- Each row carries: id, name, citation, year, court, jurisdiction, topic, principle, an optional ratio, summary_one_line, key_paragraphs, doctrinal_status, leading_case, cited_by_count, last_verified (see the schema sketch after this list).
- SQLite-backed with FTS5 full-text search; doctrinal columns are populated by a separate tagging pipeline that uses the model to enrich each row with structured metadata.
- The "Corpus as of" line beneath every chat response is the maximum last_verified across the table, a real signal of when the corpus was last refreshed.
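To make the schema concrete, here is a minimal sketch in Python's sqlite3 of a table with these columns, an FTS5 index over the searchable ones, and the one-line query behind the "Corpus as of" stamp. The table name, index name, and column types are assumptions; only the column list comes from the bullet above.

```python
import sqlite3

con = sqlite3.connect("corpus.db")  # hypothetical filename
con.executescript("""
CREATE TABLE IF NOT EXISTS authorities (
    id               INTEGER PRIMARY KEY,
    name             TEXT NOT NULL,
    citation         TEXT NOT NULL,
    year             INTEGER,
    court            TEXT,
    jurisdiction     TEXT,      -- UK / US / HK / SG / AU
    topic            TEXT,
    principle        TEXT,
    ratio            TEXT,      -- optional
    summary_one_line TEXT,
    key_paragraphs   TEXT,
    doctrinal_status TEXT,      -- populated by the tagging pipeline
    leading_case     INTEGER,   -- boolean flag
    cited_by_count   INTEGER,
    last_verified    TEXT       -- ISO date
);

-- External-content FTS5 index over the searchable columns
-- (needs an SQLite build with FTS5, the default in most Python distributions).
CREATE VIRTUAL TABLE IF NOT EXISTS authorities_fts USING fts5(
    name, citation, topic, principle, summary_one_line,
    content='authorities', content_rowid='id'
);
""")

# The "Corpus as of" stamp: the most recent last_verified in the table.
(corpus_as_of,) = con.execute(
    "SELECT MAX(last_verified) FROM authorities"
).fetchone()
```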
The retrieval
- Each user message is classified into a complexity level (L1 / L2 / L3) by a server-side heuristic that looks at length, multi-issue indicators, and document attachments.
- L1 queries get a focused doctrinal lookup; L2 / L3 queries trigger a richer retrieval pass that pulls authorities from the FTS index and re-ranks for jurisdiction match and doctrinal weight.
- The selected authorities are inlined into the model's system prompt as structured context. The model is instructed never to invent citations. A sketch of the classifier and the re-ranking pass follows this list.
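A sketch of both stages under stated assumptions: the tier thresholds, the multi-issue markers, and the weighting formula are invented for illustration; only the shape (heuristic tiering, FTS lookup, re-rank on jurisdiction and doctrinal weight) comes from the bullets above.

```python
import sqlite3

MULTI_ISSUE_MARKERS = ("in addition", "separately", "and whether", "also")  # assumed signals

def classify_complexity(message: str, has_attachment: bool) -> str:
    """Heuristic tiering; the real thresholds are not published."""
    if has_attachment or len(message) > 1500:
        return "L3"
    hits = sum(m in message.lower() for m in MULTI_ISSUE_MARKERS)
    return "L2" if hits or len(message) > 400 else "L1"

def retrieve(con: sqlite3.Connection, fts_query: str, jurisdiction: str, tier: str):
    """FTS5 lookup, then re-rank for jurisdiction match and doctrinal weight."""
    con.row_factory = sqlite3.Row
    limit = 5 if tier == "L1" else 25
    rows = con.execute(
        """SELECT a.* FROM authorities_fts
           JOIN authorities a ON a.id = authorities_fts.rowid
           WHERE authorities_fts MATCH ?
           ORDER BY bm25(authorities_fts)
           LIMIT ?""",
        (fts_query, limit),  # real code would sanitise user text for FTS syntax
    ).fetchall()

    def weight(r):  # invented weighting: leading cases and citation counts float up
        return 2.0 * (r["leading_case"] or 0) + 0.01 * (r["cited_by_count"] or 0)

    return sorted(rows, key=lambda r: (r["jurisdiction"] == jurisdiction, weight(r)),
                  reverse=True)[:8]
```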
The model
- Anthropic Claude is the underlying frontier model (model id is visible in the per-response "Source" panel inside the chat).
- Prompts are tier- and jurisdiction-specific; the static portion is cached at the provider layer to reduce cost on repeat queries.
- L3 queries get an extended thinking budget so the model can reason through multi-issue analysis before composing the response (a request sketch follows).
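The last two bullets map onto documented Anthropic Messages API features: prompt caching via cache_control on system blocks, and extended thinking via the thinking parameter. A minimal sketch, with an illustrative model id, prompt split, and token budgets:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def call_model(static_prompt: str, authorities_block: str, message: str, tier: str):
    # The static, tier- and jurisdiction-specific instructions are marked
    # cacheable so the provider can reuse them across repeat requests.
    system = [
        {"type": "text", "text": static_prompt,
         "cache_control": {"type": "ephemeral"}},
        {"type": "text", "text": authorities_block},  # per-query, not cached
    ]
    kwargs = {}
    if tier == "L3":
        # Extended thinking: a reasoning budget spent before the answer is
        # composed. The budget figure here is illustrative.
        kwargs["thinking"] = {"type": "enabled", "budget_tokens": 8000}
    return client.messages.create(
        model="claude-sonnet-4-20250514",  # illustrative; the live id is in the "Source" panel
        max_tokens=16000,
        system=system,
        messages=[{"role": "user", "content": message}],
        **kwargs,
    )
```

In production the call would stream (client.messages.stream) so tokens reach the page as they are generated; .create is shown here for brevity.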
Citation verification
- After the model finishes streaming, every recognised case citation in the response is matched against our database and against external sources (BAILII, CourtListener, AustLII, HKLII, vLex where applicable).
- Citations that match cleanly get a green tick. Citations that do not match are flagged as unverified. Citations that pattern-match a fabricated form are stripped before the response is committed to the page.
- The verification result is streamed back as a separate frame and the badges are applied in place (a sketch of the matching pass follows).
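A sketch of that matching pass under stated assumptions: the patterns, verdict names, and fabricated-form check are invented for illustration; only the three outcomes (verified tick, unverified flag, strip before commit) come from the bullets above.

```python
import re
from datetime import date

# Rough patterns for common citation forms across the five jurisdictions.
# Real coverage needs many more patterns; these two are illustrative only.
CITATION_PATTERNS = [
    re.compile(r"\[\d{4}\]\s+(?:UKSC|UKHL|EWCA\s+(?:Civ|Crim)|EWHC|HKCFA|SGCA|SGHC|HCA)\s+\d+"),
    re.compile(r"\d+\s+U\.S\.\s+\d+"),
]

def looks_fabricated(cite: str) -> bool:
    # Invented check for illustration: a neutral citation dated in the
    # future cannot be real.
    year = re.search(r"\[(\d{4})\]", cite)
    return bool(year) and int(year.group(1)) > date.today().year

def verify_citations(response_text: str, known_citations: set[str]) -> list[dict]:
    frames = []
    for pattern in CITATION_PATTERNS:
        for match in pattern.finditer(response_text):
            cite = re.sub(r"\s+", " ", match.group(0))
            if cite in known_citations:
                verdict = "verified"    # green tick
            elif looks_fabricated(cite):
                verdict = "stripped"    # removed before the response is committed
            else:
                verdict = "unverified"  # flagged; external sources checked next
            frames.append({"citation": cite, "verdict": verdict})
    return frames  # sent back as a separate frame; badges applied in place
```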
What we store
- Saved chats: stored in your browser's localStorage, not on our servers. Clearing your browser's site data erases them.
- Account and subscription state (email, Stripe IDs, usage counters): stored server-side in encrypted-at-rest data files.
- Anonymised telemetry: latency, error rates, feedback submissions, error reports. No prompt content in telemetry.
- Document uploads for review (Advocate / Chambers): processed in memory, not persisted after extraction.
What we don't do
- We don't train a model on your conversations.
- We don't sell or share your queries.
- We don't claim to provide legal advice, and we don't pretend that machine output replaces a qualified practitioner.
- We don't quietly disable the disclosure that the output is machine-generated.
Open questions
The corpus covers each jurisdiction only partially. The retrieval pass is good but not perfect. The model is fast but can still generate analysis that misses a recent first-instance decision. The error-report button on every response feeds a moderation queue that we read. Tell us when we're wrong.
For questions about this page, email cases@commonbench.ai. For data protection and privacy, see the Privacy Policy. For commercial terms, see the Terms of Service.