Lemma, end to end
This is the “start here” page. Read it once and everything else on the site will make sense. It assumes you know nothing about Lemma — we explain every word as we go.
The problem, in everyday terms
Ask an AI model to write a piece of scientific code and it will usually give you something that runs without errors and reads convincingly. The catch: it can still be quietly wrong in a way only an expert would notice — it breaks a law of physics, mixes up units, or loses energy that should have been conserved.
Here’s the trap. A normal test only checks “did the program crash?” — and it didn’t. A spell-checker only checks “is the grammar OK?” — and it is. Neither one knows any physics. So the mistake slips through, and the only thing that catches it is a human who knows the subject. That human is slow, busy, and expensive — they are the bottleneck.
Important: Lemma is not a chatbot and not “another AI.” It’s a helper that other tools call when they need to check scientific work. Think of it like a fact-checker on staff: the writer (the AI model) drafts something, and Lemma checks it against known science before anyone trusts it.
flowchart TB
subgraph Agents["Any AI coding tool you already use"]
A1[Claude Code]
A2[Cursor]
A3[Your own script or notebook]
end
Agents -->|asks Lemma to check the work| L((Lemma))
L --> V[/answer: is it OK, and how sure?/]
classDef core fill:#1e3a5f,stroke:#4a90d9,color:#fff;
class L core;
What Lemma is made of: four parts
Lemma has four parts. You don’t need to memorise them now — just meet them. Every other page zooms into one.
flowchart LR
C[1. Cards<br/>the knowledge] --> E[2. Engine<br/>the checker]
E --> D[3. Distribution<br/>how tools reach it]
D --> P[4. Provenance<br/>the receipt]
classDef a fill:#1e3a5f,stroke:#4a90d9,color:#fff;
class C,E,D,P a;
- Cards — the knowledge. A card is a small file describing one piece of science (one formula, what its symbols mean, how it should behave). Think of a recipe box, but each card holds a fact-checked scientific principle instead of a recipe. Together the cards are an open, shared library.
- Engine — the checker. A program that reads a card and checks whether some work agrees with it. It’s like a fact-checker who looks up the reference and compares.
- Distribution — how tools reach it. The plumbing that lets your AI tool actually call Lemma (an “MCP server,” a Python library, a command-line tool — all explained on their own pages).
- Provenance — the receipt. Every answer Lemma gives says which card it used, so you can trace and trust it. Like a fact-check that links its sources.
What’s in it today
You can reproduce these numbers by running one example file (examples/browse_cards.py, explained on the Python page).
| Thing | Today |
|---|---|
| Cards in the shared library | 38 |
| Subject areas covered | 21, spanning physics, chemistry, biology, climate, maths, numerical methods, and engineering |
| Kinds of card | 4 — explained on the Cards page |
| Tools an AI can call | 5 — explained on the MCP server page |
| Cost to use | Free and open |
One check, from start to finish
Let’s walk through the most important case in slow motion: an AI model invents a new formula, and Lemma checks it.
sequenceDiagram
participant M as AI model
participant S as Lemma
participant K as Cards library
M->>S: "here's a formula I came up with, please check it"
S->>K: look up the related known science
K-->>S: the relevant cards
S-->>M: result + which cards were used
Note over M: looks bad → throw it away and try again<br/>looks good → worth a human's time
Now a concrete example. Suppose the model proposes the formula for the energy of a moving object (kinetic energy, E = ½mv²). Lemma checks it and replies:
    looks correct (1 of 1 checks passed)
    the units on both sides match

But suppose the model instead writes a wrong version of that formula. Lemma catches it:

    serious problem (0 of 1 checks passed)
    the units don't match: the formula gives "mass x length / time", but energy should be "mass x length-squared / time-squared"

You can run both of these yourself; see Putting it together.
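To make the units check above less abstract, here is a minimal Python sketch of the idea behind it. It is not Lemma's engine or API: the `Dim` class and the `check_units` function are hypothetical names invented for this illustration. Each quantity is tracked as a set of exponents for mass, length, and time, and a formula passes only if its exponents equal those of energy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dim:
    """Exponents of the base dimensions (hypothetical helper, not part of Lemma)."""
    mass: int = 0
    length: int = 0
    time: int = 0

    def __mul__(self, other: "Dim") -> "Dim":
        # Multiplying quantities adds their exponents.
        return Dim(self.mass + other.mass,
                   self.length + other.length,
                   self.time + other.time)

    def __pow__(self, n: int) -> "Dim":
        # Raising a quantity to a power multiplies its exponents.
        return Dim(self.mass * n, self.length * n, self.time * n)


MASS = Dim(mass=1)
LENGTH = Dim(length=1)
TIME = Dim(time=1)
VELOCITY = LENGTH * TIME ** -1            # length / time
ENERGY = MASS * LENGTH ** 2 * TIME ** -2  # mass x length-squared / time-squared


def check_units(got: Dim, expected: Dim) -> tuple[bool, str]:
    """Compare the dimensions a formula produces with the ones it should produce."""
    if got == expected:
        return True, "the units on both sides match"
    return False, f"the units don't match: got {got}, expected {expected}"


# Correct version: E = 1/2 * m * v^2 (the factor 1/2 has no units).
correct = MASS * VELOCITY ** 2
# Wrong version: E = m * v, which gives mass x length / time.
wrong = MASS * VELOCITY

print(check_units(correct, ENERGY))
print(check_units(wrong, ENERGY))
```

The sketch only compares exponents, so it cannot catch a wrong numeric factor (for example a missing ½). That is why a units check is one check among several rather than a proof of correctness.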
How Lemma rates things: a traffic light
Lemma doesn’t just say “pass” or “fail.” Each check gets a severity — how bad the problem is — like a traffic light:
| Severity | Think of it as | Meaning |
|---|---|---|
| NONE | 🟢 green | all good, no problem found |
| LOW | 🟡 light yellow | a small concern, probably fine |
| MEDIUM | 🟠 orange | a real problem worth a look |
| HIGH | 🔴 red | a hard violation — this is wrong |
Two simple rules:
- The overall result is the worst light, not the average. One red light means the whole thing is red, even if everything else is green. (A cake with one poisonous ingredient isn’t “mostly fine.”)
- A red light cancels everything else. Even if the code passes all its normal tests, a red physics check drops the score to zero. In science, being physically correct comes first — passing the tests isn’t enough if the physics is broken.
The engine page shows the exact arithmetic, but the idea is just those two rules.
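As a rough illustration of the two rules, here is a small Python sketch. It is not the engine's exact arithmetic: the names `Severity`, `overall_severity`, and `gated_score` are hypothetical, and the choice to leave the score unchanged for non-red results is an assumption made only to keep the example short.

```python
from enum import IntEnum


class Severity(IntEnum):
    """The four traffic-light levels, ordered from best to worst."""
    NONE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def overall_severity(severities: list[Severity]) -> Severity:
    # Rule 1: the overall result is the worst light, not the average.
    return max(severities, default=Severity.NONE)


def gated_score(test_score: float, severities: list[Severity]) -> float:
    # Rule 2: a red (HIGH) check drops the score to zero,
    # no matter how well the normal tests went.
    if overall_severity(severities) == Severity.HIGH:
        return 0.0
    # Assumption: any other severity leaves the test score as it is.
    return test_score


checks = [Severity.NONE, Severity.NONE, Severity.HIGH]
print(overall_severity(checks).name)
print(gated_score(1.0, checks))
```

Using an ordered enum means "worst light" is just `max`, so adding more checks never makes a result look better than its most serious finding.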
Where everything lives (and which page teaches it)
Everything is in one open code repository. Here’s the map and where to learn each part:
- Cards & the rules they follow → Cards, completely
- The checker → The engine, completely
- Letting your AI tool call Lemma → The MCP server, completely
- Using Lemma from Python → The Python SDK, completely
- Making a real AI model more correct → Putting it together
You can read them in any order, but top-to-bottom is the gentlest path. Take the next page whenever you’re ready.