Examples
The repo ships four runnable examples in
examples/ — each a
single Python file using the SDK against the bundled cards corpus. No database,
no API keys.
pip install -e ./sdk-pypython examples/verify_hypothesis.py # or any example below| Example | Shows | What it does |
|---|---|---|
verify_hypothesis.py | the cross-check engine | A well-formed vs a dimensionally-broken hypothesis → NONE vs HIGH. Walked through in Verify a hypothesis. |
browse_cards.py | corpus access | Load, count, filter, and read cards from the corpus. |
validate_card.py | schema validation | Parse a valid card; reject a malformed one with structured errors — the in-process version of the ajv-cli check. |
use_mcp_tools.py | the MCP tools | Call cards_list / cards_get / ops_get / hypothesis_crosscheck as plain functions — exactly what an MCP client invokes over the protocol. |
verify_llm_output.py | LLM in the loop | Ask a model (Ollama or any OpenAI-compatible endpoint) to propose a law, then run the engine on its output. Needs a model endpoint. |
usce_check.py | finished-output check | Range-check a finished result’s numbers against a card’s validation envelopes (within → pass, outside → HIGH). |
Test a model (Llama or any other)
To run a model through the verification, there are two paths:
-
Quick, self-contained —
examples/verify_llm_output.pyasks a model (local Ollama by default, or any OpenAI-compatible endpoint) to propose a law, then runs the engine on its output. No benchmark prompts needed:Terminal window ollama pull llama3.1:8b # have Ollama runningpython examples/verify_llm_output.py -
Full A/B benchmark — the eval harness scores a model control (alone) vs treatment (with Lemma tools) over the HumanEval-Sci prompt set:
Terminal window cd eval/humaneval-sciHUMANEVAL_SCI_PROMPTS_DIR=/path/to/prompts pnpm smoke-ab --ollama --model llama3.1:8bAdapters (Ollama, Gemini, Anthropic) live in
eval/humaneval-sci/runner/adapters/.
Not covered by these examples
rag_lookupneeds a Postgres + pgvector backend — see the MCP server page.- To run the server itself over MCP, or browse the corpus with the
lemmaCLI, see Quickstart and the Python SDK.