Bhala research

The first AI you can program like software.

Every other AI works the same way: it guesses, and you hope it guessed right. Bhala is a new kind of AI. You issue a command, and the math proves it executed exactly as you intended: no hallucination, no hidden bias, and output you can verify rather than output you have to trust.

Remove bias and prove it's gone. Call reasoning like a function and inspect it. Audit any decision — every output carries a signed operator trace.


An embedding model built from scratch — not a chatbot, not a fine-tuned LLM.

What becomes buildable

The class of models today's AI architecture cannot produce.

Four capabilities the black-box approach structurally cannot produce.

AI you control like software, not prompts

100%

operator execution verified

Prompts are suggestions — you hope the model does what you want. Bhala executes commands: tell it exactly what to change, it changes it, and the math confirms it happened. Same command, same result, every time.

AI that proves it's fair — by construction

28 / 28

bias axes corrected

Recognizes and removes bias across 28 demographic categories — by construction, not post-hoc filtering. When a regulator asks "did your AI discriminate?", Bhala produces a mathematical answer. EU AI Act, NYC LL144, Michigan DIFS: answerable by design.

AI that handles situations it's never been trained on

9 / 9

logical axioms verified

Teach it one pattern and it understands the reverse, the combination, and the missing piece — without being shown each case explicitly. It learns structure, not just examples. Every logical operation is verifiable.

AI that fits where the big models can't go

100,000×

fewer parameters than GPT-4o

Smaller than the models it outperforms. Runs on a laptop or a $75 phone — no GPU, no cloud. Hospitals, courtrooms, banks, and governments that can't send data to a cloud API can run it on-premise — today.

Empirical receipts

Six measurements that show the paradigm works in practice.

Every number is reproducible from public datasets. No fine-tuned probes hidden behind a sales call. No hand-curated demos. The receipts are the product.

F&P closure

100%

compose · invert · decompose verified on the embedding manifold
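As a rough illustration of what a compose / invert / decompose check involves, the sketch below treats each operator as an invertible 2×2 matrix acting on a toy two-dimensional embedding. That representation, and the helper names `rotation`, `compose`, `invert`, `decompose`, `apply_op`, and `close`, are assumptions made for this example; they are not Bhala's operators or API.

```python
import math

Matrix = list[list[float]]  # toy 2x2 operator (illustrative assumption)


def rotation(theta: float) -> Matrix:
    """Build a 2x2 rotation, used here as a stand-in for an operator."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def compose(a: Matrix, b: Matrix) -> Matrix:
    """Return the operator 'apply b, then a' (the matrix product a @ b)."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def invert(a: Matrix) -> Matrix:
    """Return the inverse operator, or raise if none exists."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if abs(det) < 1e-12:
        raise ValueError("operator is not invertible")
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]


def decompose(composed: Matrix, known: Matrix) -> Matrix:
    """Given composed = known @ rest, recover the unknown factor 'rest'."""
    return compose(invert(known), composed)


def apply_op(a: Matrix, v: list[float]) -> list[float]:
    """Apply an operator to a 2-dimensional embedding."""
    return [a[0][0] * v[0] + a[0][1] * v[1],
            a[1][0] * v[0] + a[1][1] * v[1]]


def close(u: list[float], v: list[float], tol: float = 1e-9) -> bool:
    """Element-wise comparison within a numerical tolerance."""
    return all(abs(x - y) <= tol for x, y in zip(u, v))


a, b = rotation(0.3), rotation(1.1)
v = [1.0, 2.0]
ab = compose(a, b)

# Invert: undoing the composed operator returns the original embedding.
round_trip_ok = close(apply_op(invert(ab), apply_op(ab, v)), v)
# Decompose: factoring out `a` recovers an operator that acts like `b`.
decompose_ok = close(apply_op(decompose(ab, a), v), apply_op(b, v))
print(round_trip_ok, decompose_ok)
```

In this toy setting, "closure" means each check is a numerical equality within a tolerance, so it either passes or fails with no judgment call.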

Set-theoretic axioms and laws

9 / 9

union · intersection · difference · powerset · extensionality · separation · choice · commutativity · De Morgan

Counterfactual constructor

cos 0.91

cross-domain generalization on 4 held-out regulated domains (criminal sentencing, pain management, promotion review, public benefits)

Bias axes

28 / 28

BBQ + StereoSet + CrowS-Pairs + WinoBias · 15,966 sentence pairs · zero failures

Zero-shot cross-lingual intent

73.2%

MASSIVE 60-intent benchmark · frozen 15M-param backbone · zero target-language data · above GPT-4o (70.6%) on a held-out language the model was never trained on

Effective parameters

15M

On-CPU deployment. Compositional generalization at a parameter count orders of magnitude smaller than frontier models.

Full benchmark methodology and per-axis breakdowns

The problem

The black-box wall.

Every frontier AI deployment hits the same wall: the model is a black box, you can't tell why it made a decision, and you can't selectively change its behavior without retraining.

Compliance teams demand decision-level audit. Engineers ship without it. The mismatch is widening as the EU AI Act's high-risk obligations approach in August 2026, on top of NYC LL144, the Michigan DIFS bulletin, and the EU DSA. Each requires per-cohort, per-decision evidence that black-box models cannot produce.

Bhala fixes this by changing what an embedding is. Same or better accuracy — plus the ability to control, verify, and audit every decision. Not a tradeoff. A different mathematical object that does more.
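To make "audit every decision" concrete, here is a minimal sketch of a tamper-evident trace record, assuming a trace is a small JSON-serializable dictionary signed with HMAC-SHA256. The field names (`decision_id`, `operators`, `output_hash`), the demo key, and the functions `sign_trace` and `verify_trace` are hypothetical and chosen for illustration; the source does not describe Bhala's actual trace format or signing scheme, and a production system would more likely use asymmetric signatures so that auditors never hold the signing key.

```python
import hashlib
import hmac
import json


def sign_trace(trace: dict, key: bytes) -> str:
    """Sign a trace by HMAC-ing its canonical JSON form."""
    # Sorted keys and fixed separators make the serialization deterministic.
    payload = json.dumps(trace, sort_keys=True, separators=(",", ":"))
    return hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_trace(trace: dict, signature: str, key: bytes) -> bool:
    """Check a signature using a constant-time comparison."""
    return hmac.compare_digest(sign_trace(trace, key), signature)


demo_key = b"demo-key-not-for-production"  # hypothetical key
trace = {
    "decision_id": "d-001",                 # hypothetical identifier
    "operators": ["remove_axis:age"],       # hypothetical operator name
    "output_hash": hashlib.sha256(b"example output").hexdigest(),
}

signature = sign_trace(trace, demo_key)
print(verify_trace(trace, signature, demo_key))

# Any edit to the recorded operators invalidates the signature.
tampered = dict(trace, operators=[])
print(verify_trace(tampered, signature, demo_key))
```

The point of the design is that an auditor can detect any later edit to the recorded operators or output hash without re-running the model.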


Why now

Compliance is forcing the issue.

The EU AI Act's high-risk obligations land in August 2026. Frontier-AI labs are publishing safety frameworks. Compliance teams need machine-verifiable per-cohort fairness and signed audit trails on every decision. Black-box embeddings cannot meet any of these requirements.

Programmable AI provides the missing primitive: operators that act on the embedding space with verifiable results. Bias becomes a direction. Composition becomes a function. Counterfactual fairness becomes a constructor that generalizes to new domains.
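One common way to read "bias becomes a direction" is orthogonal projection: subtract from each embedding its component along the bias axis, then confirm the remaining component is numerically zero. The sketch below assumes that reading. The function names `remove_direction` and `residual_bias` and the toy three-dimensional vectors are illustrative assumptions, not Bhala's operators or data.

```python
import math


def dot(u: list[float], v: list[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def unit(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in v]


def remove_direction(embedding: list[float], direction: list[float]) -> list[float]:
    """Subtract the component of `embedding` that lies along `direction`."""
    d = unit(direction)
    coeff = dot(embedding, d)
    return [x - coeff * di for x, di in zip(embedding, d)]


def residual_bias(embedding: list[float], direction: list[float]) -> float:
    """Magnitude of what remains along the bias direction."""
    return abs(dot(embedding, unit(direction)))


bias_axis = [1.0, 1.0, 0.0]   # toy bias direction (assumption)
embedding = [0.8, 0.2, 0.5]   # toy embedding (assumption)

corrected = remove_direction(embedding, bias_axis)
print(residual_bias(embedding, bias_axis) > 0.1)     # bias present before
print(residual_bias(corrected, bias_axis) < 1e-12)   # removed after
```

The verification step is the residual check: a claim that bias is gone reduces to a dot product that can be recomputed by anyone holding the embedding and the axis.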

Working today

A 15M-parameter encoder with verified operator algebra, 100% bias-axis correction across 28 canonical fairness dimensions, and zero-shot cross-lingual transfer to languages it has never seen — no retraining required.

Read the work, then talk to us.

We're raising a scientific seed and selectively engaging research collaborators. Every claim above is empirically reproducible.
