Compose intelligence for any use case
Language, translation, edge, sovereign, interpretable. Every use case composes the same five core units. Own every layer.
What customers actually deploy this for
Four production use cases — each a different buyer, a different continent, the same underlying stack.
Scenario 1 — Financial services · EU
A European bank needs its CV-screening pipeline to pass an EU AI Act audit before go-live.
The bank's legal team has 60 days to show regulators that candidate ranking does not discriminate on gender or age. The compliance team has no ML budget and no labeled fairness data.
Without Bhala
- Commission a third-party bias audit (~€40K, 8 weeks)
- Get a PDF report — no way to act on findings in production
- Either pull the product or ship it with known risk
- No signed record to show the regulator
With Bhala
- CVs and job descriptions go through Bhala's encoder — Bhala is the embedding layer
- The bias operator removes gender and age signal from candidate embeddings before ranking
- Every ranking decision carries a signed audit receipt showing which operators ran
- Hand the regulator a machine-readable log, not a PDF
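The bias operator above can be pictured as a simple geometric step. This is a minimal sketch, not Bhala's actual implementation: it assumes the operator works by projecting a candidate embedding onto the orthogonal complement of a learned bias direction. The direction vector here is random for illustration; in practice it would be estimated from data.

```python
import numpy as np

# Hypothetical "gender" direction in embedding space (random stand-in).
rng = np.random.default_rng(0)
gender_dir = rng.normal(size=8)
gender_dir /= np.linalg.norm(gender_dir)

def remove_direction(embedding, direction):
    """Subtract the component of `embedding` along unit vector `direction`."""
    return embedding - np.dot(embedding, direction) * direction

cv_vec = rng.normal(size=8)          # a candidate embedding
debiased = remove_direction(cv_vec, gender_dir)

# After the operator, the embedding carries no signal along the removed
# direction, so a downstream ranker cannot use it.
print(abs(np.dot(debiased, gender_dir)))  # ~0
```

Because the subtracted component can be stored alongside the result, the operation is also reversible, which is what makes a per-decision audit receipt meaningful.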
Scenario 2 — Healthcare · United States
A US health system wants to detect hate speech in patient intake notes — across 14 languages — without sending data to a cloud API.
HIPAA forbids sending patient text to third-party APIs. The patient population speaks Spanish, Haitian Creole, Somali, Vietnamese, and 10 other languages. Staff file incident reports inconsistently. The compliance officer wants automated flagging before a note reaches a clinician.
Without Bhala
- Off-the-shelf classifiers cover English only — or require cloud calls
- Building per-language models requires labeled data that doesn't exist
- On-premise deployment of a 7B+ model requires GPU infrastructure
- No solution ships in under 6 months
With Bhala
- 15M-parameter model runs entirely on-premise — no data leaves the network
- Zero-shot across all 14 languages with a single frozen model
- 81–98% catch rate per target group at a 5% false-positive rate
- Fits on a standard clinical workstation CPU — no GPU needed
Scenario 3 — Content moderation · Global platform
A social platform needs intent classification across 40+ languages without training a model per language.
The platform has 200M users across Southeast Asia, the Middle East, and Sub-Saharan Africa. Their current English-only intent classifier routes abuse reports to the wrong queue 60% of the time for non-English posts. Hiring labelers for 40 languages is not viable.
Without Bhala
- Translate everything to English first — adds latency, loses nuance, costs $0.06/1K chars at scale
- Or fine-tune 40 separate models — 40× the training cost, 40× the maintenance surface
- Translate-then-classify misroutes culturally specific abuse that doesn't map to English categories
With Bhala
- One model, 40+ languages, no translation layer
- Intent redirect operator re-routes posts at inference time — no retraining per new category
- 73% intent accuracy on Swahili zero-shot, beating GPT-4o (70.6%) at 1/100,000th the size
- Sub-50ms on commodity hardware — fits inside existing CDN edge nodes
Scenario 4 — Feed middleware · AT Protocol / Fediverse
A social platform wants to let users control their own feed — not just mute words, but dial down toxicity, political content, or outrage bait as a personal preference.
The platform has committed to user-sovereignty over algorithmic feeds. Users want sliders, not keyword lists. The trust & safety team wants every moderation action to be auditable and reversible — including user-applied ones.
Without Bhala
- Keyword filters — gameable in seconds, no semantic understanding
- Per-user fine-tuned models — not feasible at millions of users
- Platform-level moderation only — users have no meaningful control
- No audit log: impossible to explain to a user why a post was suppressed
With Bhala
- User preferences map to named controls — `less_outrage`, `less_political`, `more_local`
- Controls applied to post embeddings at query time — one model, every user, no retraining
- Fully reversible: users can inspect and undo any active control
- Every suppression carries a signed receipt — AT Protocol compatible, GDPR explainability ready
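A "signed receipt" for a user-applied control can be sketched in a few lines. This is an illustrative stand-in, not Bhala's actual receipt format: the signing key, field names, and control name are all assumptions, and a production system would use asymmetric signatures rather than a shared HMAC key.

```python
import hashlib
import hmac
import json

SIGNING_KEY = b"demo-key"  # illustrative; real deployments would use a managed key

def make_receipt(post_id, control):
    """Record which control ran on which post, and sign the record."""
    record = {"post": post_id, "control": control, "reversible": True}
    msg = json.dumps(record, sort_keys=True).encode()
    record["sig"] = hmac.new(SIGNING_KEY, msg, hashlib.sha256).hexdigest()
    return record

def verify_receipt(record):
    """Recompute the signature over everything except `sig` and compare."""
    body = {k: v for k, v in record.items() if k != "sig"}
    msg = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["sig"])

receipt = make_receipt("post-123", "less_outrage")
print(verify_receipt(receipt))  # True; any tampering breaks verification
```

The point of the sketch: because every suppression is a signed, structured record, a user (or an auditor) can later verify exactly which control fired and reverse it.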
Drop it between your embedding and your retrieval.
Drop the operator layer into your existing embedding pipeline — ranking, retrieval, classification, or recommendations. One REST call returns a shifted embedding and an audit receipt. No rewrites, no retraining.
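The "one REST call" shape might look like the following. This is a hedged sketch only: the endpoint path, field names, and operator names are illustrative assumptions, not Bhala's published API.

```python
import json

def build_shift_request(embedding, operators):
    """Build the JSON body for a single shift-and-audit call.

    `embedding` is the vector from your existing encoder; `operators` is an
    ordered list of named operators to apply. Field names are hypothetical.
    """
    return {
        "embedding": embedding,
        "operators": operators,
        "audit": True,  # ask for a signed receipt in the response
    }

payload = build_shift_request([0.12, -0.48, 0.33], ["less_political", "less_outrage"])
body = json.dumps(payload)
# POST `body` to the operator endpoint with your API key; the response would
# carry the shifted embedding plus the audit receipt, e.g.
# {"embedding": [...], "receipt": {...}}.
```

Your retrieval or ranking code then consumes the returned embedding exactly as it consumed the original one — which is why no rewrites or retraining are needed.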
Enterprise RAG
Every query gets an audited pass before retrieval. Compliance teams get the receipt; hallucinations drop on sensitive prompts.
Examples · Glean · LangChain hosts · vertical AI
Semantic search
Users or regulators apply named dimensions (positive sentiment, verified sources, less advertising) to the query before it hits the index.
Examples · Kagi · Perplexity · legal / medical search
Feed ranking & moderation
User-owned feed dimensions and moderator-owned toxicity flagging, each with a signed log per action.
Examples · Bluesky · Mastodon · forums · comments
Recommendation
Apply `safe_for_kids`, `less_political`, or `more_diverse_authors` as overrides on a user vector. Reversible per request, explainable per result.
Examples · Publishers · streaming · e-commerce
Intent routing
Redirect a user query from one intent region to another. Counterfactual retrieval without retraining, tested at 100% flip accuracy across four transitions.
Examples · Chatbots · Slack · Intercom · Zendesk
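A redirect "from one intent region to another" can be sketched as vector arithmetic. This is an illustrative model of the idea, not Bhala's implementation: the two intent centroids here are synthetic, whereas real ones would come from labeled queries.

```python
import numpy as np

# Synthetic intent centroids (stand-ins for centroids learned from data).
rng = np.random.default_rng(1)
billing_centroid = rng.normal(size=8)
refund_centroid = rng.normal(size=8)

def redirect(query_vec, src_centroid, dst_centroid, strength=1.0):
    """Shift the query along the src -> dst intent direction."""
    return query_vec + strength * (dst_centroid - src_centroid)

query = billing_centroid + 0.1 * rng.normal(size=8)  # a query near "billing"
shifted = redirect(query, billing_centroid, refund_centroid)
# `shifted` now sits near the "refund" region, so retrieval behaves as if
# the user had asked a refund question — with no retraining.
```

Because the shift is a pure function of the query vector, it can be applied, tuned, or removed per request, which is what "counterfactual retrieval without retraining" refers to.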
Bias remediation
Subtract a bias dimension from any embedding at inference, reversibly. Quantify before-and-after and prove it to an auditor.
Examples · HR tech · lending · underwriting · screening
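"Subtract a bias dimension, reversibly, and quantify before-and-after" can be made concrete with a small sketch. The bias direction below is synthetic for illustration; in practice it would be estimated from data, and this is not Bhala's actual operator code.

```python
import numpy as np

rng = np.random.default_rng(2)
bias_dir = rng.normal(size=8)
bias_dir /= np.linalg.norm(bias_dir)          # unit bias direction (synthetic)
emb = rng.normal(size=8)                      # an embedding to remediate

before = float(np.dot(emb, bias_dir))         # bias component before
removed = before * bias_dir                   # store what we subtract
debiased = emb - removed                      # apply the operator
after = float(np.dot(debiased, bias_dir))     # ~0 after

# Reversible: adding the stored component back restores the original vector,
# and (before, after) is the before/after measurement to show an auditor.
restored = debiased + removed
```

The before/after pair is exactly the kind of quantitative evidence an auditor can check independently, since the whole operation is deterministic and invertible.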
Today the operator layer runs on our embedding space. Q3–Q4 extends it to wrap OpenAI, Cohere, and customer-hosted embeddings, so you don't replace your foundation model to make it governable.
The middleware that makes any embedding auditable.
Every embedding API is opaque. When a retrieval goes wrong, "the model did it" doesn't pass an EU AI Act review.
Drop a thin layer between your embedding call and your retrieval. Apply named actions (sentiment, intent, bias) and get back the shifted vector plus a signed audit record.
Target customers: Enterprise RAG platforms, regulated industries, AI-native social platforms, compliance-heavy search
AI you can inspect, audit, and defend in court
Regulators want to know why your model decided what it did. Your LLM gives them vibes and a confidence score.
With Bhala, every inference is a traceable sequence of operations. The full reasoning path is inspectable from input tokens to output decision.
Target customers: Financial services, healthcare, regulators, critical infra
Own the full intelligence stack. Weights, data, audit trail.
Your citizen data flows to foreign cloud providers. Every API call crosses a border. Regulators and adversaries both notice.
Run the full stack on your own infrastructure. Adapt to national languages, run on existing hardware, keep every inference auditable. Nothing leaves your borders.
Target customers: Governments, central banks, defense, regulated industries
High-performance intelligence on any device, fully offline
Your users are on feature phones and spotty networks. Cloud AI is not an option.
15M parameters, 24MB on disk (pre-quantization) — smaller than most app updates. Sub-50ms inference, fully offline. Intent, sentiment, and NER on-device. No data leaves the phone.
Target customers: Device OEMs, mobile fintech, health apps, IoT, defense
Translation without huge parallel corpora
Classical MT needs millions of aligned sentence pairs per language. That data doesn't exist for most of the world.
Cross-lingual operators move meaning between languages as geometric transformations. New languages attach with adaptation, not retraining.
Target customers: Localization teams, customer ops, public sector, NGOs
Native NLU for every language — even the ones nobody else serves
General-purpose LLMs hallucinate on most of the world's languages, charge cloud rates, and still miss. Fine-tuning is a six-month project.
Intent, sentiment, entities, and grammar across 23 languages out of the box. Zero-shot to 17+ more. No fine-tuning, no cloud dependency.
Target customers: Fintech, healthtech, global SaaS, content platforms
Integration
Composes with your existing stack
Core units are decoupled by design. Drop them next to the platforms you already use.
Cloud Platforms
- Azure OpenAI
- AWS Bedrock
- Google Cloud AI
Customer Service
- Zendesk
- Freshdesk
- Salesforce Service Cloud
Communication
- WhatsApp Business
- USSD gateways
- SMS platforms
Edge Platforms
- Android (Kotlin/Java)
- iOS (Swift)
- ONNX Runtime
Compose the intelligence you need.
Start with one core unit. License the full stack when you're ready for sovereign deployment.