Composable AI — Every Use Case

Compose intelligence for any use case

Language, translation, edge, sovereign, interpretable. Every use case composes the same five core units. Own every layer.

  • 5 core units — composable building blocks
  • 15M params — any phone, any IoT
  • <50ms — on commodity hardware
  • Sovereign — on-prem, inspectable
Real scenarios

What customers actually deploy this for

Four production use cases — each a different buyer, a different continent, the same underlying stack.

Scenario 1 — Financial services · EU

A European bank needs its CV-screening pipeline to pass an EU AI Act audit before go-live.

The bank's legal team has 60 days to show regulators that candidate ranking does not discriminate on gender or age. The compliance team has no ML budget and no labeled fairness data.

Without Bhala

  • Commission a third-party bias audit (~€40K, 8 weeks)
  • Get a PDF report — no way to act on findings in production
  • Either pull the product or ship it with known risk
  • No signed record to show the regulator

With Bhala

  • CVs and job descriptions go through Bhala's encoder — Bhala is the embedding layer
  • The bias operator removes gender and age signal from candidate embeddings before ranking
  • Every ranking decision carries a signed audit receipt showing which operators ran
  • Hand the regulator a machine-readable log, not a PDF

Scenario 2 — Healthcare · United States

A US health system wants to detect hate speech in patient intake notes — across 14 languages — without sending data to a cloud API.

HIPAA forbids sending patient text to third-party APIs. The patient population speaks Spanish, Haitian Creole, Somali, Vietnamese, and 10 other languages. Staff file incident reports inconsistently. The compliance officer wants automated flagging before a note reaches a clinician.

Without Bhala

  • Off-the-shelf classifiers cover English only — or require cloud calls
  • Building per-language models requires labeled data that doesn't exist
  • On-premise deployment of a 7B+ model requires GPU infrastructure
  • No solution ships in under 6 months

With Bhala

  • 15M-parameter model runs entirely on-premise — no data leaves the network
  • Zero-shot across all 14 languages with a single frozen model
  • 81–98% catch rate per target group at a 5% false-positive rate
  • Fits on a standard clinical workstation CPU — no GPU needed

Scenario 3 — Content moderation · Global platform

A social platform needs intent classification across 40+ languages without training a model per language.

The platform has 200M users across Southeast Asia, the Middle East, and Sub-Saharan Africa. Its current English-only intent classifier routes abuse reports to the wrong queue 60% of the time for non-English posts. Hiring labelers for 40 languages is not viable.

Without Bhala

  • Translate everything to English first — adds latency, loses nuance, costs $0.06/1K chars at scale
  • Or fine-tune 40 separate models — 40× the training cost, 40× the maintenance surface
  • Translate-then-classify misroutes culturally specific abuse that doesn't map to English categories

With Bhala

  • One model, 40+ languages, no translation layer
  • Intent redirect operator re-routes posts at inference time — no retraining per new category
  • 73.2% intent accuracy on Swahili zero-shot, beating GPT-4o (70.6%) at 1/100,000th the size
  • Sub-50ms on commodity hardware — fits inside existing CDN edge nodes

Scenario 4 — Feed middleware · AT Protocol / Fediverse

A social platform wants to let users control their own feed — not just mute words, but dial down toxicity, political content, or outrage bait as a personal preference.

The platform has committed to user sovereignty over algorithmic feeds. Users want sliders, not keyword lists. The trust & safety team wants every moderation action to be auditable and reversible — including user-applied ones.

Without Bhala

  • Keyword filters — gameable in seconds, no semantic understanding
  • Per-user fine-tuned models — not feasible at millions of users
  • Platform-level moderation only — users have no meaningful control
  • No audit log: impossible to explain to a user why a post was suppressed

With Bhala

  • User preferences map to named controls — `less_outrage`, `less_political`, `more_local`
  • Controls applied to post embeddings at query time — one model, every user, no retraining
  • Fully reversible: users can inspect and undo any active control
  • Every suppression carries a signed receipt — AT Protocol compatible, GDPR explainability ready
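The slider-to-control mapping above can be sketched as a composition of embedding shifts. The control names come from the scenario, but the direction vectors, dimensionality, and `apply_controls` helper are toy assumptions for illustration, not Bhala's actual operator layer.

```python
# Sketch: user-preference sliders as composable shifts on a post embedding.
# Direction vectors here are fabricated 3-d toys; in the scenario above they
# would be Bhala's learned operator directions.
import numpy as np

# Each named control is a direction in embedding space.
controls = {
    "less_outrage":   np.array([1.0, 0.0, 0.0]),
    "less_political": np.array([0.0, 1.0, 0.0]),
}

def apply_controls(post_embedding, user_settings):
    """Shift a post embedding by each active control, scaled by the slider value."""
    shifted = post_embedding.astype(float).copy()
    for name, strength in user_settings.items():
        shifted -= strength * controls[name]   # negative shift = "less of" that signal
    return shifted

post = np.array([0.8, 0.5, 0.1])
shifted = apply_controls(post, {"less_outrage": 0.4, "less_political": 0.2})

# Reversible: re-applying the negated settings restores the original vector.
restored = apply_controls(shifted, {"less_outrage": -0.4, "less_political": -0.2})
```

Because each control is a plain vector shift, applying them in any order gives the same result, and undoing is just negation — which is what makes per-user controls cheap at query time.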

Where it fits

Drop it between your embedding and your retrieval.

Drop the operator layer into your existing embedding pipeline — ranking, retrieval, classification, or recommendations. One REST call returns a shifted embedding and an audit receipt. No rewrites, no retraining.
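A minimal sketch of what that single REST call might look like. The endpoint URL, field names, and operator id below are illustrative assumptions, not Bhala's documented API.

```python
# Hypothetical request body for one operator-layer call.
# Endpoint, schema, and operator ids are assumptions for illustration.
import json

def build_shift_request(embedding, operators):
    """Assemble the JSON body for a single shift-and-audit call."""
    return {
        "embedding": list(embedding),  # your existing vector, produced upstream
        "operators": operators,        # named operators to apply, in order
        "audit": True,                 # request a signed receipt with the result
    }

body = build_shift_request(
    embedding=[0.12, -0.48, 0.33],
    operators=[{"id": "debias_gender", "coefficient": 1.0}],
)

# e.g. with requests (URL is a placeholder):
#   resp = requests.post("https://api.example.com/v1/shift", json=body).json()
#   resp["embedding"]  -> shifted vector, same shape as the input
#   resp["receipt"]    -> signed audit record of which operators ran
print(json.dumps(body, indent=2))
```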

Enterprise RAG

Buying now

Every query gets an audited pass before retrieval. Compliance teams get the receipt; hallucinations drop on sensitive prompts.

Examples · Glean · LangChain hosts · vertical AI

Semantic search

Natural fit

Users or regulators apply named dimensions (positive sentiment, verified sources, less advertising) to the query before it hits the index.

Examples · Kagi · Perplexity · legal / medical search

Feed ranking & moderation

AT Protocol

User-owned feed dimensions and moderator-owned toxicity flagging, each with a signed log per action.

Examples · Bluesky · Mastodon · forums · comments

Recommendation

EU DSA ready

Apply `safe_for_kids`, `less_political`, or `more_diverse_authors` as overrides on a user vector. Reversible per request, explainable per result.

Examples · Publishers · streaming · e-commerce

Intent routing

Production

Redirect a user query from one intent region to another. Counterfactual retrieval without retraining, tested at 100% flip accuracy across four transitions.

Examples · Chatbots · Slack · Intercom · Zendesk

Bias remediation

Model-risk

Subtract a bias dimension from any embedding at inference, reversibly. Quantify before-and-after and prove it to an auditor.

Examples · HR tech · lending · underwriting · screening
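The "subtract a bias dimension" operation can be sketched as standard projection removal. The bias direction and candidate vector below are toys; in production the direction would be learned and audited, but the geometry — and the logged delta that makes it reversible — works the same way.

```python
# Sketch: remove the component of an embedding along a bias direction.
# Returning the delta is what lets an audit log record (and undo) the shift.
import numpy as np

def remove_bias(embedding, bias_direction, coefficient=1.0):
    """Subtract `coefficient` times the projection onto `bias_direction`.

    Returns (shifted, delta); re-adding delta restores the original exactly.
    """
    d = bias_direction / np.linalg.norm(bias_direction)
    delta = coefficient * np.dot(embedding, d) * d
    return embedding - delta, delta

gender_dir = np.array([0.6, 0.8, 0.0])   # toy bias direction (unit norm)
cv = np.array([1.0, 2.0, 3.0])           # toy candidate embedding

debiased, delta = remove_bias(cv, gender_dir)

# Quantify before/after for the auditor: component along the bias direction.
before = abs(np.dot(cv, gender_dir))
after = abs(np.dot(debiased, gender_dir))   # drops to ~0 at coefficient 1.0

restored = debiased + delta   # reversible: re-apply the logged delta
```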

Today the operator layer runs on our embedding space. Q3–Q4 extends it to wrap OpenAI, Cohere, and customer-hosted embeddings, so you don't have to replace your foundation model to make it governable.

Flagship

The middleware that makes any embedding auditable.

Every embedding API is opaque. When a retrieval goes wrong, "the model did it" doesn't pass an EU AI Act review.

Drop a thin layer between your embedding call and your retrieval. Apply named actions (sentiment, intent, bias) and get back the shifted vector plus a signed audit record.

Your pipeline (query / user / post) → Bhala operator layer (shifted vector + audit) → Your retrieval (ranker, RAG, classifier)

  • 100% flip accuracy on sentiment and intent (held-out test data)
  • Every shift logged with operator id, parameters, timestamp, and result delta
  • Plugs into any embedding step — ours today, OpenAI / Cohere on the roadmap
  • Reversible per call — apply a negative coefficient to undo or debias
  • Sub-50ms latency on commodity hardware — no GPU required
  • Built for EU AI Act, SOC 2, and model-risk review out of the box
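The shift-plus-receipt contract can be sketched as follows. The field names mirror the list above (operator id, parameters, timestamp, result delta), but the exact schema and signing scheme are assumptions — a SHA-256 hash stands in for a real cryptographic signature.

```python
# Sketch: apply one named shift and emit an audit record for it.
# Schema and "signature" are illustrative stand-ins, not Bhala's format.
import hashlib
import json
import time

def shift_with_receipt(embedding, operator_id, coefficient, direction):
    """Shift an embedding along `direction` and return (vector, receipt)."""
    delta = [coefficient * x for x in direction]
    shifted = [e + d for e, d in zip(embedding, delta)]
    record = {
        "operator_id": operator_id,
        "parameters": {"coefficient": coefficient},
        "timestamp": time.time(),
        "result_delta": delta,
    }
    # Stand-in for a signature: hash of the canonical record.
    record["signature"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()
    return shifted, record

vec, receipt = shift_with_receipt([0.1, 0.2], "sentiment_positive", 0.5, [1.0, 0.0])

# Reversible per call: re-apply with a negative coefficient to undo.
undone, _ = shift_with_receipt(vec, "sentiment_positive", -0.5, [1.0, 0.0])
```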

Target customers: Enterprise RAG platforms, regulated industries, AI-native social platforms, compliance-heavy search

Differentiator

AI you can inspect, audit, and defend in court

Regulators want to know why your model decided what it did. Your LLM gives them vibes and a confidence score.

With Bhala, every inference is a traceable sequence of operations. The full reasoning path is inspectable from input tokens to output decision.

Input (with audit context) → Bhala Inspect (embedding + operator + receipt) → Decision + Trace (auditable, reproducible)

  • Inspectable reasoning trace on every inference
  • Human-readable tokens — not opaque sub-word fragments
  • Deterministic geometry, not stochastic token sampling
  • Compositional audit: replay any step
  • EU AI Act, model risk, and clinical use ready
  • Reproducible results across deployments

Target customers: Financial services, healthcare, regulators, critical infra

Enterprise

Own the full intelligence stack. Weights, data, audit trail.

Your citizen data flows to foreign cloud providers. Every API call crosses a border. Regulators and adversaries both notice.

Run the full stack on your own infrastructure. Adapt to national languages, run on existing hardware, keep every inference auditable. Nothing leaves your borders.

Citizen Data (stays in-country) → Bhala Stack (all 5 core units, on-prem) → Your Infrastructure (sovereign control)

  • On-premise or private-cloud deployment
  • Own the model weights — no vendor escape hatch
  • Adapt to national languages in under 2 seconds
  • No H100 clusters required — runs on existing hardware
  • Full audit trail for compliance reporting
  • Bring-your-own core units — no lock-in

Target customers: Governments, central banks, defense, regulated industries

Available

High-performance intelligence on any device, fully offline

Your users are on feature phones and spotty networks. Cloud AI is not an option.

15M parameters, 24MB on disk (pre-quantization) — smaller than most app updates. Sub-50ms inference, fully offline. Intent, sentiment, and NER on-device. No data leaves the phone.

User (any device) → Bhala Edge (2 composed core units) → Your App (fully offline)

  • 24MB on disk (pre-quantization) — smaller than most app updates, vs 2GB+ for the smallest frontier models
  • <50ms inference on standard mobile hardware
  • Fully offline — no internet required
  • No data leaves the device — privacy by construction
  • Battery-efficient inference
  • Android, iOS, ONNX Runtime, embedded Linux

Target customers: Device OEMs, mobile fintech, health apps, IoT, defense

Core

Translation without huge parallel corpora

Classical MT needs millions of aligned sentence pairs per language. That data doesn't exist for most of the world.

Cross-lingual operators move meaning between languages as geometric transformations. New languages attach with adaptation, not retraining.

Source Text (any supported language) → Cross-Lingual Operator (Bhala embedding shift) → Target Text (any supported language)

  • No massive parallel corpus required
  • English ↔ 23 languages in production
  • New language support in seconds, not months
  • Preserves cultural and grammatical nuance
  • Batch throughput for enterprise workloads
  • Custom brand glossary composition
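The geometric idea — meaning moves between languages as an embedding shift — can be sketched with toy centroids. The vectors, dimensionality, and language centroids below are fabricated for illustration; Bhala's actual cross-lingual operators are learned.

```python
# Sketch: translation as a geometric transformation between language regions.
# Centroids and vectors are 2-d toys, not real language statistics.
import numpy as np

centroids = {
    "en": np.array([0.2, 0.1]),
    "sw": np.array([0.7, 0.4]),   # toy Swahili region centroid
}

def cross_lingual_shift(embedding, src, tgt):
    """Move a meaning vector from the source-language region to the target's."""
    return embedding + (centroids[tgt] - centroids[src])

en_vec = np.array([0.25, 0.05])
sw_vec = cross_lingual_shift(en_vec, "en", "sw")

# The round trip returns the original vector, so the operator is invertible —
# which is why new languages attach by estimating an offset, not retraining.
back = cross_lingual_shift(sw_vec, "sw", "en")
```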

Target customers: Localization teams, customer ops, public sector, NGOs

Core

Native NLU for every language — even the ones nobody else serves

General-purpose LLMs hallucinate on most of the world's languages, charge cloud rates, and still miss. Fine-tuning is a six-month project.

Intent, sentiment, entities, and grammar across 23 languages out of the box. Zero-shot to 17+ more. No fine-tuning, no cloud dependency.

User Input (any supported language) → Bhala NLU (3 composed core units) → Your App (intent, sentiment, entities)

  • Native NLU across 23 Bantu languages + 17 zero-shot
  • Beats GPT-4o on Swahili intent (73.2% vs 70.6%)
  • New SOTA on Zulu, Xhosa, and Sesotho (vs AfroXLMR-76L)
  • Deterministic inference — no hallucinations
  • Sub-50ms latency on commodity hardware
  • Works with Azure OpenAI, AWS, Google Cloud as a layer

Target customers: Fintech, healthtech, global SaaS, content platforms

Integration

Composes with your existing stack

Core units are decoupled by design. Drop them next to the platforms you already use.

Cloud Platforms

  • Azure OpenAI
  • AWS Bedrock
  • Google Cloud AI

Customer Service

  • Zendesk
  • Freshdesk
  • Salesforce Service Cloud

Communication

  • WhatsApp Business
  • USSD gateways
  • SMS platforms

Edge Platforms

  • Android (Kotlin/Java)
  • iOS (Swift)
  • ONNX Runtime

Compose the intelligence you need.

Start with one core unit. License the full stack when you're ready for sovereign deployment.