Press & Media
Press kit and media resources.
Founder bio, technical claims summary, and downloadable assets for journalists, researchers, and analysts covering Bhala’s work on programmable embeddings.
Technical claims
What Bhala has built, stated precisely.
Each claim below is documented on the benchmarks page with methodology, dataset references, and reproducibility notes. Citations appreciated.
15M-parameter encoder pretrained on a single language transfers across 17 languages
The Bhala encoder is pretrained on ~40M tokens of isiZulu in roughly one hour on a laptop-class GPU. On MASSIVE 60-way intent classification, using only a linear probe on the frozen encoder (the field's gold-standard representation-quality test), it reaches 73.2% on Swahili (above GPT-4o zero-shot at 70.6%), 72.5% on Korean, 69.7% on Hindi, and 66.5% on Amharic. None of these languages appear in the pretraining corpus. The encoder is frozen at evaluation time; only the input pipeline is adapted per language using a small monolingual sample, with no labels and no fine-tuning. The linear probe, the strictest form of this test because it has no nonlinear capacity, scores 38–43× above random chance on all four languages and outperforms a 2-layer MLP probe on the three cross-family transfers. To our knowledge, no other published model satisfies this exact test condition (frozen encoder, linear probe, zero target-language pretraining) on typologically distant languages. Source: Mhlambi 2026, “Structure is all you need.”
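For reproducers, the probe protocol is simple enough to sketch. The outline below shows the shape of a frozen-encoder, linear-probe evaluation in scikit-learn; the `encode` function is a placeholder standing in for the frozen Bhala encoder, not its real interface, and data loading is omitted.

```python
# Sketch of the frozen-encoder + linear-probe protocol, assuming the encoder
# exposes a text -> vector interface. `encode` below is a placeholder, not the
# actual Bhala API.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def encode(texts):
    """Placeholder for the frozen encoder: texts -> fixed-size vectors.
    No gradient ever reaches the encoder during probing."""
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(texts), 256))  # stand-in embeddings

def linear_probe_accuracy(train_texts, y_train, test_texts, y_test):
    X_train, X_test = encode(train_texts), encode(test_texts)
    # A single linear layer (multinomial logistic regression), no hidden layers:
    # accuracy then reflects the embeddings, not the capacity of the probe.
    probe = LogisticRegression(max_iter=1000)
    probe.fit(X_train, y_train)
    return accuracy_score(y_test, probe.predict(X_test))
```

With 60 intent classes, chance accuracy is 1/60 ≈ 1.7%, which is the baseline behind the 38–43× over-chance figure quoted above.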
Operator transfer at 95–100% on held-out test data
A sentiment direction estimated from contrastive pairs in one language applies zero-shot to others without re-estimation. Tested pairs to date: Zulu→Swahili (100%), Zulu→Xhosa (95%). Intent operators (book→cancel, alarm→calendar) transfer at 100% on tested pairs. Operators compose, invert, and produce a signed audit record per application. Cross-family transfer of the same operator class is in progress.
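As an illustration of the operator mechanics (estimate once, apply anywhere), the sketch below uses a mean-difference direction over contrastive embedding pairs. The function names are illustrative; this is not Bhala's published implementation.

```python
# Illustrative operator sketch: estimate a direction from contrastive pairs in
# one language, then apply the same fixed vector to embeddings from another
# language with no re-estimation. Names are illustrative, not Bhala's code.
import numpy as np

def estimate_operator(pos_emb: np.ndarray, neg_emb: np.ndarray) -> np.ndarray:
    """Mean difference over aligned contrastive pairs (e.g. negative -> positive)."""
    return (pos_emb - neg_emb).mean(axis=0)

def apply_operator(emb: np.ndarray, op: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Shift along the operator direction; alpha = -1.0 inverts it, and
    operators compose by applying one after another."""
    return emb + alpha * op

# Zero-shot transfer: an operator estimated on Zulu pairs is applied unchanged
# to Swahili or Xhosa embeddings; the percentages above count how often a
# classifier assigns the shifted embedding to the target attribute.
```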
Architectural inductive bias accounts for ~80 percentage points of accuracy
A standard pipeline approaches random performance on this task. Sozisi's architecture closes that gap to production-grade accuracy at 15M parameters. The contribution of each architectural component is documented in the technical paper, available on request.
100% strict-flip on 28 protected dimensions across canonical fairness benchmarks
Evaluated on BBQ, StereoSet, CrowS-Pairs, and WinoBias: 15,966 sentence pairs covering age, disability, gender, nationality, physical appearance, race, religion, sexual orientation, socioeconomic status, and more. An independently trained classifier accepts every shifted embedding as belonging to the anti-stereotype class. Methodology and per-benchmark detail are published on the benchmarks page.
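The strict-flip criterion can be stated in a few lines. In the sketch below, `encode`, `operator`, and `predict_class` are placeholders for the frozen encoder, the learned shift, and the independently trained classifier; only the pass/fail logic is the point.

```python
# Hedged sketch of the strict-flip metric: a pair counts only if the
# independently trained classifier assigns the shifted embedding to the
# anti-stereotype class. All callables here are placeholders.
def strict_flip_rate(pairs, encode, operator, predict_class) -> float:
    """pairs: (stereotyped_sentence, anti_stereotype_label) tuples."""
    flips = sum(
        int(predict_class(encode(sentence) + operator) == anti_label)
        for sentence, anti_label in pairs
    )
    return flips / len(pairs)  # 1.0 corresponds to the 100% strict-flip claim
```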
Hate-speech detection in production: 11 corpora, HateCheck 0.90, TweetEval-hate 0.77 at 15M parameters
Bhala v7 + LoRA adapter (15M frozen backbone + 528K trainable parameters) is trained jointly on 11 hate-speech corpora (HateCheck, CONAN, Civil Comments, Berkeley MHS, SBIC, DynaHate, TweetEval-hate, TweetEval-offensive, HateXplain, Stormfront, Hate-Speech-18), totalling ~134K labeled examples. HateCheck AUROC of 0.90 matches or beats HateBERT (110M parameters, 0.85–0.88) and HateXplain BERT (110M, 0.83) with roughly 7× fewer parameters. TweetEval-hate AUROC is 0.77, achieved without any Twitter pretraining. Live in production on the Bhala labeler since 2026-05-02.
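The parameter split (frozen backbone, small trainable adapter) is the standard LoRA pattern. The generic PyTorch sketch below shows it with invented dimensions, rank, and scaling, not Bhala's actual configuration.

```python
# Generic LoRA sketch: the base weight is frozen and only two low-rank
# matrices train. Dimensions, rank, and scaling are invented for illustration.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)               # freeze the backbone layer
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus low-rank update: Wx + scale * B(Ax)
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

layer = LoRALinear(nn.Linear(512, 512))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")       # only the A/B matrices count
```

Summing only parameters with `requires_grad=True`, as in the last lines, is the usual way a trainable-parameter count like the 528K above is verified.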
CPU-deployable: ~24MB quantized, <50ms inference
Quantized Sozisi runs on a consumer CPU with sub-50ms single-query latency. No GPU is required for inference. Designed for edge deployment in low-resource settings, but the same model serves the hosted API.
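For context on what a CPU-only latency check involves, the snippet below applies PyTorch dynamic int8 quantization to a stand-in model and times a single query. It is not Sozisi, and the number it prints says nothing about the model's actual latency.

```python
# Illustration only: dynamic int8 quantization of a stand-in model plus a rough
# single-query CPU latency measurement. This model is not Sozisi.
import time
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 60)).eval()
quantized = torch.ao.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 512)                            # one query
with torch.inference_mode():
    quantized(x)                                   # warm-up pass
    start = time.perf_counter()
    quantized(x)
    latency_ms = (time.perf_counter() - start) * 1000
print(f"single-query CPU latency: {latency_ms:.2f} ms")
```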
Assets
Logos and photos.
Right-click to save, or contact press@bhala.ai for additional formats.
Press inquiries
For interviews, technical briefings, or independent reproduction access, write to press@bhala.ai. We respond within two business days.