Open AI infrastructure
DeepSahel AI builds open infrastructure that takes African languages from raw speech and text data to reusable datasets, validated models, and open benchmarks — lowering the cost of entry for every team that comes after.
Development pipeline
2,000+
Languages spoken across the African continent
~30
Languages with any meaningful AI tooling today
Talent and compute are part of the picture. But even teams that have both still face a missing layer between raw language resources and usable models: data workflows, tokenization, evaluation, native-speaker review, and documentation. DeepSahel makes that layer reusable.
Bottlenecks we address
Four integrated components that move a language project from raw data to open release.
DS · Data
Speech and text collection pipelines, consent capture, metadata schemas, cleaning, normalization, enrichment, and dataset versioning built for low-resource language realities.
DS · Eval
Human-in-the-loop review interfaces for ASR listening, translation adequacy scoring, terminology validation, and naturalness rating — at every stage of the model cycle.
DS · Train
Guided workflows for ASR, translation, tokenizers, embeddings, and efficient foundation-model adaptation. Training recipes built to be reproduced and extended.
DS · Commons
Every dataset, model, benchmark, and recipe published openly on GitHub and Hugging Face with full documentation — designed for reuse, not just citation.
Current languages
Published before this grant. Infrastructure already underway.
DeepSahel builds what makes downstream applications possible — not the applications themselves.
AI Engineer · Project Lead & Model Training / Platform Engineering Lead
ML Researcher & Applied Mathematician · Model Optimization and Evaluation Lead
Kenya · Non-profit CLG
If you're building with African languages — as a researcher, developer, community organization, or downstream builder — we want to hear from you. All infrastructure will be open. Our goal is to reduce duplication across the ecosystem, not add to it.
Email us