May 14, 2026 - 10:39 PM

Google Opens WAXAL to Support Over 100 Million African Language Speakers

Google has launched WAXAL, a large open speech dataset designed to support the development of artificial intelligence tools for African languages. 

The dataset, which took over three years to develop, aims to address the shortage of high-quality speech data that has limited the use of voice-based technologies across much of Sub-Saharan Africa.

WAXAL contains speech data for 21 African languages, including Hausa, Yoruba, Igbo, Swahili, Luganda, and Acholi. According to Google, the dataset is intended to support more than 100 million speakers whose languages are largely absent from existing speech recognition and voice synthesis systems.

The dataset includes more than 11,000 hours of speech recordings drawn from nearly two million individual audio samples. Of this total, about 1,250 hours are fully transcribed natural speech, which can be used to train automatic speech recognition systems. In addition, the dataset contains over 20 hours of studio-quality recordings suitable for text-to-speech voice generation.
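As a rough sense of scale, the figures above can be turned into an average clip length and a transcribed share. The sketch below uses only the rounded numbers quoted in this article ("more than 11,000 hours", "nearly two million" samples, "about 1,250 hours"), so the results are approximations, not official statistics; the function and constant names are illustrative.

```python
# Rounded figures as quoted in the article. The true totals are
# "more than" / "nearly" these values, so results are approximate.
TOTAL_HOURS = 11_000
TOTAL_SAMPLES = 2_000_000
TRANSCRIBED_HOURS = 1_250


def average_clip_seconds(total_hours: float, total_samples: int) -> float:
    """Mean length of one audio sample, in seconds."""
    return total_hours * 3600 / total_samples


def transcribed_share(transcribed_hours: float, total_hours: float) -> float:
    """Fraction of all recorded hours that are fully transcribed."""
    return transcribed_hours / total_hours


if __name__ == "__main__":
    clip = average_clip_seconds(TOTAL_HOURS, TOTAL_SAMPLES)
    share = transcribed_share(TRANSCRIBED_HOURS, TOTAL_HOURS)
    print(f"Average clip length: about {clip:.1f} seconds")
    print(f"Transcribed share: about {share:.1%}")
```

On these rounded inputs, a typical sample runs to roughly 20 seconds, and a little over a tenth of the recorded hours are transcribed.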

Google said WAXAL was developed in partnership with African universities and research organisations. Makerere University in Uganda and the University of Ghana led data collection for 13 languages, while Digital Umuganda in Rwanda coordinated work on five languages. Professional studio recordings were produced with support from Media Trust and Loud n Clear, while the African Institute for Mathematical Sciences (AIMS) contributed multilingual data for future releases.

Unlike many global speech datasets, ownership of the collected data remains with the African institutions that produced it. Google said this structure is intended to ensure that local researchers and developers can independently build tools.

“The ultimate impact of WAXAL is the empowerment of people in Africa,” said Aisha Walcott-Bryant, Head of Google Research Africa. “This dataset provides the critical foundation for students, researchers, and entrepreneurs to build technology in their own languages and reach over 100 million people.”

Speech data was collected by asking volunteers to describe images in their native languages, a method intended to capture natural patterns of everyday speech. High-quality studio recordings were produced by professional voice actors to support realistic text-to-speech applications.

At the University of Ghana, more than 7,000 volunteers contributed voice samples to the project. Isaac Wiafe, an Associate Professor at the university, said the dataset could support innovation in education, healthcare, and agriculture.

“For AI to have a real impact in Africa, it must speak our languages and understand our contexts,” said Joyce Nakatumba-Nabende, a Senior Lecturer at Makerere University. “WAXAL gives researchers access to the quality data needed to build speech technologies that reflect our communities.”

The full WAXAL dataset is released under an open license and is now publicly available on the Hugging Face platform. Google said the dataset is intended for use by researchers, developers, startups, and public institutions working on speech-enabled technologies across Africa.

