Saving endangered languages
‘When the last speaker falls silent, we lose something irreplaceable’
The Hindu
A language is the “living memory of a people, encoded in sound”, says a volunteer-led global initiative that is using an approach known as “frugal AI” to document and preserve the more than 40% of the world’s languages that are at risk of disappearing.
Doing that without the exploitative data extraction practices of big tech companies means not using AI the way it has been used in North America, where huge volumes of data are vacuumed up to train AI models.
“The current trajectory of AI development is unsustainable economically, environmentally, and socially,” says Arjuna Sathiaseelan, founder of the Saving Voices Project nonprofit, and chief technology officer of the Frugal AI Hub at Cambridge University. “Model sizes have exploded, leading to significant energy and water consumption, and yet billions of people remain excluded from AI’s benefits. Frugal AI addresses these failures.”
“For communities with a long history of having their land and cultural materials extracted by outsiders, that [kind of data extraction] is a deal breaker,” says the project’s co-founder Sarabani Banerjee Belur, an assistant professor at the Indian Institute of Information Technology, Dharwad.
“Our aim is creating community data sovereignty and a scalable blueprint for language restoration and digital agency by bridging the gap between cutting-edge AI and Indigenous oral traditions.”
The Saving Voices Project, which aims to reach nearly 500 million Indigenous people in 90 countries, began by developing a frugal AI model for the Soligas, one of the oldest Indigenous communities living in the forests of Karnataka, India, whose language carries unique ecological and spiritual knowledge that cannot be translated.
Known as “children of bamboo”, they have called the Biligirirangana Hills home for generations. In 2011, under the Forest Rights Act, they became the first tribal community in India to have their community and individual rights legally recognised within the core of a tiger reserve, an affirmation of their integral role in the ecosystem.
Their sustainable Indigenous conservation and lifestyle practices highlight coexistence with tigers, traditional ecological knowledge, and a unique way of measuring life stages by 35- to 60-year bamboo flowering cycles rather than by chronological age. That made the Soligas an ideal community with which to develop the model for this global Indigenous language preservation project.
Working with community members, the team developed a frugal voice data collection pipeline, recording natural speech, oral traditions, and daily language use, producing a richly annotated dataset for AI model training.
Community champions were trained as voice data collectors. “Using the collected data, we built lightweight ASR [automatic speech recognition] and TTS [text-to-speech] models specifically designed to run on low-powered devices, ensuring the technology is accessible to community members regardless of infrastructure constraints.” The voice data never left community devices, something that “isn’t possible with closed cloud systems,” says Sathiaseelan.
The promise of Small AI
“The promise of AI should not be a luxury for just a few nations,” write Sangbu Kim and Christine Zhenwei Qiang in a World Bank blog post published in the fall of 2025. “Small AI is revealing a new narrative: One of resilience, ingenuity, and opportunity, born in the very communities that need it most. And its most exciting chapters are still to come.”
Small AI, they say, is affordable, accessible, and context-specific: it flourishes on smaller datasets, runs on everyday smartphones or laptops, uses minimal resources, and is fine-tuned to address immediate, local challenges.
“While it may have limits in scale, it enables developing countries to leapfrog traditional barriers and harness AI today. At the World Bank, we see this as a vital bridge to a more inclusive digital world tomorrow.”
Small AI is delivering robust, low-bandwidth tools that expand access and are tailored to local needs, such as mobile-based AI tools that screen for tuberculosis and diabetic conditions directly on handheld devices, with no broadband connection required. An initiative in Peru is developing voice-based diagnostics in Indigenous languages, building community trust in health care and ensuring technology serves everyone.
“In Ghana, the ‘Rori’ AI math tutor, sent via WhatsApp and trained on 500 micro-lessons, costs about $5 per student per year but produces learning gains equal to an extra year of schooling. In Costa Rica, the Dominican Republic, and Mexico, AI tutoring systems are extending personalized learning to remote and Indigenous communities. Meanwhile, platforms like India’s Diksha and Bangladesh’s Shikkhok embed AI tools into mobile applications that function offline and in multiple languages.”
They say Small AI works best when it tackles hyper-local, clearly defined problems; builds on existing infrastructure and networks; is designed mobile-first, with offline functionality, which they consider crucial; and draws on public-private partnerships, in which governments provide enabling platforms, the private sector drives innovation, and communities shape solutions that truly work on the ground.
The future will be “spoken”
The man who developed India’s national ID system has been pushing for a more democratic approach to AI, advocating for smaller, open-source models trained on high-quality data vetted by humans rather than massive systems run by a few powerful players. And that matters in India, with its many languages, because he believes the future “will be spoken.”
Nandan Nilekani is the architect of Aadhaar, which allows 1.4 billion Indians to access digital services ranging from banking and health care to work programs and tax services. He says AI’s future will be shaped by small systems designed to solve real-world problems. “If a farmer in Bihar can speak to a computer in Maithili or Bhojpuri or whichever language and gets the right answer, you have made AI so much more accessible to him,” he says.
The Saving Voices Project is making its Soliga voice dataset freely available to researchers, educators, and community members globally. “Our Soliga methodology is designed to be replicated across any indigenous language community globally,” the project says. “The tools, workflows, and models we developed provide a scalable blueprint for language preservation at scale.”
Agriculture offers another example of the value of frugal AI.
When Catherine Nakalembe set out to map crop types in western Kenya, she had plenty of data from satellite images but couldn’t use artificial intelligence to analyze it, because the available AI models could not recognize local crops, reports Rest of World.
So she fitted GoPro cameras on the helmets of dozens of volunteers and trained image-recognition models to identify maize, beans, and cassava. The volunteers collected over 5 million images in two weeks.
She uses machine learning, computer vision, and deep learning models to map cropland, classify crop types, and estimate yields in Uganda, Kenya, Senegal, and other African nations. But most AI models are trained on European and U.S. data and are largely useless unless they are adapted for local contexts, she says.
“AI systems built in the West often also fail to account for the contexts of the Global South, including high internet costs, limited bandwidth, and a lack of labeled training data,” said Nakalembe, an assistant professor at the University of Maryland and Africa program director at NASA Harvest, which uses satellite imagery to improve agricultural production.
“If these systems aren’t adapted, they remain irrelevant, potentially deepening existing inequalities in wealth and access to resources, [and] there is a risk that these systems prioritize corporate and company profit over farmers,” she told Rest of World.