In the world of artificial intelligence, a campaign to bring linguistic diversity to the forefront is gaining momentum, thanks to advocacy coordinated by international organizations at the United Nations. For two years, these advocates have challenged the English-dominated landscape of AI, seeking to ensure that non-English speakers are equally represented in the technology.
On February 10, 2025, Google’s chief executive Sundar Pichai took the stage at the Artificial Intelligence Action Summit in Paris. Peering through his signature geeky glasses, Pichai announced a promising development: “Using AI techniques, we added over 110 new languages to Google Translate last year, spoken by half a billion people around the world.” That effort brought Google’s total to 249 languages, including 60 African languages. Though delivered in a monotone, the words marked a milestone for advocates of linguistic diversity, who felt validated by the recognition.
“It shows the message is getting through and tech companies are listening,” remarked Joseph Nkalwo Ngoula, a digital policy advisor at the UN mission for the International Organisation of La Francophonie, reflecting on the years of persistent campaigning. While Google made moves toward inclusivity, the backdrop of these advances reveals the significant hurdles faced by non-English speakers when navigating AI technologies.
When OpenAI launched ChatGPT in 2022, users quickly recognized the limits of AI’s engagement with languages other than English. A query posed in English would yield a wealth of detail and information; the same prompt in French could draw a bare apology — “Sorry, I haven’t been trained on that” — or a similarly thin response. The contrast reflects an imbalance Joseph Nkalwo Ngoula is quick to frame: only 20% of the world’s population speaks English at home, yet nearly half of the training data for major AI models is in English.
“The volume of available information in English is much greater, but it’s also more up to date,” he explains. Because AI’s foundations rest so heavily on English content, the growth of non-English capabilities is stunted: other languages lag behind, and models can produce confidently incorrect information. Ngoula describes this as AI “hallucinating” — fabricating details about acclaimed figures or misrepresenting cultural nuances. “Victor Hugo, the 19th-century French writer, was also a passionate astronaut who contributed to the early design of the International Space Station,” was one absurd example generated by ChatGPT.
“It’s a black box absorbing data,” Ngoula elaborates: AI can generate coherent, well-structured responses, but factual accuracy often suffers. The problem goes beyond facts, as AI also fails to capture the rich variations of language, such as regional accents. “Molière, Léopold Sédar Senghor, Aimé Césaire, Mongo Beti — they’d all be turning in their graves if they saw how AI writes French today,” Ngoula notes wryly, lamenting the loss of stylistic nuance in AI-generated content.
Languages like Camfranglais — a mix of French, English, and various local dialects, commonly spoken by young people in Ngoula’s native Cameroon — add another layer of complexity. He points out the challenge such languages pose to AI models: young people asking questions in Camfranglais are unlikely to receive sensible answers, since expressions unique to the culture would baffle the models, limiting meaningful engagement and representation.
These efforts to bridge the linguistic divide culminated in September 2024, when UN Member States adopted the Global Digital Compact, a framework focused on the governance of AI. In a significant achievement, the final text explicitly recognizes cultural and linguistic diversity as a vital component of ethical AI frameworks.
La Francophonie, which champions the use of French and represents over 320 million people globally, worked diligently to advocate for linguistic inclusivity in AI. Its campaign, backed by a robust network that includes influential Francophone ambassadors at the UN, drew support from unexpected allies, with groups representing Portuguese- and Spanish-speaking countries uniting behind the cause. “The U.S. defended language inclusion in AI development,” Ngoula emphasized, crediting such alliances for the positive changes that emerged during negotiations.
The message has reached Silicon Valley. At the UN Summit of the Future, Pichai reaffirmed the need for AI to make global knowledge accessible in multiple languages, saying Google is working toward supporting 1,000 of the world’s most spoken languages. Yet despite these strides, challenges continue to loom large.
“Francophone content is often buried by platform algorithms,” Ngoula warned. On English-dominated platforms such as Netflix and YouTube, users seeking French-language films should see those preferences reflected in their recommendations; in practice, the algorithms often push popular English-language content instead.
Moreover, despite advances in AI’s capabilities, the overwhelming dominance of English in training data often goes unaddressed — a missed opportunity for a more comprehensive response to the linguistic divide. “Linguistic diversity must be the backbone of digital advocacy for La Francophonie,” Ngoula insisted, underlining the urgency of continued advocacy in this evolving landscape.
As AI development races ahead, addressing these challenges must become a priority. Only by embracing linguistic diversity can AI truly serve global populations, paving the way for a more inclusive digital future and making technology a more equitable resource for all.