Driving Linguistic Diversity In AI: A Quest For Inclusivity

In a world increasingly dominated by English, a bold international campaign is reshaping the landscape of artificial intelligence (AI) to embrace linguistic diversity. For two years, La Francophonie, a collaborative network of 93 French-speaking states and governments, has been advocating for the inclusion of multiple languages in AI technologies, striving for a more equitable representation of voices from around the globe. This push for diversity aims to remedy a longstanding bias that has left non-English speakers at a disadvantage in accessing AI applications.

On February 10, 2025, at the Artificial Intelligence Action Summit in Paris, Google CEO Sundar Pichai announced that his company had added over 110 new languages to Google Translate the previous year, bringing the total to 249, including 60 African languages. The news, while celebrated, also underscored how far there is still to go. As Pichai put it, "We’re working toward 1,000 of the world’s most spoken languages," highlighting Google's ongoing commitment to improving linguistic inclusivity in AI tools.

Joseph Nkalwo Ngoula, a digital policy advisor at the UN mission of La Francophonie, noted the importance of this moment. "It shows the message is getting through and tech companies are listening," he asserted, reflecting on the shifts occurring in digital diplomacy. The recognition of linguistic diversity in the framework of AI governance marked progress after intense lobbying, which culminated in the adoption of the UN Global Digital Compact in 2024.

However, the path to true linguistic representation in AI has been fraught with challenges. After OpenAI launched ChatGPT in 2022, many non-English speakers ran into severe limitations. Users submitting queries in languages other than English often received frustratingly inadequate responses, such as apologies or vague replies, that pointed to English-centric training data. Nkalwo Ngoula pointed out, "The volume of available information in English is much greater, but it’s also more up to date." AI systems trained predominantly on English-language data lack a nuanced understanding of other languages, leading to inaccuracies and misunderstandings.

AI "hallucinations," erroneous or nonsensical outputs produced with alarming confidence, illustrate the issue clearly. For instance, a seemingly well-informed response might attribute a Nobel Prize to a historical figure who never won one. "It's a black box absorbing data," Nkalwo Ngoula explained, emphasizing the importance of diversifying training data to cover a wide range of languages and the cultural expressions they carry.

Language variations can be particularly pronounced in multilingual societies. In Cameroon, where multiple languages and dialects intertwine, the divergence in communication styles complicates AI's ability to respond effectively to regional vernaculars. Nkalwo Ngoula humorously remarked, "I doubt young people could ask an AI something in Camfranglais and get a meaningful response." This highlights the ongoing struggle faced by regional dialects and colloquial languages as AI deployments continue to misunderstand their richness and complexity.

Through the lens of La Francophonie, these discussions extend beyond French-speaking nations; they encompass a broader dialogue around linguistic representation that includes not just Francophones but also groups advocating for other languages. The alliance with legislative bodies and cultural organizations, including those representing Spanish and Portuguese-speaking populations, exemplifies a united front for promoting diversity in AI tools. In a surprising twist, the U.S. has even supported initiatives focused on language inclusion, showing a growing awareness of the issue.

The UN Global Digital Compact emerged as a pivotal agreement addressing the rampant inequality in digital representation, and it prominently recognizes the need for cultural and linguistic diversity. Nkalwo Ngoula shared, "Our goal was to bring it to the forefront," reflecting on the group's efforts to get language explicitly mentioned in governance discussions that often neglect this essential aspect of human identity. However, the Compact leaves certain issues unaddressed, chiefly the platform algorithms that frequently favor dominant languages and cultures.

Despite advancements following these efforts, obstacles persist. Content recommendations on streaming platforms like Netflix often favor English-language materials, thereby sidelining cultural content from other languages. Nkalwo Ngoula warned, "Francophone content is often buried by platform algorithms," echoing sentiments across different cultures sharing similar battles for visibility in a largely English-dominated digital landscape.

Moreover, the Compact notably failed to incorporate UNESCO’s Convention on Cultural Diversity, an oversight critics argue needs urgent rectification. "Linguistic diversity must be the backbone of digital advocacy for La Francophonie," Nkalwo Ngoula insisted, urging that as AI technologies advance, they be aligned with broader advocacy for cultural representation.

As the discussion about linguistic diversity in AI continues to gain traction globally, it also finds echoes in regions such as India. There, the interplay between migration, globalization, and historical challenges has led to a significant re-examination of cultural and linguistic identities. Renowned linguist Peggy Mohan, in her book "Father Tongue, Motherland," explores how the evolution of languages is tied to shifts in societal norms and reflects deeper narratives of power and belonging in Indian contexts.

Mohan's exploration emphasizes how languages adapt and change over time, offering insights into survival and identity. As Mohan asks, "How do languages survive or fade? What do they reveal about identity, belonging, and change?" These questions resonate on a wider scale, urging individuals and organizations globally to reflect on their digital expressions as influenced by language and culture.

With the future of AI lying at the intersection of technology and linguistic equity, the dialogue surrounding these issues has never been more critical. Both the advancement of AI technologies and efforts from organizations like La Francophonie will pave the way for a more inclusive digital era that respects and promotes the diversity inherent in human expression.
