Technology
06 May 2025

AI Language Models Favor Mainstream American English

The dominance of American English in AI systems raises concerns about inclusivity and representation.

An estimated 90% of the training data for current generative AI systems stems from English. However, English is an international lingua franca with about 1.5 billion speakers worldwide, and countless varieties. So whose English is today’s technology based on? The answer is primarily the English of mainstream America. This is no accident. Mainstream American English is entrenched in the digital infrastructure of the internet, in Silicon Valley’s corporate priorities, and in the data sets that fuel everything from autocorrect to AI-generated synthetic text.

The consequence? AI models produce a monolithic version of English that erases variation, excludes minoritised and regional voices, and reinforces unequal power dynamics.

The proliferation of American English online is a result of historical, economic and technological factors. The United States has been a dominant force in the development of the internet, content creation, and the rise of tech giants such as Google, Meta, Microsoft and OpenAI. Unsurprisingly, the linguistic norms embedded in products by these companies are overwhelmingly mainstream American.

A recent study found that speakers of non-mainstream English were frustrated with the “homogeneity of AI accents” in voice-cloning and speech-generation technologies. One participant noted the predominant mainstream American accents in the voices available, stating the technologies had been built “with some other people in mind.”

Mainstream varieties of English have long reigned as the “standard” against which other varieties are weighed. To take a single example from the US, linguistics research by John Baugh found that using different accents can determine people’s access to goods and services. When Baugh called different landlords about housing advertised in the local newspaper, using a mainstream accent procured him several housing inspections while using African-American and Latino accents did not.

The prestige of mainstream English also underpins algorithmic decisions. The models behind tools such as autocorrect, voice-to-text, or even AI writing assistants are most often trained on mainstream American-centric data. This is often scraped from the web, where US-based media, forums and platforms dominate. This means variations in grammar, syntax and vocabulary from other varieties of English are systematically ignored, misinterpreted or outright “corrected.”

The stakes of this linguistic bias in favour of mainstream English become even higher when AI systems are deployed around the world. If an AI tutor fails to understand a Nigerian English construction, who bears the cost? If a job application written in Indian English is marked down by an AI-powered resume scanner, what are the consequences? If an Australian First Nations elder’s oral history is transcribed by voice recognition software and the system fails to capture culturally significant terms, what knowledge is lost or misrepresented? These questions are unfolding in real time as governments, educational institutions and corporations adopt AI technologies at scale.

The idea that there is one “good” or “correct” English is a myth. English is spoken in diverse forms across regions, shaped by local societies, cultures, histories and identities. As Noongar writer and educator Glenys Collard and I have written, Aboriginal English has “its own structure, rules and the same potential as any other linguistic variety” and the same is true of other forms of English. Indian English, for example, has lexical innovations such as “prepone” (the opposite of postpone). Singapore English (Singlish) integrates particles and syntactic features from Malay, Hokkien and Tamil. These are not “broken” forms of English. Each community where English was imposed has gone on to make English its own.

English, and language more generally, is never static. It adapts to meet the needs of an ever-changing society and its speakers. Yet in AI development, this linguistic diversity is often treated as noise rather than signal. Non-standardised varieties are underrepresented in training datasets, excluded from annotation schemes, and rarely feature in evaluation benchmarks. This results in an AI ecosystem that is multilingual in theory, but monolingual in practice.

So, what would it look like to build AI systems that recognise and respect a range of different forms of English? A shift in mindset is required, from prescribing “correct” language to including many varieties of language. What we need are systems that accommodate linguistic variation. This may involve supporting community-led efforts to document and digitise linguistic varieties on their own terms, bearing in mind not all linguistic varieties should be digitised or documented. Collaboration across disciplines is also important. It requires linguists, technologists, educators and community leaders working together to ensure AI development is grounded in principles of linguistic justice.

The goal is not to “fix” language but to create technology that produces just outcomes. The focus should be on changing the technology, not the speaker. English has been a powerful vehicle of empire, but it has also been a tool of resistance, creativity and solidarity. Around the world, speakers have taken the language and made it their own. AI-enabled systems should be built to be as inclusive of this variability as possible.

Next time your phone tells you to “correct” your spelling, or an AI chatbot misunderstands your phrasing, ask yourself: whose English is it trying to model? And whose English is being left out?

In a related development, commercial drivers operating in the U.S. must meet certain standards in English proficiency or be placed out of service (OOS), according to the Commercial Vehicle Safety Alliance (CVSA). Enforcement of Title 49 Code of Federal Regulations (CFR) 391.11(b)(2) under the CVSA North American Standard Out-of-Service Criteria will begin June 25, 2025.

The announcement by CVSA comes on the heels of an executive order requiring commercial drivers operating in the U.S. to be proficient in English, which was signed by President Donald Trump on April 28. According to a statement from CVSA, an “English Proficiency (U.S. Only)” heading will be added to the “Part I – Driver” section of the North American Standard Out-of-Service Criteria stating: “Driver cannot read and speak the English language sufficiently to communicate with the safety official to respond to official inquiries and directions in accordance with FMCSA enforcement guidance. (391.11(b)(2)) Declare driver out of service.”

49 CFR 391.11(b)(2), “General qualifications of drivers,” states that a driver must be able to read and speak the English language sufficiently to converse with the general public, to understand highway traffic signs and signals in the English language, to respond to official inquiries and to make entries on reports and records.

“Federal law is clear; a driver who cannot sufficiently read or speak English — our national language — and understand road signs is unqualified to drive a commercial motor vehicle in America. This commonsense standard should have never been abandoned,” said U.S. Transportation Secretary Sean P. Duffy in an April 28 statement. “This department will always put America’s truck drivers first.”

In addition, CVSA says it plans to petition the Federal Motor Carrier Safety Administration (FMCSA) to update 49 CFR 391.11(b)(2) to identify non-compliance with English language proficiency as an OOS condition. CVSA will also send a petition to FMCSA requesting that the agency harmonize the commercial driver’s license English language requirements in 49 CFR Part 383 “Commercial Driver’s License Standards” with those in 49 CFR Part 391 “Qualifications of Drivers and Longer Combination Vehicle (LCV) Driver Instructors” so that the standards are consistent.

Linda Garner-Bunch has been in publishing for more than 30 years. You name it, Linda has written about it. She has served as an editor for a group of national do-it-yourself publications and has coordinated the real estate section of Arkansas’ only statewide newspaper, in addition to working on a variety of niche publications ranging from bridal magazines to high-school sports previews and everything in between. She is also an experienced photographer and copy editor who enjoys telling the stories of the “Knights of the Highway,” as she calls our nation’s truck drivers.