Technology
23 August 2025

Stanford and UT Dallas Pioneer Privacy in Speech AI

Two research teams unveil breakthrough technologies that decode speech and inner thoughts while embedding safeguards to protect user privacy.

In an era where artificial intelligence is increasingly woven into the fabric of daily life, two groundbreaking projects—one at Stanford University and another at the University of Texas at Dallas—are charting new territory in speech technology. Both initiatives, announced on August 23, 2025, have set their sights not just on improving the accuracy and reach of automatic speech recognition (ASR) and brain-computer interface (BCI) systems, but on embedding robust privacy protections into their very cores. The result? A glimpse of a future where machines can decode our voices—or even our silent thoughts—without compromising our most personal boundaries.

At Stanford, researchers led by neuroscientists Benyamin Meschede Abramovich Krasa and Erin M. Kunz have developed a BCI that can decode so-called "inner speech." This refers to the silent internal monologue most people experience while reading or thinking. Traditionally, BCIs designed to synthesize speech have relied on signals from parts of the brain that control the muscles used to speak. These systems require users to physically attempt speech—a task that can be exhausting or impossible for individuals with severe paralysis, such as those living with ALS or tetraplegia.

According to Stanford’s research team, the breakthrough came from shifting focus away from attempted speech, which is linked to muscle movement, and instead targeting the neural signals associated with inner, or silent, speech. As Krasa explained, “Attempted movements produced very strong signal, and we thought it could also be used for speech.” But for their latest project, the team decided to tackle the challenge of decoding higher-level language processing, venturing into the mysterious territory of our internal monologues.

To train the AI algorithms underpinning this BCI, the Stanford team worked with four participants who were almost completely paralyzed. Each had microelectrode arrays implanted in slightly different areas of their motor cortex. The participants performed a variety of tasks, such as listening to recorded words and engaging in silent reading, allowing the researchers to collect crucial data on the neural signals involved in inner speech.
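
The published decoding pipeline is far more sophisticated than anything that fits in a few lines, but the core recipe of mapping recorded neural activity to the word a participant was silently reading can be sketched as a standard supervised-learning problem. The sketch below uses random stand-in data and a simple classifier; the feature layout, vocabulary, and model choice are illustrative assumptions, not the Stanford team's actual methods.

```python
# Minimal, illustrative sketch: training a word decoder on binned neural
# firing-rate features. The feature layout, labels, and classifier choice
# are assumptions for illustration, not the Stanford team's actual pipeline.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

n_trials, n_channels, n_bins = 400, 128, 20   # trials x electrode channels x time bins
vocab = ["yes", "no", "water", "help"]        # toy vocabulary of silently read words

# Stand-in data: in practice these would be firing rates recorded while a
# participant silently reads each cued word.
X = rng.normal(size=(n_trials, n_channels * n_bins))
y = rng.choice(len(vocab), size=n_trials)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

decoder = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"held-out accuracy: {decoder.score(X_test, y_test):.2f}")  # ~chance on random data
```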

But with this new capacity to decode silent thoughts came a host of ethical challenges. After all, inner speech is the most private form of communication—sometimes encompassing thoughts we would never want to share aloud. To address this, the Stanford researchers designed a first-of-its-kind “mental privacy” safeguard. This system acts as a gatekeeper, ensuring the BCI only decodes speech that the user intends to share. The safeguard’s design marks a significant step forward in the ethical development of neural prostheses, especially as the technology moves closer to clinical use.
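
The article does not detail how the safeguard works internally. As a rough illustration of the gatekeeper idea, the sketch below releases decoded text only when a separate, high-confidence signal indicates the user intends to share it; the unlock signal, threshold, and data structures are hypothetical, not the actual Stanford mechanism.

```python
# Illustrative "gatekeeper" for a speech BCI: decoded inner speech is only
# released when the user has explicitly signaled intent to share it. The
# unlock signal, threshold, and data structures are hypothetical.
from dataclasses import dataclass
from typing import Optional

@dataclass
class DecodedUtterance:
    text: str
    unlock_confidence: float  # probability that an "unlock" cue preceded this utterance

UNLOCK_THRESHOLD = 0.95  # require very high confidence before anything leaves the device

def gate(utterance: DecodedUtterance) -> Optional[str]:
    """Pass decoded text through only if the user intended to share it."""
    if utterance.unlock_confidence >= UNLOCK_THRESHOLD:
        return utterance.text
    return None  # otherwise the decoded content is dropped, never transmitted

if __name__ == "__main__":
    shared = gate(DecodedUtterance("please raise the bed", unlock_confidence=0.98))
    private = gate(DecodedUtterance("I'm tired of visitors", unlock_confidence=0.40))
    print(shared)   # released
    print(private)  # None: stays private
```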

Meanwhile, a different team of scientists is tackling privacy from another angle with support from the Texas Advanced Computing Center (TACC). Led by Satwik Dutta, a Ph.D. student at the Erik Jonsson School of Engineering and Computer Science at UT Dallas, and his advisor John H.L. Hansen, the group has developed a privacy-focused ASR system specifically for children. Their work, conducted with the computational might of TACC's Lonestar6 supercomputer, aims to address a longstanding challenge: most existing ASR systems are trained on adult speech and perform poorly when transcribing the voices of young children.

Children’s speech is a moving target—it’s shaped by developing vocal skills, evolving grammar, and the unique quirks of early language acquisition. “Over the years, developing such automatic speech recognition system has been very challenging, especially for children,” Dutta noted in a recent interview. The team’s solution was to use “discrete speech units”—mathematical abstractions that encode audio in a way that anonymizes it. This approach not only protects privacy by making it nearly impossible to reconstruct the original speech waveform, but also reduces the computational demands of the model.

“As soon as the speech is loaded you can convert it into discrete speech units, then you don’t have any concerns of violating privacy because the speech is gone. You can no longer generate it,” Dutta explained. In practical terms, this means that children’s voices—recorded in noisy childcare settings using small, wearable LENA devices—can be analyzed for research and intervention without risking the exposure of sensitive personal data.
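
Neither quote pins down the exact recipe, but a common way to produce discrete speech units is to quantize frame-level embeddings from a pretrained speech encoder with k-means, so that each short slice of audio becomes an integer token and the waveform itself can be deleted. The sketch below uses random features as a stand-in for such embeddings; the encoder, cluster count, and file names are assumptions for illustration, not the UT Dallas team's exact pipeline.

```python
# Sketch of the "discrete speech units" idea: continuous audio features are
# quantized into integer cluster IDs, and the raw waveform is then discarded.
# Random features stand in for self-supervised speech embeddings (e.g.,
# HuBERT-style frame vectors); the cluster count is an arbitrary choice here.
import os
import numpy as np
from sklearn.cluster import KMeans

def extract_features(wav_path: str) -> np.ndarray:
    """Stand-in for a pretrained speech encoder: one 768-dim vector per ~20 ms frame."""
    rng = np.random.default_rng(abs(hash(wav_path)) % (2**32))
    return rng.normal(size=(500, 768))

# A codebook would normally be fit once on features from a large corpus.
codebook = KMeans(n_clusters=100, n_init=10, random_state=0)
codebook.fit(extract_features("corpus_sample.wav"))

def to_discrete_units(wav_path: str, delete_audio: bool = True) -> np.ndarray:
    """Convert a recording to discrete unit IDs, then drop the identifiable audio."""
    units = codebook.predict(extract_features(wav_path))
    if delete_audio and os.path.exists(wav_path):
        os.remove(wav_path)  # the waveform cannot be reconstructed from `units`
    return units

# units = to_discrete_units("lena_recording_0423.wav")
# The ASR model is then trained on sequences like `units` instead of raw audio.
```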

The technical achievements here are nothing to sneeze at. The ASR model developed at UT Dallas contained just 40 million parameters, a fraction of the size of state-of-the-art models with over 428 million parameters. Yet, thanks to the computational resources provided by Lonestar6 and the Corral data storage system, the team was able to achieve performance on par with much larger systems. “Voice based data is computationally expensive, and I needed to compare my results with modern state-of-the-art systems. Without TACC that would not have been possible,” Dutta said, referencing the Texas Advanced Computing Center’s support.

The project, funded by the National Science Foundation, is a collaborative effort involving multiple institutions, including the University of Florida and the University of Kansas. It began during the height of the COVID-19 pandemic, when researchers were limited to analyzing existing datasets of children recorded during virtual tutorials. Once restrictions eased, the team expanded their research to real-world environments, capturing the authentic, messy realities of preschool speech.

But the innovations didn’t stop there. The UT Dallas team’s more recent work, presented at the 7th ISCA Workshop on Child Computer Interaction (WOCCI 2025), explores running ASR models on affordable edge devices like the Raspberry Pi 5. The idea is to transcribe speech locally and then immediately discard the raw voice data, further reinforcing privacy protections. “Using supercomputers to study speech is new, innovative, and can accelerate the research of using speech AI for so many applications—education, clinical, forensic—anywhere you can find speech,” Dutta remarked. He added, “I think as a scientist, if you’re working on applications for children, the first thing that you should think about is how does it preserve children’s privacy. Whatever we do, it should be trustworthy and ethical. I envision a safe digital future for all children.”
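
The team's actual edge deployment is not described in detail, but the transcribe-locally-then-discard pattern might look something like the sketch below, which uses a small open-source ASR model via faster-whisper as a stand-in for the child-speech model and deletes the recording as soon as text has been extracted.

```python
# Illustrative edge pipeline: transcribe a recording locally, keep only the
# text, and delete the raw audio immediately. The faster-whisper model is a
# stand-in for the team's child-speech ASR, which is not publicly specified here.
import os
from faster_whisper import WhisperModel  # pip install faster-whisper

# A small int8 model keeps the footprint compatible with a Raspberry Pi-class CPU.
model = WhisperModel("tiny.en", device="cpu", compute_type="int8")

def transcribe_and_discard(wav_path: str) -> str:
    """Return the transcript and remove the identifiable voice recording."""
    segments, _info = model.transcribe(wav_path)
    text = " ".join(segment.text.strip() for segment in segments)
    os.remove(wav_path)  # the raw audio never leaves the device
    return text

# transcript = transcribe_and_discard("classroom_clip_017.wav")
```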

Taken together, these two projects point toward a new paradigm in speech technology, one where privacy is not an afterthought but a foundational principle. Whether it’s a BCI that listens only when you want it to, or an ASR system that erases your voice as soon as it’s transcribed, the message is clear: as machines get better at understanding us, we must get better at protecting ourselves. And with researchers in California and Texas leading the way, the future of speech technology looks not just smarter, but safer.