Science
16 April 2025

Can an AI Chatbot Help Us Talk to Dolphins? Google’s DolphinGemma Could Be the Key

Google’s groundbreaking AI model aims to bridge the communication gap between humans and dolphins, leveraging advanced machine learning technology.

What are dolphins saying to each other? What are they trying to tell us? Recent advancements in machine learning and large language models (LLMs) could be moving us closer to achieving the long-elusive goal of interspecies communication. Google announced on Monday, April 14, 2025, that its foundational AI model called DolphinGemma will be made accessible to other researchers in the coming months.

The tech giant claimed that this ‘open’ AI model has been trained to generate “novel dolphin-like sound sequences,” and will one day facilitate interactive communication between humans and dolphins. “By identifying recurring sound patterns, clusters and reliable sequences, the model can help researchers uncover hidden structures and potential meanings within the dolphins’ natural communication—a task previously requiring immense human effort,” Google said in a blog post.

Developed in collaboration with AI researchers at the Georgia Institute of Technology in Atlanta, DolphinGemma is a lightweight, small language model with roughly 400 million parameters, making it small enough to run on Pixel phones for underwater use by researchers. It is built on Google's SoundStream tokenizer, which converts dolphin sounds into a string of discrete, manageable units called tokens.
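
To get a sense of what tokenization means here, consider the minimal Python sketch below. It is illustrative only: SoundStream is a learned neural codec, whereas the codebook here is random, and the sample rate, frame length and codebook size are assumed values, not DolphinGemma's actual configuration. The core idea is the same, though: each short slice of audio is mapped to the ID of its nearest codebook vector, turning a continuous waveform into a sequence of integer tokens.

import numpy as np

SAMPLE_RATE = 48_000   # assumed sample rate for the recordings
FRAME_LEN = 480        # 10 ms frames at 48 kHz
CODEBOOK_SIZE = 1024   # number of distinct token IDs

rng = np.random.default_rng(0)
# Stand-in for a learned codebook; SoundStream learns these vectors during training
codebook = rng.standard_normal((CODEBOOK_SIZE, FRAME_LEN))

def tokenize(waveform):
    """Map each fixed-length audio frame to the ID of its nearest codebook vector."""
    n_frames = len(waveform) // FRAME_LEN
    frames = waveform[: n_frames * FRAME_LEN].reshape(n_frames, FRAME_LEN)
    # Squared Euclidean distance from every frame to every codebook entry
    d2 = (
        (frames ** 2).sum(axis=1, keepdims=True)
        - 2.0 * frames @ codebook.T
        + (codebook ** 2).sum(axis=1)
    )
    return d2.argmin(axis=1)  # one integer token per 10 ms frame

# One second of synthetic "whistle" (a rising chirp) standing in for a recording
t = np.linspace(0.0, 1.0, SAMPLE_RATE)
whistle = np.sin(2 * np.pi * (8_000 + 4_000 * t) * t)
print(tokenize(whistle)[:10])  # a discrete sequence a language model can ingest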

The model architecture borrows from Google’s Gemma series of lightweight, ‘open’ AI models. DolphinGemma is an audio-in and audio-out model, meaning that it processes sound rather than text, and thus cannot respond to written prompts. Similar to how LLMs predict the next word or token in a sentence in human language, DolphinGemma analyzes “sequences of natural dolphin sounds to identify patterns, structure and ultimately predict the likely subsequent sounds in a sequence,” Google explained.
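
Stripped of the transformer machinery, the predictive principle is simple enough to demonstrate with a toy model. The sketch below uses bigram counts over invented token IDs; DolphinGemma itself is, of course, a far larger neural model, and these sequences are made up purely for illustration.

from collections import Counter, defaultdict

def train_bigram(sequences):
    """Count how often each token follows each other token."""
    follows = defaultdict(Counter)
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            follows[cur][nxt] += 1
    return follows

def predict_next(follows, token):
    """Return the successor of `token` seen most often in training."""
    if token not in follows:
        return None
    return follows[token].most_common(1)[0][0]

# Hypothetical token sequences, as produced by an audio tokenizer
training = [[5, 9, 9, 2, 7], [5, 9, 2, 7, 7], [1, 5, 9, 9, 2]]
model = train_bigram(training)
print(predict_next(model, 9))  # -> 2, the token that most often follows 9 here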

Google revealed that it plans to release DolphinGemma as an open model so that other researchers can fine-tune it based on the sounds of various cetacean species, such as bottlenose and spinner dolphins. The AI model was trained on datasets collected from field researchers working with the Wild Dolphin Project (WDP), a non-profit organization dedicated to dolphin research.

The training dataset comprises distinctive dolphin sounds such as signature whistles (used by mothers to call their calves), burst-pulse squawks (usually heard when two dolphins are fighting), and click buzzes (often heard during courtship or while chasing sharks). The community of dolphins the WDP studies lives in the waters of the Bahamas.

To establish a shared vocabulary of dolphin sounds, Google collaborated with Georgia Tech researchers to develop the CHAT system, short for Cetacean Hearing Augmentation Telemetry. This underwater device pairs AI-generated dolphin-like whistles with specific objects that dolphins enjoy, such as seagrass.

Google stated that the CHAT tool enables two-way interaction between humans and dolphins: it picks up a dolphin’s whistle underwater, identifies the matching whistle sequence in its training dataset, and tells the researcher (via underwater headphones) which object the dolphin has requested. The researcher can then respond quickly by offering the correct object, reinforcing the connection between sound and object.
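
Google has not published CHAT’s code, but the matching step it describes can be sketched as follows. Here each reference whistle is reduced to a pitch contour and compared by correlation; the object names, contours, and threshold are all hypothetical, not the actual CHAT implementation.

import numpy as np

# Hypothetical reference whistles: each is a pitch contour (frequency in Hz over time)
REFERENCES = {
    "seagrass": np.linspace(8_000, 12_000, 50),   # rising whistle
    "scarf":    np.linspace(12_000, 8_000, 50),   # falling whistle
}

def match_whistle(contour, threshold=0.9):
    """Return the object whose reference contour best correlates with the input."""
    best_obj, best_score = None, threshold
    for obj, ref in REFERENCES.items():
        score = np.corrcoef(contour, ref)[0, 1]
        if score > best_score:
            best_obj, best_score = obj, score
    return best_obj

# A noisy rising whistle, standing in for a detection from the hydrophone
heard = np.linspace(8_100, 11_900, 50) + np.random.default_rng(1).normal(0, 50, 50)
matched = match_whistle(heard)
if matched:
    print(f"Cue researcher via headphones: dolphin requested {matched!r}")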

“By demonstrating the system between humans, researchers hope the naturally curious dolphins will learn to mimic the whistles to request these items. Eventually, as more of the dolphins’ natural sounds are understood, they can also be added to the system,” the company added.

Google also mentioned that its Pixel 6 series had proven capable of processing dolphin sounds in real time. The next generation of the CHAT system will be built around the Pixel 9, integrating speaker and microphone functions and running deep learning models and template-matching algorithms simultaneously.

Researchers have been studying ways to leverage AI and machine learning algorithms to make sense of animal sounds for several years now. They have had success applying automatic detection algorithms based on convolutional neural networks to pick out animal sounds and categorize them by their acoustic characteristics. Deep neural networks have made it possible to find hidden structures in sequences of animal vocalizations, enabling AI models trained on examples of animal sounds to generate novel, synthetic versions of those sounds.
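
A minimal PyTorch sketch of such a convolutional detector is shown below. The architecture, input size and class names are assumptions for illustration; real bioacoustic detectors are trained on large labeled collections of spectrograms.

import torch
import torch.nn as nn

class SoundEventCNN(nn.Module):
    """Tiny CNN that classifies 64x64 spectrogram patches into sound categories."""
    def __init__(self, n_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, n_classes)

    def forward(self, x):  # x: (batch, 1, 64, 64)
        return self.classifier(self.features(x).flatten(1))

model = SoundEventCNN()
patch = torch.randn(1, 1, 64, 64)    # stand-in for a mel-spectrogram patch
print(model(patch).softmax(dim=-1))  # probabilities for e.g. whistle/squawk/background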

These are known as supervised learning models: they can generate synthetic animal sounds because they are trained on numerous examples labeled and annotated by humans. But what about animal sounds that haven’t been labeled? This is where self-supervised and unsupervised learning models come in. Such models are trained on vast amounts of unlabeled data pulled from books, websites, social media feeds, and anything else on the internet.

Unsupervised learning models can sort data and find patterns entirely on their own, which makes them better suited to translating animal sounds into human language, since they are not limited by existing knowledge of animal communication. Researchers also expect that these vast training datasets may contain animal sounds that were previously inaccessible. Google’s DolphinGemma appears to fall into the category of semi-supervised learning models, which are trained on both labeled and unlabeled examples.
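
The unsupervised approach can be illustrated in a few lines of Python: cluster feature vectors extracted from recordings and inspect which sounds fall together, with no labels involved. The features below are synthetic stand-ins; in practice they might be spectrogram statistics or embeddings from a pretrained audio model.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# 200 hypothetical sound clips, each summarized as a 32-dimensional feature vector
features = np.vstack([
    rng.normal(0.0, 1.0, (100, 32)),   # clips of one acoustic type
    rng.normal(3.0, 1.0, (100, 32)),   # clips of another
])

# Group the clips with no labels at all and inspect the resulting clusters
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
print(np.bincount(clusters))  # roughly [100 100]: structure recovered without labels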

Nonetheless, there are major challenges in developing an AI chatbot that allows humans to talk to animals. Researchers have pointed out that animals likely communicate using more than just sound, incorporating other senses such as touch and smell. Validating AI-generated animal sounds could also be challenging since there is still much that humans don’t understand about animal communication. Developing AI models for this purpose will require significantly more data.

As the research progresses, the potential for improved understanding between species may not only enhance our knowledge of dolphin communication but could also open new avenues for conservation efforts and animal welfare.