Technology
21 March 2025

OpenAI Unveils New Models And Advances In AI Technology

The latest innovations explore text-to-speech capabilities and the potential of small language models.

OpenAI has taken a significant step forward in artificial intelligence by unveiling a new text-to-speech model that allows for personalized voice selection. On March 21, 2025, the company published audio examples showcasing various tones, such as a knight reciting a ballad and a surfer saying, "Wow Dude," highlighting the versatility of the technology. The new text-to-speech and speech-to-text models are now available through the OpenAI API and, according to the company, have been considerably improved.
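
The announcement itself contains no code, but a minimal sketch of such a request with the official openai Python package might look as follows; the model name gpt-4o-mini-tts, the chosen voice, and the instructions parameter for controlling tone are assumptions drawn from OpenAI's API documentation rather than from this article.

```python
# Minimal text-to-speech sketch using the official `openai` Python package.
# Model name, voice, and the `instructions` parameter are assumptions based on
# OpenAI's announcement; check the current API reference before relying on them.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

speech = client.audio.speech.create(
    model="gpt-4o-mini-tts",  # assumed name of the new TTS model
    voice="coral",            # one of the selectable voices
    input="Hark! A tale of circuits brave and silicon bold.",
    instructions="Speak like a medieval knight reciting a ballad.",  # tone control
)

# Write the returned audio bytes to disk.
with open("ballad.mp3", "wb") as f:
    f.write(speech.content)
```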

According to OpenAI's blog, "the speech-to-text model is even better than Whisper, our previous transcription tool." This advance is attributed to the integration of reinforcement learning, which improves the model's understanding of spoken language and reduces error rates. OpenAI also says that the new models, based on GPT-4o and GPT-4o mini, are more cost-efficient than their predecessors thanks to improved model distillation, making these tools available at a lower operational cost.
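
For the transcription side, a call against the new speech-to-text endpoint could look roughly like the sketch below; the model name gpt-4o-transcribe and the audio file are assumptions used only for illustration.

```python
# Rough speech-to-text sketch with the `openai` Python package.
# "gpt-4o-transcribe" is the model name from OpenAI's announcement; treat it
# as an assumption and substitute whatever the API reference currently lists.
from openai import OpenAI

client = OpenAI()

with open("meeting.wav", "rb") as audio_file:  # hypothetical recording
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",
        file=audio_file,
    )

print(transcript.text)  # plain-text transcription, as with the older Whisper endpoint
```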

Additionally, OpenAI has launched a demo page, OpenAI.fm, where developers can experiment with the new models. The release also includes the Agents SDK, which lets developers turn text-based agents into voice agents, broadening the potential applications of AI voice technology.
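
The article does not show how the Agents SDK is used; the following sketch only illustrates the general pattern of wrapping a text agent in a voice pipeline. The package name openai-agents and the classes Agent, VoicePipeline, SingleAgentVoiceWorkflow, and AudioInput, along with their signatures, are assumptions and should be checked against the SDK documentation.

```python
# Hypothetical sketch of turning a text agent into a voice agent with the
# OpenAI Agents SDK. All class names and signatures here are assumptions
# drawn from the SDK's documented pattern, not from this article.
import asyncio
import numpy as np

from agents import Agent  # pip install "openai-agents[voice]"
from agents.voice import AudioInput, SingleAgentVoiceWorkflow, VoicePipeline

agent = Agent(
    name="Assistant",
    instructions="Answer briefly and in a friendly tone.",
)

async def main() -> None:
    # Wrap the text agent: audio in -> transcription -> agent -> synthesized speech out.
    pipeline = VoicePipeline(workflow=SingleAgentVoiceWorkflow(agent))

    # Three seconds of silence as placeholder input; a real application would
    # pass microphone samples here.
    buffer = np.zeros(24_000 * 3, dtype=np.int16)
    result = await pipeline.run(AudioInput(buffer=buffer))

    # Stream the synthesized audio chunks back to the caller.
    async for event in result.stream():
        if event.type == "voice_stream_event_audio":
            print(f"received {len(event.data)} audio samples")

asyncio.run(main())
```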

In a related domain, Christian Winkler, author of the cover story in the new issue of iX, shared insights on the growing importance of small language models (SLMs). In an interview also published on March 21, he argued that smaller models do not automatically mean lower performance. He stated, "Everyone is talking about large language models, but less is said about their smaller variants." According to Winkler, SLMs with four billion parameters or fewer can excel at summarizing text or serve as the generative component of retrieval-augmented generation (RAG) pipelines.
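
To make the RAG remark concrete, the toy sketch below pairs a plain TF-IDF retriever with a small instruction-tuned model via Hugging Face transformers; the model name is only a stand-in for "an SLM with four billion parameters or fewer" and is an assumption, not a recommendation from the interview.

```python
# Toy retrieval-augmented generation with a small language model.
# The model name below is a placeholder for "an SLM with <= 4B parameters";
# it is an assumption for illustration, not a recommendation from the article.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from transformers import pipeline

documents = [
    "OpenAI released new text-to-speech and speech-to-text models in March 2025.",
    "Small language models can run on consumer laptops when quantized.",
    "Retrieval-augmented generation grounds answers in retrieved documents.",
]

# 1) Retrieve: rank documents against the question with plain TF-IDF.
question = "What did OpenAI release in March 2025?"
vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)
scores = cosine_similarity(vectorizer.transform([question]), doc_vectors)[0]
context = documents[scores.argmax()]

# 2) Generate: let a small model answer using only the retrieved context.
generator = pipeline("text-generation", model="Qwen/Qwen2.5-1.5B-Instruct")
prompt = (
    f"Context: {context}\n"
    f"Question: {question}\n"
    "Answer using only the context above:"
)
print(generator(prompt, max_new_tokens=64)[0]["generated_text"])
```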

Winkler also outlined how SLMs can run effectively on consumer laptops. He noted, "As soon as you want to work without a GPU, you should quantize the models." Quantization lets laptops generate text at speeds sufficient for most applications, making SLMs practical to deploy in a wide range of environments.
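
As an illustration of the quantization point, the snippet below loads a 4-bit GGUF file with llama-cpp-python and generates text entirely on the CPU; the file name is hypothetical, and any similarly quantized small model would behave comparably.

```python
# CPU-only generation with a quantized model via llama-cpp-python
# (pip install llama-cpp-python). The GGUF file name is hypothetical; any
# 4-bit-quantized small model exported to GGUF would work the same way.
from llama_cpp import Llama

llm = Llama(
    model_path="models/slm-3b-instruct-q4_k_m.gguf",  # hypothetical quantized SLM
    n_ctx=2048,    # context window
    n_threads=8,   # CPU threads on a typical laptop
)

output = llm(
    "Summarize in one sentence: Small language models can run locally "
    "on laptops once they are quantized.",
    max_tokens=64,
)
print(output["choices"][0]["text"].strip())
```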

However, not everything about SLMs is ideal. Winkler highlighted a notable disadvantage: the tendency of these models to hallucinate, that is, to generate incorrect or misleading information because of their more limited knowledge. He explained, "Small models are at a disadvantage when it comes to so-called hallucinations," underscoring the need for careful prompting and validation of outputs.
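
One common mitigation along the lines Winkler suggests is to constrain the prompt to supplied context and to validate the output before using it; the naive sketch below shows what such a check might look like, with a deliberately simplistic overlap test that is an assumption for illustration only.

```python
# Naive guard against ungrounded answers from a small model: force the model
# to answer only from the given context and fall back when the answer does not
# overlap with it. The overlap check is deliberately simplistic.

def build_grounded_prompt(context: str, question: str) -> str:
    return (
        f"Context: {context}\n"
        f"Question: {question}\n"
        "Answer using only the context. If the context does not contain the "
        "answer, reply exactly with: I don't know."
    )

def looks_grounded(answer: str, context: str, min_overlap: int = 2) -> bool:
    """Accept the answer only if enough of its words appear in the context."""
    answer_words = {w.lower().strip(".,") for w in answer.split()}
    context_words = {w.lower().strip(".,") for w in context.split()}
    return answer == "I don't know" or len(answer_words & context_words) >= min_overlap

context = "OpenAI published new speech models on March 21, 2025."
answer = "OpenAI published new speech models in March 2025."  # stand-in model output

if looks_grounded(answer, context):
    print(answer)
else:
    print("Answer rejected: not supported by the provided context.")
```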

Both OpenAI’s new features and Winkler’s comments underscore a significant evolution in AI capabilities. As larger models continue to dominate discussions around capabilities, these developments remind users that smaller, specialized models can be just as effective in certain contexts.

Readers eager to delve deeper into the possibilities and limitations of small language models, as well as the advancements in AI tools for local operations, can find a comprehensive overview in the latest issue of iX, available now in stores or online. The April issue not only explores these themes but also engages with opportunities and challenges present in the current IT landscape. Those interested in contributing suggestions on relevant topics are encouraged to reach out or comment in the magazine's forum.

This intersection of small models and improved speech technology could transform applications ranging from customer service to content generation, making it easier to integrate AI into everyday operations.