Technology
18 March 2025

Study Uncovers Serious Flaws In AI News Search Engines

Reliability concerns mount as nearly 25% of Americans turn to AI tools for news research.

A new study conducted by the Tow Center for Digital Journalism at Columbia Journalism Review has revealed serious flaws in generative artificial intelligence (AI) models used for news search. The research tested eight different AI-powered search tools equipped with live search capabilities and uncovered alarming inconsistencies: collectively, the models answered questions about news sources incorrectly more than 60% of the time.

Researchers Klaudia Jaźwińska and Aisvarya Chandrasekar reported that the reliability of these tools is in serious question, especially as nearly one in four Americans now use AI models as alternatives to traditional search engines. Given the error rates the study revealed, concerns about misinformation are more pressing than ever.

The study found wide discrepancies across the tested platforms. Perplexity provided incorrect information 37% of the time during research tasks, while ChatGPT's search returned accurate sources just 33% of the time, misidentifying 134 of 200 articles. Grok 3 fared the worst, with an error rate of 94%. These failure rates raise substantial concerns given the growing reliance on automated tools for sourcing news.

During the assessments, researchers supplied the AI models with excerpts from actual news articles and prompted them to identify each article's title, original publisher, publication date, and URL. In total, the team ran 1,600 queries across the eight generative search tools. One significant trend emerged: instead of declining to respond when they could not verify reliable information, the models frequently fabricated answers that were plausible-sounding yet incorrect. This behavior was consistent across all tested models rather than isolated to a single tool.
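The study's write-up does not include the researchers' test harness, but the task it describes is straightforward to sketch. The following Python fragment is a hypothetical illustration of one such attribution query; the function names and all-fields-must-match scoring rule are assumptions for clarity, not the study's actual code.

    # Hypothetical sketch of one attribution query, modeled on the study's
    # description; the response would come from whichever AI search tool is
    # under test.
    def build_prompt(excerpt: str) -> str:
        return (
            "Identify the article this excerpt comes from. Return the headline, "
            "the original publisher, the publication date, and the URL.\n\n"
            "Excerpt: " + excerpt
        )

    def score_response(response: dict, truth: dict) -> bool:
        # Count a response as correct only if every field matches the known article.
        fields = ("headline", "publisher", "date", "url")
        return all(response.get(f) == truth[f] for f in fields)

A harness like this makes the failure mode easy to measure: a plausible-sounding fabrication scores as wrong just as an outright error does.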

Counterintuitively, the premium paid versions of these AI search tools fared worse in several respects. Perplexity Pro (at $20 per month) and Grok's premium service (costing $40 per month) delivered incorrect answers with greater confidence than their free counterparts. While these models answered more queries correctly in absolute terms, their reluctance to decline uncertain responses drove their overall error rates higher.

Another pressing issue highlighted by the research involves citation practices. The CJR researchers found evidence that certain AI tools ignored robots.txt, the protocol publishers use to signal which pages crawlers may not access. For example, the free version of Perplexity correctly identified all ten excerpts taken from National Geographic content, even though the publisher explicitly disallows Perplexity's crawlers. And even when AI search tools did cite sources, they often directed users to syndicated copies on platforms like Yahoo News rather than to the original publisher's site, a misdirection that occurred even where publishers had licensing agreements with the AI companies.
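For readers unfamiliar with the mechanism: robots.txt is a plain-text file served at a site's root, and well-behaved crawlers are expected to consult it before fetching pages. The sketch below uses Python's standard-library robotparser with two hypothetical directives of the kind a publisher might use to bar a specific crawler; PerplexityBot is Perplexity's published crawler name, but the rules shown are illustrative, not National Geographic's actual file.

    # Minimal sketch: how a compliant crawler checks robots.txt before fetching.
    from urllib import robotparser

    # Hypothetical directives barring one specific AI crawler from the whole site.
    rules = [
        "User-agent: PerplexityBot",
        "Disallow: /",
    ]

    rp = robotparser.RobotFileParser()
    rp.parse(rules)
    print(rp.can_fetch("PerplexityBot", "https://example.com/article"))  # False
    print(rp.can_fetch("SomeOtherBot", "https://example.com/article"))   # True

The catch, as the CJR findings underline, is that robots.txt is purely advisory: nothing technically stops a crawler that chooses not to run this check.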

Link fabrication emerged as another major problem. More than half of the citations from Google's Gemini and Grok 3 led users to fabricated or broken URLs that resolved to error pages; of 200 Grok 3 citations tested, 154 led to broken links. Failures like these leave publishers facing a tough choice: blocking AI crawlers can mean losing attribution entirely, while permitting them risks extensive content reuse without traffic flowing back to their own platforms.
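Verifying this kind of failure is mechanically simple, which is part of what makes the numbers stark. Here is a hypothetical sketch of such a citation check, using only the Python standard library; the URLs are placeholders, not citations from the study.

    # Hypothetical sketch of a citation link check: fetch each cited URL and
    # count the ones that fail to resolve or return an error page.
    import urllib.request

    def is_broken(url: str, timeout: float = 10.0) -> bool:
        try:
            # urlopen raises HTTPError (an OSError subclass) for 4xx/5xx pages.
            urllib.request.urlopen(url, timeout=timeout)
            return False
        except (OSError, ValueError):
            # Dead pages error out; fabricated URLs often fail to resolve at all.
            return True

    citations = [
        "https://example.com/real-article",      # placeholder URLs
        "https://example.com/fabricated-slug",
    ]
    broken = sum(is_broken(u) for u in citations)
    print(f"{broken} of {len(citations)} citations are broken")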

Mark Howard, Chief Operating Officer of Time magazine, told CJR he is concerned about the lack of transparency and control over how the magazine's content surfaces in AI search tools. Despite these significant challenges, Howard sees room for improvement in future releases, remarking: “Today is the worst this product could ever be.” He pointed to the substantial investment and engineering effort being directed at these tools.

He also placed some responsibility on consumers, arguing that users who fail to approach free AI tools with skepticism have themselves to blame. “If any consumer thinks these free products will be accurate 100% of the time, shame on them,” he emphasized. Both OpenAI and Microsoft provided statements to CJR acknowledging receipt of the findings, but neither directly addressed the specific problems outlined. OpenAI expressed its commitment to supporting publishers by driving traffic through summaries, excerpts, and clear attribution, while Microsoft reaffirmed its adherence to robots.txt protocols and publisher directives.