21 March 2025

Generative AI Struggles To Achieve True Scientific Discovery

Despite advances, ChatGPT4 can only make incremental discoveries, lacking human creativity.

As artificial intelligence develops at a rapid pace, scientists are increasingly curious about whether it can make scientific breakthroughs of its own. A study published on March 20, 2025, examined whether generative artificial intelligence (GenAI) can achieve scientific discoveries comparable to those made by human researchers. The findings reveal a stark gap between the capabilities of GenAI and human ingenuity.

The research focused on ChatGPT4, a widely used GenAI system, which was tasked with investigating a specific scientific law in molecular genetics. Drawing inspiration from the discovery of genetic control mechanisms for which Jacques Monod and François Jacob shared the Nobel Prize in 1965, the study required ChatGPT4 to identify the roles of three regulatory genes related to β-galactosidase (β-gal) production in E. coli. Through this task, researchers sought to observe whether ChatGPT4 could formulate original hypotheses and design experiments independently.

ChatGPT4 did generate hypotheses and designed a total of 12 experiments, but it fell short of typical human performance. The average human participant conducted 13.89 experiments and proposed 14 hypotheses, more than ChatGPT4. This discrepancy points to a fundamental limitation of GenAI: it can assist in scientific tasks involving established domain knowledge, but it lacks the ability to create original ideas or respond to unexpected findings.

In the course of its experiments, ChatGPT4 was expected to discover that the I gene is a chemical inhibitor of β-gal production, that the O gene is a physical inhibitor, and that the P gene plays no role. Although ChatGPT4 displayed high confidence in its conclusions, it failed to identify either the P gene's irrelevance or the deeper dynamics of gene regulation that human participants were able to grasp. The study therefore concluded that current generative AI systems can achieve only incremental advances rather than the fundamental breakthroughs typical of human cognition.
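To make the task concrete, the knockout logic behind it can be sketched in a few lines of Python. This is a toy model, not the simulator used in the study: the function names are hypothetical, and the rule that β-gal is repressed only while both the I and O genes are intact (with no inducer present) is an assumption chosen for illustration.

```python
from itertools import product

GENES = ("I", "O", "P")


def beta_gal_produced(intact: dict[str, bool]) -> bool:
    """Toy rule (an assumption, not the study's simulator).

    With no inducer present, beta-gal is repressed only while both
    inhibitor genes, I and O, are intact. P never enters the rule.
    """
    repressed = intact["I"] and intact["O"]
    return not repressed


def relevant_genes() -> set[str]:
    """Return the genes whose knockout changes the output in some background."""
    relevant = set()
    for gene in GENES:
        others = [g for g in GENES if g != gene]
        # Try every intact/mutated combination of the other two genes.
        for states in product([True, False], repeat=len(others)):
            background = dict(zip(others, states))
            with_gene = beta_gal_produced({**background, gene: True})
            without_gene = beta_gal_produced({**background, gene: False})
            if with_gene != without_gene:
                relevant.add(gene)
                break
    return relevant


if __name__ == "__main__":
    print(sorted(relevant_genes()))
```

Under this assumed rule, knocking out the P gene never changes the outcome in any background, which is the kind of null result an experimenter has to notice to conclude that P plays no role.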

One of the major findings was that ChatGPT4 is incapable of generating truly original hypotheses. Instead, it relies on a pre-trained knowledge base bounded by earlier human discoveries, which limits its ability to exhibit the creativity or curiosity necessary for scientific discovery. The study's authors noted that "current GenAI can make only incremental discoveries but cannot achieve fundamental discoveries from scratch as humans can."

This limitation became particularly apparent when the study examined how participants handled anomalous experimental results. Unlike human participants, who were often inspired by unexpected results to formulate new hypotheses, ChatGPT4 maintained a rigid approach, lacking the curiosity or moments of epiphany that typically catalyze human innovation.

ChatGPT4 tended to propose hypotheses based on its prior training data instead of generating insights from novel observations in the laboratory setting. Consequently, its performance was rated lower than that of human participants: ChatGPT4 received a discovery score of 1, compared with 1.67 for humans.

The study does more than catalog the limitations of GenAI in scientific discovery; it also raises ethical concerns. As scientists increasingly use GenAI in research, they must remain vigilant about the biases these systems can introduce, given their dependence on data that may inadvertently favor existing knowledge or paradigms. The authors emphasized the need for transparency and oversight when allowing GenAI to generate hypotheses or conclusions that inform significant scientific work.

Moreover, they pointed out the need to diversify training datasets and to build in mechanisms that mitigate bias while enhancing GenAI's capacity for curiosity and creativity. Addressing these issues could help bridge the gap between GenAI's current limitations and the dynamic processes characteristic of human discovery.

The study concludes that while GenAI like ChatGPT4 has transformative potential in scientific discovery, its current capabilities are limited to tasks that involve either a known representation of domain knowledge or access to human scientists' existing knowledge space. As researchers explore ways to enhance GenAI's role, the focus may shift toward fostering AI systems that exhibit human-like curiosity and imagination, enabling them to contribute to genuinely original scientific discoveries.