The increasing integration of artificial intelligence (AI) tools is reshaping education across disciplines, and mechanical engineering is no exception. A study evaluating three prominent generative AI tools (ChatGPT, Gemini, and Copilot) offers fresh insight into their efficacy and reliability for undergraduate mechanical engineering students. The research posed 800 questions across seven subjects, in multiple-choice, numerical, and theory-based formats, to assess how well these tools performed.
Interestingly, all three AI models showed competence on theory questions but faced significant challenges with numerical problems, where errors often arose from complex calculations that demand a deep conceptual grasp. Among the tools evaluated, Copilot was the most accurate, scoring 60.38% on the questions presented, followed closely by Gemini at 57.13%; ChatGPT lagged behind with an accuracy of just 46.63%. These findings raise important questions about the role of generative AI tools, their reliability, and their potential impact on students' development of problem-solving skills.
To enrich the quantitative data, the researchers surveyed 172 mechanical engineering students and interviewed 20 participants. The qualitative accounts gathered offered valuable perspectives on students' experiences, the challenges they perceived with these tools, and how AI tools can be integrated effectively in academic settings.
Despite the promising results on theoretical queries, where all three tools averaged correctness above 80%, numerical questions proved far more problematic. Copilot and Gemini solved roughly one-third of the numerical questions correctly, whereas ChatGPT's success rate was merely one-quarter. The stark gap between performance on theory-based and numerical questions exposes the limitations of these systems when faced with calculations that require multi-step reasoning.
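For readers curious how such accuracy figures are tallied, they are simply the share of questions each tool answered correctly within each format. The short Python sketch below illustrates one way such a tally could be computed from a graded question log; the file name and column layout (tool, format, correct) are hypothetical and not taken from the study itself.

    # Illustrative tally of per-tool, per-format accuracy from a graded
    # question log. The file name and column layout are hypothetical.
    import csv
    from collections import defaultdict

    def accuracy_by_tool_and_format(path):
        correct = defaultdict(int)
        total = defaultdict(int)
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                key = (row["tool"], row["format"])   # e.g. ("Copilot", "numerical")
                total[key] += 1
                correct[key] += int(row["correct"])  # 1 if graded correct, else 0
        return {key: correct[key] / total[key] for key in total}

    for (tool, fmt), acc in sorted(accuracy_by_tool_and_format("results.csv").items()):
        print(f"{tool:8s} {fmt:12s} {acc:.2%}")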
Survey results showed that students are becoming increasingly reliant on generative AI tools for academic support: 78.5% of participants indicated they had used such tools for more than three months, and 62.8% expressed positive views of these platforms as educational aids. Yet more than three-quarters (76.7%) reported encountering issues such as inaccurate or unreliable answers. This underscores the importance of teaching students to critically evaluate AI-generated outputs rather than depend on potentially flawed responses.
Participants described a wide range of uses for AI tools, relying on them primarily for gathering information, summarizing content, and completing assignments. While tools like ChatGPT are favored by students for their comprehensive explanations, participants repeatedly raised the difficulties they encountered when using AI for numerical tasks.
The tools' performance on numerical questions revealed troubling trends. In several instances the AI tools produced incorrect calculations, confusing students who relied on those outputs. When AI tools struggle with straightforward numerical questions or fail to apply appropriate assumptions, these fundamental weaknesses can undermine students' learning experience.
Comments from the student interviews reflected these concerns. Many students said they could not fully trust AI-generated solutions and called for clearer guidance on how to engage with these tools effectively. Students also described instances where they had to steer the AI manually toward the correct answer, indicating that better prompting practices are needed to minimize errors. This reliance on active prompting reinforces the need for educators to encourage independent problem-solving rather than passive acceptance of AI outputs.
From the instructors' perspective, the findings suggest strategic opportunities to turn the limitations of AI tools into more engaged learning. Tutors can assign advanced numerical problems with confidence that AI tools will not reliably provide correct answers, prompting students to collaborate and approach error-prone solutions critically. This creates openings for meaningful classroom discussion in which students practice analytical skills and build confidence.
To capitalize on AI integration, the study proposes strategic recommendations for adapting assessment methods so that AI-supported learning is applied consistently without compromising students' development of key analytical skills. AI tools should serve as supplements to traditional learning rather than replacements, ensuring students build the ability to analyze and critique the solutions these systems provide.
While generative AI tools clearly have the potential to contribute to the educational experience, the study emphasizes the importance of addressing their shortcomings. Further research is needed to improve their accuracy, particularly on numerical problem-solving tasks. By advocating structured AI-assisted learning strategies, educators can draw on the strengths of AI tools while empowering students to navigate an increasingly technology-driven educational environment.
Overall, the study provides insightful reflections on the opportunities and challenges posed by generative AI tools like ChatGPT, Gemini, and Copilot within mechanical engineering education. With careful integration, these tools can potentially enrich learning experiences, fostering innovation, exploration, and continuous improvement within educational methodologies.