Study Shows that Students Using ChatGPT Do Worse on Tests

This certainly isn’t something we were expecting. We’ve been told that artificial intelligence can do anything and solve all our problems. Yet, a new study has shown that students using ChatGPT to prepare for tests don’t do as well as those without access to the AI chatbot.

University of Pennsylvania ChatGPT Research

Many people break out in a sweat at the thought of taking a math test. Personally, I love math, but I know I'm not the norm. For those who don't, being able to prepare for a test with ChatGPT would probably come as a relief.

One group of Turkish high school students was given access to ChatGPT while working through practice math problems before taking a math test. Another group did the same practice without ChatGPT.

Researchers at the University of Pennsylvania studied the results. Sure, ChatGPT helped the students who had it solve 48 percent more of the practice problems correctly, but when they took the math test, they scored 17 percent worse than the students without the AI help.

There was actually a third group of Turkish students. They also had access to ChatGPT, but it was a revised version of the chatbot that worked more like a tutor. Instead of simply giving the students the answers, it provided hints to help them reach the correct answer on their own. This group solved 127 percent more of the practice problems correctly than the students without AI help. Yet when it came time to take the test, their scores were similar to those of the students who had practiced without AI.

Researchers’ Conclusion on ChatGPT in the Classroom

The researchers concluded that ChatGPT and other AI chatbots can "substantially inhibit learning": even when the chatbot was designed to act as a tutor, it still didn't give students a leg up on the test. And they went further than saying it didn't help; they titled their research paper "Generative AI Can Harm Learning."

After analyzing the questions the students asked ChatGPT, the researchers believe the students were using the AI tool as a "crutch." They didn't seem to be trying to work out the problems on their own; they simply put the practice question to ChatGPT to get the answer. Used this way, it didn't guide them or teach them anything. The researchers compared it to being on autopilot.

On top of that, the students put too much faith in ChatGPT. Those who used the standard version didn't believe it had caused them to learn less, though it clearly had. Those who used the tutor version believed they had done much better on the test than they actually did.

ChatGPT also wasn't always correct, getting the right answer only about half the time. Its arithmetic was wrong about 8 percent of the time, and its method for solving the problem was wrong about 42 percent of the time; together, those errors account for the roughly 50 percent of problems it got wrong. Those errors didn't show up in the tutor version, but that's because that chatbot was supplied with the correct answers.

It's worth noting that the study took place in only one location with one group of students, though that group was large: nearly 1,000 high school students, from freshman to junior year, last fall. All of the students were assigned the same practice problems and then took the test, and they went through four cycles of this.

Older generations will see this as a "dumbing down" of our youth. When my own kids were in school, I noticed that they didn't have to learn handwriting because of the influence of computers, and they didn't learn long division either. AI chatbots may add to this trend. That said, AI tools can also be very beneficial, both for students in general and for students with disabilities.

Laura Tucker