ChatGPT Performs At University Level

Aug 29 2023

Published by Steven Novella under Technology

We are still sorting out the strengths and weaknesses of the new crop of artificial intelligence (AI) applications, the poster-child of which is ChatGPT. This is a so-called large language model application using a “generative pre-trained transformer”. Essentially these types of AI are trained on very large sets of data and are able to generate human-sounding text by predicting the most likely next word segment in a sequence. The results are surprisingly good.
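To make "predicting the most likely next word segment" concrete, here is a deliberately tiny sketch. It is not how ChatGPT works internally: it swaps the transformer for a simple bigram counter, which only looks at the previous word. The corpus and the function names (`train_bigram`, `predict_next`, `generate`) are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, token):
    """Return the most frequent follower of `token`, or None if it has none."""
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def generate(counts, start, max_tokens=4):
    """Greedily extend `start` by repeatedly picking the likeliest next token."""
    out = [start]
    for _ in range(max_tokens):
        nxt = predict_next(counts, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return out


# Hypothetical toy "training data"; real models train on vastly larger text.
corpus = "the cat sat on the mat and the cat sat on the rug".split()
model = train_bigram(corpus)

print(predict_next(model, "the"))
print(" ".join(generate(model, "the", max_tokens=4)))
```

A real large language model conditions on a long stretch of preceding text rather than a single word, and works with sub-word tokens, but the basic loop of "predict the next piece, append it, repeat" is the same idea.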

There have been a slew of studies seeing how well ChatGPT or a similar AI performs on standardized tests. On any knowledge-based exam, it does very well. ChatGPT has passed the medical board exams, for example, and even many (but not all) subspecialty boards. If the information is out there on the internet, ChatGPT can generate this knowledge.

All of this has teachers in a well-deserved panic. Students can essentially use ChatGPT to write their essays and do their homework. Essays are a bit different than straightforward knowledge-based exams. They might require analysis and creativity. How would ChatGPT perform at university-level course essay tasks? That was the focus of a recent study. I already gave the answer away in the headline to this blog post – it did very well.

They directly compared the work of students with ChatGPT in 32 different courses, assessed blindly by multiple graders. They found ChatGPT was equal or superior to the students in 9 of 32 courses. For most of the rest, its work was within the acceptable range, though outperformed by the students. There were several areas where ChatGPT did not perform well, and these were predictable based on the known weaknesses of the application – mathematics, economics, and coding. ChatGPT is not good at math, so it faltered in any coursework that relied heavily on math. But overall, ChatGPT is performing within the range of university students in terms of completing essay and homework assignments.

There are some interesting tidbits if we delve deeper into the data. The experts who assessed the work were asked to evaluate the tasks and responses based on their difficulty for creativity, knowledge, and cognitive processes. ChatGPT, in line with previous studies, does really well on anything knowledge-based. High-level cognitive processes are another matter, and here the results are variable. Interestingly, ChatGPT did almost as well as students in terms of creativity. Students generally did better when it came to other cognitive processes (such as deep analysis) but the gap got narrower as the difficulty increased. In other words, students’ performance degraded faster than ChatGPT’s as the tasks became more cognitively demanding.

Of course, this is a snapshot of a rapidly moving technology. It is interesting to see where ChatGPT is now, but it is probably safe to assume that within a few years we will have newer versions that perform even better. How much better is still an open question, and I have read opinions from experts that are all over the spectrum. Some feel that these large language models are reaching the point of diminishing returns, and therefore the technology is plateauing. Others feel we are just seeing the tip of the iceberg. I suspect the answer is somewhere in between, but even if we only see incremental improvements going forward, ChatGPT successors are likely to perform at or above university or even professional levels in the future.

What does this mean for education? There are other data points in this study of interest. First, most students (75%) said they plan to use ChatGPT or similar applications in their schoolwork. Meanwhile, most teachers feel that using these apps amounts to cheating and plagiarism. Further, the two most common tools for detecting ChatGPT-generated essays failed to detect the AI’s work 32% and 49% of the time. That sounds like a recipe for trouble. I and others have compared the situation to the introduction of calculators to the teaching of math. Is using a calculator cheating? It depends on the assignment.

What is likely to happen, and is already happening, is that students are happily learning how best to use ChatGPT. It is remarkably easy, for example, to edit what ChatGPT generates to make it sound less like an AI and more like you, evading detection. This can cut hours of work down to minutes. There are several reactions to this situation that I have encountered from teachers and professors.

One reaction is – if students want to cheat themselves out of an education, let them. That’s their problem. Their parents are likely spending tens of thousands of dollars, or they are incurring student loans, to get an education. If they want to throw it away by cheating, they lose. There is some truth to this, but the problem is that cheating by some students affects the grades of other students who are not cheating. Further, it distorts the relative assessment of performance among students, which can affect later opportunities. Also, some professors are interested in having feedback mechanisms to help them understand how well students are learning from their course. If what they are doing isn’t working, but it looks like it is because enough students are cheating, that hurts the quality of the education. So it’s a problem for everyone, not just the students who are cheating.

Another approach is to ban the use of ChatGPT or similar apps. This requires having some detection method in place, but as we see this is highly problematic. Not only do detection methods miss ChatGPT-generated text, they also falsely flag genuine work as possibly being AI. There is probably a limited role for this approach, and universities need to get creative in figuring out how to at least make it harder for students to just have ChatGPT do their work. But this approach will always turn into an arms race, and I fear universities will lose.

The final approach, the one I favor, is to think creatively about how to adapt classwork to the reality of ChatGPT. We live in a world with Google, with the internet, and with ChatGPT. There is no denying it and it’s not worth fighting against it. You can even argue that we need to teach students how to leverage this technology effectively. Let them use ChatGPT. But then we need to design ways to assess student knowledge that do not allow for the use of ChatGPT to do their work. You can’t just assign them an essay.

One approach is to do all knowledge assessment live and in class. This may mean “flipping” the class cycle, an old suggestion that has not proven superior to traditional classwork, but may now have its day in response to ChatGPT. Essentially, you assign students the content to study, by whatever means they want, as homework, and then you test their knowledge live in class. Schools have been moving in this direction already – away from lectures toward using class time for discussion and workshops. At my medical school, for example, we have entirely done away with lectures. Students can watch videos, read papers, or listen to podcasts for their content. Then in the classroom we discuss applications and integration of that knowledge. This is a good way to assess understanding. You can still use in-class tests for testing their factual knowledge.

Yes, this requires work on the part of teachers and schools, but that ship has sailed. ChatGPT is a disruptive technology. There is no quick fix to bring the world back to where it was prior to ChatGPT. Universities must adapt. I think, in the end, it will likely be a good thing. Rethinking how best to impart knowledge and understanding, and then assess that understanding, will likely lead to improvements in education overall. It will be work, but it will be worth it in the end. I personally know several professors dealing with this, and based upon their experiences I will also add one more point. Universities cannot simply expect professors and teachers to do the work on top of everything else they are already doing. The universities, colleges, and schools themselves have to provide the time and resources for educators to adapt to this new world.
