Generative AI is a Black Mirror for Educators
ChatGPT shows us why lecture classes are a bad way to learn
I like to think I am a good teacher. For several years I taught a large, popular course at the Harvard Kennedy School called “The Causes and Consequences of Inequality”. Standing on the stage of the largest lecture hall at HKS, I dissected the best and most current research on inequality for my 120 students in a fun and engaging style. The course received excellent reviews every year, and I’ve kept in touch with many of my former students. Although I left the classroom in 2020 for a three-year stint as Academic Dean, I’ve been eager to return ever since.
Yet lately I’ve been consumed with a dark thought. Did my students really learn, or were they merely entertained? Were they able to apply the knowledge they gained in my course toward useful ends?
Are we teaching our students to be stochastic parrots?
I had these dark thoughts while I was reviewing my old course assignments, which in retrospect could have been completed easily by ChatGPT or another generative AI program. I assigned essays and take-home exams that were meant to elicit deep, original synthesis of course concepts. The goal of the course was to help students understand how broad societal forces that drive rising inequality relate to specific policy choices at the federal, state, and local levels.
Here is a short version of an actual take-home exam prompt:
“If we increased the top marginal income tax rate to 1979 levels (70 percent), we would reduce economic inequality back to 1979 levels.”
Do you agree with this statement? Please describe what problems would (and would not) be solved by a large increase in the top marginal income tax rate. Make a specific prediction about what would happen if we increased the top marginal income tax rate in the US to 70 percent. What new problems – if any – would this change create? Is there a better way to tax top earners if the goal is reducing inequality? If so, provide a specific explanation.
I was proud of this question. It’s important and policy-relevant, and you can’t just look it up on the internet. I asked students to make evidence-based inferences from readings and discussions across multiple class topics. Those who did it more persuasively got better grades.
Exam questions like mine will not survive the age of generative AI. ChatGPT can deliver an A-level answer with minimal prompting – if not now, then soon, as the technology continues to improve. It does so with a complex sentence-completion algorithm that creates nuanced answers to original questions by drawing on the collective wisdom of the internet. No one is sure whether generative AI programs are truly intelligent or whether they are just highly sophisticated mimics, or “stochastic parrots”. Generative AI raises real questions about the nature of mastery. If ChatGPT can write a superb answer to the question above, does that mean it truly understands how tax policy affects the income distribution in an economy? How would we know for sure?
I have the same uncertainty about my own students. Assignments like the prompt above ask respondents to regurgitate combinatorial variations of the concepts taught in class. Is that sufficient to demonstrate mastery of course concepts, or is it just mimicry? Was I teaching my students to be stochastic parrots?
As teachers we are too often content to assess learning through demonstration of knowledge. In the age of generative AI, that’s not good enough. Maybe it never was. Instead, we must insist that students demonstrate mastery by applying the knowledge they’ve gained. They should demonstrate learning-by-doing through carefully curated performances that are aligned with course goals.
Technology-enabled cheating – new broom, same old dirt
The low-tech countermeasures to fight AI-augmented cheating – in-person blue book tests, oral exams – work because they are performances. Students transmit their knowledge straight from pen to paper to the eyes of their instructor, with no filter in between. The take-home exam, on the other hand, allows them to benefit from outside assistance.
Cheating was always possible on take-home exams and papers. Generative AI just makes it easier and cheaper. There’s even a pre-ChatGPT term for this kind of cheating – “Chegging” – named after the education technology company Chegg, which provides online homework help, tutoring, and other services. We know that generative AI supercharges cheating because Chegg’s stock cratered after the company reported on an earnings call that ChatGPT was eating into its new-customer growth rate.
If cheating is getting easier on take-home assignments, then why do college professors and pedagogy experts still prefer them to in-class exams? Because giving students more time for reflection allows us to ask the kinds of sophisticated, multi-stage questions that facilitate deeper learning. Timed in-class exams are a very poor approximation of what our students are asked to do in the real world, where employers expect them to apply their learning to new problems, often through long-term projects and in team-based settings.
How do we facilitate deeper learning when generative AI tools provide our students with such tempting shortcuts?
We should make assessment even more complex and interactive than a take-home exam. Designing course assessments around structured feedback and direct practice of key skills will help students learn more, while also being more AI-resistant.
For example, one goal of my course was to help students learn to weigh evidence from competing perspectives and form nuanced judgments on contentious issues. If they were sure that the Reagan tax cuts in the 1980s increased economic inequality in the U.S., I would ask them why inequality also increased at the same time in continental Europe, where tax cuts were smaller or nonexistent, or why the biggest increases were in pre-tax rather than post-tax inequality. The point was not to figure out the right answer, but to develop the skill of weighing evidence and managing uncertainty.
Next time I teach this course, I might ask students to make short presentations to their peers on both sides of a controversial issue and grade them on their ability to be convincing from both directions. Or I might ask them to work in groups on a semester-long project that seeks to persuade a policymaker of the importance of a particular issue through careful use of research evidence.
In other words, I can AI-proof my classroom by assessing students based on what they can do, not just on what they seem to know.
Learning is a contact sport
Generative AI is forcing me to make these changes now, but I probably should have made them already. One paper pooled evidence from 225 different studies of “active” learning practices in Science, Technology, Engineering, and Mathematics (STEM) subjects. Active learning approaches include group problem-solving, peer feedback, and studio or workshop-based assignments. Students in active learning courses were about 35% less likely to fail the course and scored about half a letter grade higher (e.g., B- rather than C+, or A+ instead of A if you are a Harvard student) compared to students in traditional lecture-based classes.
Interestingly, a recent study found that students reported feeling more confused when engaged in active learning, even though they learned more and did better on exams! The authors speculate that this is because active learning requires greater cognitive effort, which feels like confusion.
Another reason that active learning might increase performance while also creating confusion is that students receive more and better feedback. When you passively listen to a lecture and submit an essay or take-home exam based on it, your only source of feedback is the instructor’s comments.
You get way more feedback by presenting your work to an audience or engaging with peers on longer-term projects. Also, the feedback is instant rather than coming several days after you’ve turned in your assignment.
Feedback aids learning by pointing out gaps in understanding. Ideas often seem great inside one’s own mind, but they need to be shaped, refined and stress-tested by the outside world. Research suggests that regular evaluation and critical feedback improves performance in areas ranging from classroom teaching to call centers to textile production.
In addition to being pedagogically superior, active learning is naturally resistant to generative AI-augmented cheating. ChatGPT still has no connection to the physical world. It can write your essays for you, but it can’t (yet) help you through a presentation by uploading knowledge to your head like Neo in the Matrix.
Until that day comes, we should design our courses to help students apply their learning to complex, unpredictable, interactive settings, where humans still have an advantage over machines.
Generative AI has made my old way of teaching obsolete. Still, I plan to race – rather than rage – against the machine. I hope that the changes I make in my classroom will ultimately make me a better teacher and will help my students adapt to the rapidly changing world of AI-enabled work.
Finally, there is the interesting question of how to best use generative AI tools to facilitate better teaching. That is maybe an essay for another day, but instead of listening to me you should probably just read Ethan Mollick’s wonderful substack, “One Useful Thing”.
Prof. Deming - I was fortunate enough to be one of those students in "The Causes and Consequences of Inequality" in 2018. I can tell you definitively that I carry lessons from that course with me to this day, in my work at the city and state government levels, in the philanthropic world and in academia. Now that I'm on the other side of the ledger (a Visiting Professor teaching Quantitative Methods), I think a lot about the crisis of ChatGPT and how it actually indicates something wrong with academia, particularly at the undergrad level. Having students produce 20-page papers doesn't do much that will be replicable in any career field – except academia. I appreciated the marriage of theory and practice at HKS, but generally find it to be lacking from professors who don't leave academia.
Presentations in general, and (dare I say) debates in particular, feel much more grounded in the work these folks will be doing.
"I might ask students to make short presentations to their peers on both sides of a controversial issue and grade them on their ability to be convincing from both directions."
Unless they have to come up with arguments on the spot, how is this AI-proof?