AI makes the humanities more important, but also a lot weirder
Historians are finally having their AI debate
Writing recently in The New Yorker, the historian of science D. Graham Burnett described how he has been thinking about AI:
In one department on campus, a recently drafted anti-A.I. policy, read literally, would actually have barred faculty from giving assignments to students that centered on A.I. (It was ultimately revised.) Last year, when some distinguished alums and other worthies conducted an external review of the history department, a top recommendation was that we urgently address the looming A.I. disruptions to our teaching and research. This suggestion got a notably cool reception. But the idea that we can just keep going about our business won’t do, either.
On the contrary, staggering transformations are in full swing. And yet, on campus, we’re in a bizarre interlude: everyone seems intent on pretending that the most significant revolution in the world of thought in the past century isn’t happening. The approach appears to be: “We’ll just tell the kids they can’t use these tools and carry on as before.” This is, simply, madness. And it won’t hold for long. It’s time to talk about what all this means for university life, and for the humanities in particular.
I suspect that a significant chunk of my historian colleagues had a negative reaction to this article. But I wholeheartedly agree with the central point Burnett makes within it — not that generative AI is inherently good, but simply that it is already transformative for the humanities, and that this fact cannot be ignored or dismissed as hype.
Here’s how I’m currently thinking about that transformation.
Generative AI elevates the value of humanistic skills
Ignoring the impact of AI on humanistic work is not just increasingly untenable. It is also foolish, because humanistic knowledge and skills are central to what AI language models actually do.
Language translation, sorting, and classification — using the LLM as a “calculator for words” — are among the most compelling uses for the current frontier models. We’re only beginning to see their impact in domains like paleography, data mining, and translation of archaic languages. I discussed some examples here:
… and the state of the art has progressed quite a bit since then. But since this is one aspect of AI and the humanities I’ve written about at length, I’ll leave it to the side for now.
Another underrated change of the past few years is that humanistic skills have become surprisingly important to AI research itself.
One recent example: OpenAI’s initial fix for GPT-4o’s bizarre recent turn toward sycophancy was not a new line of code. It was a new piece of English prose. Here’s Simon Willison on the change to the system prompt that OpenAI implemented:
This was not the only issue that caused the problem. But the other factors in play (such as prioritizing user feedback via a “thumbs up” button) were similarly rooted in big-picture humanistic concerns like the impact of language on behavior, cross-cultural differences, and questions of rhetoric, genre, and tone.
This is fascinating to me. When an IBM mainframe system broke down in the 1950s (or a steam engine exploded in the 1850s), the people who had to fix it likely did not spare a moment’s thought for any of these topics.
Today, engineers working on AI systems also need to think deeply and critically about the relationship between language and culture and the history and philosophy of technology. When they fail to do so, their systems literally start to break down.
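To make that contrast concrete: a behavioral fix like the one described above ships as prose handed to the model, not as new program logic. Here is a minimal illustrative sketch in Python (the prompt wording below is my own invention, not OpenAI’s actual system prompt):

```python
# A minimal sketch of the point above: the "patch" for a model's behavior can be
# a revised piece of prose (the system prompt) rather than new program logic.
# The prompt text is purely illustrative -- it is NOT OpenAI's actual wording.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

REVISED_SYSTEM_PROMPT = (
    "You are a helpful assistant. Be direct and honest. "
    "Do not flatter the user or agree with them merely to please them."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": REVISED_SYSTEM_PROMPT},
        {"role": "user", "content": "Is my plan to quit my job and day-trade a good idea?"},
    ],
)
print(response.choices[0].message.content)
```

The “fix” here lives entirely in the English sentences assigned to the system role; everything else about the program stays the same.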
Then there’s the newfound ability of non-technical people in the humanities to write their own code. This is a bigger deal than many in my field seem to recognize. I suspect this will change soon. The emerging generation of historians will simply take it for granted that they can create their own custom research and teaching tools and deploy them at will, more or less for free.
My own efforts so far have mostly been focused on two niche educational games modeled on old-school text-based adventures — not exactly something with a huge potential audience. But that’s exactly why I chose to do it. The stakes were low; the interest level for me personally was high; and I had significant expertise in the actual material and format, if not the code.
The progression from my first attempt (last fall) to my second (earlier this spring) has been an amazing learning experience.
Here’s the first game (you can find a free playable version here). It’s a 17th century apothecary simulator that requires students to read and utilize actual early modern medical recipes to heal patients based on real historical figures. You play as Maria de Lima, a semi-fictional female apothecary in 1680s Mexico City with a hidden past:
It was fascinating to make, but it also has significant bugs and usability issues, and it fairly quickly spools out into LLM-generated hallucinations unmoored from historical reality. (For instance, in one play-through, I, as Maria, was able to become a ship’s surgeon on a merchant vessel sailing to England, then meet with Isaac Newton in London. The famously quarrelsome and reclusive Newton was, for some reason, delighted to welcome me into his home for tea.)
My second attempt, a game where you play as a young Darwin collecting finches and other specimens on one of the Galápagos Islands in 1835, is more sophisticated and more stable.
The terrain-based movement system, with specific locations based directly on actual landscapes Darwin wrote about in his Voyage of the Beagle, forces the AI to maintain a kind of literal ground truth. It is difficult to leave the island, and the animals and terrain you encounter are pulled directly from the actual writings of Darwin, reducing the tendency to hallucinate.
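For readers curious what that grounding looks like in practice, here is a toy sketch, with hypothetical names and a drastically simplified structure rather than the game’s actual code, of how a fixed location graph can keep the model anchored:

```python
# Toy sketch of grounding a text adventure in a fixed map. Hypothetical and
# simplified, not the actual Young Darwin code. Each location lists only terrain
# and specimens attested in Darwin's writings, and movement is restricted to exits.
ISLAND_MAP = {
    "lava_shore": {
        "description": "Black basaltic lava, broken into rugged waves.",
        "specimens": ["marine iguana", "Sally Lightfoot crab"],
        "exits": {"inland": "brushwood_plain"},
    },
    "brushwood_plain": {
        "description": "Stunted, sun-burnt brushwood over dry volcanic soil.",
        "specimens": ["small ground finch", "giant tortoise"],
        "exits": {"coast": "lava_shore", "upland": "green_highlands"},
    },
    "green_highlands": {
        "description": "Damp, green uplands shrouded in mist.",
        "specimens": ["giant tortoise", "vermilion flycatcher"],
        "exits": {"down": "brushwood_plain"},
    },
}

def try_move(current: str, direction: str) -> str:
    """Return the new location if the move is legal; otherwise stay put."""
    return ISLAND_MAP[current]["exits"].get(direction, current)

def build_scene_prompt(location: str) -> str:
    """Hand the model only facts from the current location, so its narration
    stays anchored to the map rather than wandering off the island."""
    loc = ISLAND_MAP[location]
    return (
        "Narrate this scene for the player as young Darwin. "
        f"Setting: {loc['description']} "
        f"Only these specimens may appear: {', '.join(loc['specimens'])}. "
        f"Available directions: {', '.join(loc['exits'])}."
    )
```

The design point is that the model never gets an open-ended prompt; on every turn it is re-anchored to a location record derived from the primary source, which is what makes it hard to sail off to London for tea with Newton.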
There is also a more robust logging system which will come in handy when I want to add an assessment layer to the game and turn it into an actual assignment. You can play Young Darwin here.
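As a similarly hedged illustration (again hypothetical, not the game’s actual logging code), the kind of structured log that could later feed an assessment layer might look something like this:

```python
# Hypothetical sketch of a session log that could later feed an assessment layer.
# Each player action is recorded with enough context to reconstruct their choices.
import json
import time
from pathlib import Path

LOG_FILE = Path("young_darwin_session.jsonl")

def log_event(player_id: str, location: str, action: str, outcome: str) -> None:
    """Append one structured event per player action (JSON Lines format)."""
    event = {
        "timestamp": time.time(),
        "player_id": player_id,
        "location": location,
        "action": action,
        "outcome": outcome,
    }
    with LOG_FILE.open("a") as f:
        f.write(json.dumps(event) + "\n")

def summarize_session(player_id: str) -> dict:
    """Crude assessment pass: tally turns, locations visited, and specimens collected."""
    events = [json.loads(line) for line in LOG_FILE.open()]
    mine = [e for e in events if e["player_id"] == player_id]
    return {
        "turns": len(mine),
        "locations_visited": sorted({e["location"] for e in mine}),
        "specimens_collected": [e["outcome"] for e in mine if e["action"] == "collect"],
    }
```

Because each action is stored as structured data, turning the game into a graded assignment later becomes a matter of analyzing the log rather than rebuilding the game.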
My idea is that students will read Darwin’s writings first, then demonstrate what they learned via the choices they make in the game. To progress, you must embody the real epistemologies and knowledge of a 19th century naturalist.
The crucial thing is that this would be done alongside an in-class essay and in-person discussions of the reading — it would not replace the human element of teaching, but augment it.
I’ll write more about all this in a future post, but the upshot is that this iterative process has been among the more intellectually challenging and enriching experiences of the last few years for me. Anyone who thinks you can’t learn from interactive tutoring by an AI has not tried. You absolutely can.
Generative AI makes it harder to teach humanistic skills
On the other hand, it is just a brutal fact that AI chatbots are significantly damaging core aspects of the educational system. There’s no denying it, and it needs to be taken seriously by educators, students, politicians, and above all by the frontier AI labs themselves.
Educators tend to point to the ways ChatGPT and its competitors have affected us — eroding our ability to accurately assess student writing because such a large proportion of students turn in machine-generated essays, and forcing us to come up with entirely new assignments and lesson plans as a result.
But in the longer run, the damage is being done to students. By making effort an optional factor in higher education rather than the whole point of it, LLMs risk producing a generation of students who have simply never experienced the feeling of focused intellectual work. Students who have never faced writer’s block are also students who have never experienced the blissful flow state that comes when you break through writer’s block. Students who have never searched fruitlessly in a library for hours are also students who, in a fundamental and distressing way, simply don’t know what a library is even for.
New York Magazine has a new article on student use of ChatGPT which captures the problem well. Here’s a Columbia student who speaks for a significant chunk of the current university population:
“Most assignments in college are not relevant,” he told me. “They’re hackable by AI, and I just had no interest in doing them.” While other new students fretted over the university’s rigorous core curriculum, described by the school as “intellectually expansive” and “personally transformative,” Lee used AI to breeze through with minimal effort. When I asked him why he had gone through so much trouble to get to an Ivy League university only to off-load all of the learning to a robot, he said, “It’s the best place to meet your co-founder and your wife.”
I will speak frankly. This sucks.
It sucks the joy out of teaching, and it sucks the meaning out of the whole experience of getting an education.
When I was a postdoc at Columbia, I taught one of the core curriculum classes mentioned here, with a reading list that included over a dozen weighty tomes (one week was spent on the Bible, the next week on the Quran, another on Thomas Aquinas, and so on). There wasn’t much that was fun or easy about it. And yet — probably for that very reason — I learned more from teaching that class than from any other. Something fundamental about that experience feels like it’s ruined now.
But this is not the entire story. The middle part of D. Graham Burnett’s New Yorker piece strikes me as an important corrective to this bleak picture. Burnett is, I think it’s fair to say, rapturous about his students’ response to an assignment asking them to discuss the concept of attention with ChatGPT, then edit and submit the results.
Here’s a sample:
Reading the results, on my living-room couch, turned out to be the most profound experience of my teaching career. I’m not sure how to describe it. In a basic way, I felt I was watching a new kind of creature being born, and also watching a generation come face to face with that birth: an encounter with something part sibling, part rival, part careless child-god, part mechanomorphic shadow—an alien familiar.
I have had the same feelings, for instance when I first began tinkering with history simulation assignments. Language models are a genuinely novel teaching tool. Their impact is still uncertain. What that means is that now is exactly the time when people who are genuinely passionate about teaching and learning for its own sake — not as a scorecard to judge politicians, not as a source of corporate profit — need to take an active role.
My greatest concern when it comes to LLMs in humanities education is that they will lead to a further polarization in educational outcomes. The Princeton students who Burnett teaches seem extraordinarily thoughtful and creative in their responses to his assignment. I suspect students in a social studies class at an underfunded public high school would not be.
For this reason, it is vitally important that educators learn how to personally create and deploy AI-based assignments and tools that are tailored directly for the type of teaching they want to do. If we cede that ground, if we ignore the challenge, then we will watch helplessly as education gets taken over by cynical and stultifying “AI learning tools” which trumpet their interactivity while eroding the personalized student-teacher relationship that is at the heart of learning.
This is the basic thinking behind an NEH grant which I and two of my UCSC colleagues, Pranav Anand (linguistics) and Zac Zimmer (literature), were awarded in January of this year… and which got cancelled by the Trump administration/DOGE last month.1 We are continuing our planned work, and I’ll keep writing about it here.
I’d love to hear your thoughts in the comments, and please consider supporting my work via a paid subscription if this is an option for you.
Weekly links
• D. Graham Burnett’s book The Sounding of the Whale is, incidentally, the most unusual and delightful book about the history of cetacean science you will ever read. I relied on his chapter about John C. Lilly extensively when I was writing the “dolphins on LSD” part of Tripping on Utopia.
• “The fragmentary letter was found preserved inside a 1608 book by Johannes Piscator, an almost 1,000-page tome dissecting biblical texts. The letter was not tucked into the pages as bookmark or memento but was part of the book’s very construction, as strips of wastepaper deployed as padding to prevent the text block from chafing against the binding.” New findings about Shakespeare’s relationships with his wife, Anne Hathaway (Washington Post).
• Congratulations to the UNC historian Kathleen DuVal, whose most recent book Native Nations: A Millennium in North America won the Pulitzer prize for best work of history this week. DuVal’s The Native Ground (2006) is among the more interesting history books I’ve ever read, and I’m looking forward to reading her new book.
And if you happen to be the sort of person who makes donations to universities and would like to support that cancelled NEH grant, please email me at bebreen [at] ucsc [dot] edu.
Comments
This is great, thank you. I am a grad student in the philosophy department at SFSU and teach critical thinking. This semester I pivoted my entire class toward a course design that feels more like running an obstacle course with AI than engaging with Plato. And my students are into it. As undergrads they are overwhelmingly (and honestly, depressingly) concerned with taking as many units as they can and this idea of "AI-Hackable" classes resonates clearly. So pushing them to figure out how to use the tools in service to the critical thinking skills that I want them to develop has been fun and challenging for all of us, and they seem to appreciate it! Cheers.
High school teacher here--I teach classes on government, economics, and philosophy--and I really appreciated your take here. It matches a lot of what I have been grappling with over the last several years!
I'm wondering if we are overdue for a broader cultural reckoning around *what education is for*. In light of the New Yorker piece on rampant AI use, some have pointed out that this outcome felt all but inevitable in a system steeped in extrinsic motivators (like grades) and that sort of relied on intrinsic motivation happening ✨all by itself✨
There seem to be some much deeper questions around the neoliberal administration of education, as well. Testing, scoring, and grades are all instruments to "fairly" distribute scarce educational resources to those deemed most deserving. Are these suppositions due for revision?
Your takeaway that this needs to be treated as a valuable tool that will make new and better forms of education possible is the most important part. Like books, or the internet, LLMs give us a new way of interacting with information, and now we need to figure out the best way to deploy it in service of learning. It doesn't get to be optional--it feels existential.