AI legibility, physical archives, and the future of research
A follow-up to "The leading AI models are now good historians"
Pradyumna Prasad, a computer science PhD student at the National University of Singapore, recently made what I thought was a very perceptive observation about where AI and knowledge work are headed:
If your job is legible enough that people can make a dataset clearly pointing out what is right and what is wrong, you are at the highest risk for an AI model being ‘superhuman’ at your job. It is even more risky if it is possible to articulate your thought process in a way that is verifiable.
Looking at this perspective, it makes more sense that competitive programming and mathematics have been attacked first by these models. Not only do they provide a clear source of ground truth, you can also hire people to give intermediate steps and there are tools like proof assistants to tell you if the proof is correct or not.
DeepSeek has shown that reinforcement learning (RL) remains extremely effective at advancing the state of the art in AI models. What this means is that we can expect continued or even accelerated improvement in domains where reinforcement learning works.
But this is not every domain of knowledge. RL requires a clearly defined set of correct answers (and all the better if those answers can be broken down into discrete steps). This lends itself very well to math and coding, as well as to translation and to the sorts of GRE-style questions used in many current benchmarks like Humanity’s Last Exam.
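To make that concrete: the "clear signal" RL needs can be as simple as a function that mechanically checks a model's output against a known answer or a test suite. The toy sketch below is purely illustrative (it is not how any lab actually trains its models), but it captures why math and code are such natural targets in a way that an archival hunch is not.

```python
# Purely illustrative: two toy "verifiable rewards" of the kind RL pipelines
# can optimize against. A math answer or a piece of code either passes the
# check or it doesn't; there is no equivalent check for an archival hunch.

def reward_math(model_answer: str, ground_truth: str) -> float:
    """Return 1.0 if the model's final answer matches the known solution."""
    return 1.0 if model_answer.strip() == ground_truth.strip() else 0.0

def reward_code(model_code: str, tests: list[tuple[int, int]]) -> float:
    """Return the fraction of unit tests passed by a model-written solution()."""
    namespace: dict = {}
    try:
        exec(model_code, namespace)       # run the generated code
        solution = namespace["solution"]  # assume it defines a function named solution
    except Exception:
        return 0.0
    passed = sum(1 for x, expected in tests if solution(x) == expected)
    return passed / len(tests)

# Example: grading a model-written squaring function against three test cases.
print(reward_code("def solution(x):\n    return x * x", [(2, 4), (3, 9), (4, 16)]))  # prints 1.0
```

Anything that can be scored this mechanically is exactly the kind of domain where RL can keep grinding out improvement.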
Given this, Prasad arrives at the following conclusion:
The best placed people are those for whom it will be unprofitable or extremely difficult to create or procure datasets that give a clear signal.
No one can predict where AI models will go in the next few years, and I remain deeply skeptical of anyone who claims to be able to do so (this includes the boosters promising utopian AGI, the alarmists predicting global catastrophe, and the skeptics who dismiss the field as hype).
However, I do think this line of thought suggests a more moderate, and to me more plausible, medium-term future which I haven’t seen discussed much. It looks like this: AI does not lead to a singularity, but it does exceed human abilities in domains like math, coding, reasoning, and translation, because those forms of knowledge lend themselves to reinforcement learning.
AI also gets shockingly, dismayingly good at making “slop” content — if we measure “good” as “someone wants to pay money for it,” since that, unfortunately, is also a reward function RL can optimize for. Any knowledge domain where you can shovel up a lot of digital data and break it into parts labelled true or false (or “humans liked this” versus “humans didn’t like this”) is fodder for AI systems. Think social media posts and the lower grades of fiction or streaming shows.
But where AI may hit an unexpected wall is any domain which depends upon some combination of intuition, physical presence, and illegible data.
Like, say, history.
Intuition and the spectrum of legibility
I mean something sort of specific when I say “intuition” here, and it’s rooted in the material experience of being a historian. Bear with me for a second and I will explain.
Historians cultivate a specific form of intuition. Nothing is well defined when you begin a historical research project. You go into an archive with a question or topic in mind, but without an answer or a clear argument. You leaf through documents or do word searches in databases without a real sense of what you will find — often, the finding aids are either nonexistent or they tell you almost nothing.

Over a period of days, weeks, or months, you start to build up a feeling for what’s in the archive: who constructed it, how it’s organized, what matters within it, and (importantly) what was intentionally or unintentionally excluded from it. In my experience, this really is more of a vague sensibility than something you can quantify. It’s closer to David Lynch and Angelo Badalamenti stumbling onto the Twin Peaks theme than it is to a data scientist making a spreadsheet.
Sure, you can say that a given archive contains 14,523 letters sent from colonial Brazil to Portugal in the 18th century. You could even digitize all of those letters and ask an LLM to find themes in them and output it as JSON data.
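Concretely, that kind of request is now trivial to script. A rough sketch (the file name, model, and output schema here are stand-ins, not a pipeline I actually use) might look like this:

```python
# A rough sketch of asking an LLM to extract themes from a digitized letter
# as JSON. The file path, model name, and schema are placeholders.
import json
from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY is set in the environment

with open("digitized_letter_0001.txt", encoding="utf-8") as f:
    letter_text = f.read()

response = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": (
            "Identify the main themes of this 18th-century letter. "
            'Respond as JSON: {"themes": [...], "summary": "..."}')},
        {"role": "user", "content": letter_text},
    ],
)

themes = json.loads(response.choices[0].message.content)
print(themes["themes"])
```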
But there is something very personal in the way that your brain, individually and idiosyncratically, remembers and links together the things you find inside those letters. It is not just about textual or even visual data but about sense memory, social relationships, and even what you dream about. It is about intuition grounded in material facts.1
A brief example
Let’s say there’s one particular letter that stands out from the others. In it, a soldier in 18th century Rio de Janeiro begs to return to Portugal because he’s sick and badly needs his mother (yes, this is a real letter I remember reading, in Lisbon in the spring of 2012).
It sticks out in your mind because, well… it just does. If asked to give a reason, the best you could offer would be something like “it was weird” or “it was tonally different from the other documents in the folder.” Individually, there isn’t a lot you can do with it. But as you spend time in an archive, you start making semi-conscious or even unconscious connections between the things that “jump out” in this way.
Over long periods of time spent in physical archives, you develop a holistic picture of a slice of history that is necessarily individual to you, as the researcher, because you can’t control or even fully understand what your brain finds interesting in a set of documents. They’re just… interesting. Maybe not to anyone else, at first. But over time, you start to draw connections and explain to yourself why you found a certain source compelling or thought-provoking. You create an account of those connections, a sort of map of them.
And that, in turn, reveals the beginning of both a narrative and a causal argument.
Could you make a dataset that quantifies exactly how and why some historical data “jumps out” at historical researchers in this way? Maybe. But it sure as heck would fall under both the “unprofitable” and “extremely difficult” headings that Prasad mentions.
As for data illegibility: by that I simply mean “data which AI systems cannot plausibly access.” You can assume that current and future reasoning models will be able to trawl through all, or most, of the Internet. But a very large percentage of all words ever written remain undigitized.2 And naturally, the proportion is even higher for spoken words. The vast, vast majority of human knowledge, in other words, is still inaccessible to AIs, so long as they rely on digital media as their source of ground truth.
Which would naturally suggest…
Non-digital data just got more important
A key aspect of archival research is how much it depends on social skills. In some cases one might even say social engineering. My friend Felipe befriended an archivist in São Paulo — something he was able to do because he was from São Paulo, and spent months there during his PhD research — and ended up being given the keys to the archive. He volunteered to catalog materials previously unavailable to anyone, and used those in his dissertation. Until AI models can move physically through the world, interact with people in this way, and unlock new data materially, physical archives and the intuition for finding and deciphering illegible sources will still be super important.
For Tripping on Utopia, one of the biggest revelations of my research process was realizing that you can just create new sources by calling someone on the phone. I knew this intellectually, of course. I had read plenty of books based on oral history interviews, and I knew that old-school investigative journalism was all about convincing potential sources to share what they knew. But because my dissertation research was all about the early modern period, I had never tried it myself. It turned out to be remarkably easy to track down and interview people mentioned in historical texts.
Understanding the history of tariffs and financing in the Gilded Age may seem like a viable task for a Deep Research tool. But it will only give you a starting point… and, for a serious historian, it will be more like the negative space to avoid. This is because you’re trying to find, not the Wikipedia page for JP Morgan, but digitally “illegible” things like:
• unpublished letters marked with notes like “to be destroyed” or “burn after reading this”
• the recollections of his bitterest enemies (including things like unarchived diaries or oral histories)
• the physical contents of his drawers when he died
• the physical and sensory impressions of entering his house and library
Hernan Diaz’s 2022 novel Trust did a fabulous job of showing this. In its first half, Trust is a formulaic roman à clef about a wealthy financial wizard on 1920s Wall Street, then a self-satisfied memoir by the same figure. But this narrative then continues to fracture, and we see the same story from the perspective of the financier’s wife — institutionalized in a Swiss sanatorium — and one of his employees. These jumps in perspective are the best approximation I’ve ever seen of what it’s like to study someone in an archive.
The issue is that generative AI systems don’t want messy perspective jumps. They want the median, the average, the most widely approved-of viewpoint on an issue, a kind of soft-focus perspective that is the exact opposite of how a good historian should be thinking.
To be in an archive is to be confronted with the contradictions, controversies, secrets, and unspoken facts of a person’s life or of a time and place.
To encounter these things via a tool like Deep Research is a surreal experience, because you get an approximation of the historical reality, but with all the illegible data smoothed away.
In fact, it isn’t even being excluded: it just isn’t being noticed.
Find the negative space
OpenAI’s Deep Research, which I’ve been experimenting with for the past couple weeks, is a great example of why a field like historical research seems replaceable, but isn’t. Something like ninety percent of the world’s data is not online. Where is it found? Archives, in part. Oral histories. Operating in the world. Having intuition about the world and the non-digital data within it.
Which is why I remain optimistic about this, and why, in particular, I think it would be an enormous mistake for humanists to form ranks against AI (even if many critiques are legitimate, as I believe they are). We have a very important role to play in shaping a non-dystopian future for it.
As a young adult, I wanted to be a figurative oil painter before I decided to become a historian. Although I was never really any good at painting, some of the things I learned from professional painters have stuck with me. I think all the time, for instance, about the advice of a painting teacher I had when I was 19.
The gist of what he told us boiled down to a set of guidelines for creative work in general:
• You’ll never actually feel finished with anything. You just have to pick a time to stop working on it, then call it done.
• What you subtract from a picture is usually more important than what you add to it.
• To capture the essence of a thing, you have to find the negative space around it.
The reason I still think about these guidelines is that they shape how I approach writing — and, increasingly, how I interact with AI systems. Generative AI has made it absurdly easy to generate a lot of text or images. But it hasn’t made us any better at subtracting the useful, meaningful, or simply interesting stuff from text and images — isolating the gold from the fool’s gold.
And this is precisely why historical analysis is important in an era of AGI. As a field, we can now more clearly see the negative space around what is legible to AI systems, and work within those constraints to do and think things that even hyper-advanced machines trained exclusively on digital data cannot.
In other words: physical research and critical thinking about the nature of archives and qualitative data just got way more important. Not less.
A quick poll
I have about 30 posts half-drafted for this newsletter, and I am trying to decide which to finish. I tend to write about a pretty eclectic range of topics — the most popular recent ones have been about coca in the seventeenth century, my most recent post on AI in historical research, and the history of child-rearing.
I love that I have been able to build a decent readership (6,500 subscribers as of today) by writing about whatever I find interesting. But it’s also helpful to get a sense of what my readers want more of. So I thought it would be useful to do a poll. Here are five posts I’ve got unfinished drafts of.
Which would you most like to read, based on the titles?
The first post is about the odd convergence of folklore from Europe and the Caucasus involving bees, consciousness, and the afterlife. The second is about Jorge Luis Borges discussing AI and free will with Herbert Simon in Buenos Aires in 1969. The third is more of an old school Res Obscura post about Sibero-Scythian art and archaeology. The fourth is a mailbag post with reader comments and my responses to “The leading AI models are now good historians.” And the fifth would be a preview and development update of an educational history simulation game I’m building for classroom use, where you “play” as Darwin on one of the Galápagos Islands in 1835.
Weekly links
• “In his last years, he was known as the rarest creature in the world.” (Wikipedia entry for Lonesome George)
• Great article by Simran Agarwal on my favorite Mughal emperor, and one of my favorite early modern people in general, Jahangir, and his love of clothing (Public Domain Review)
• Yesterday my book Tripping on Utopia won the PROSE Award from the Association of American Publishers for best book on the history of science, medicine, or technology published in 2024. Thank you to the AAP!
• “Caro leafs through the pages, and it all starts coming back. He points to a passage relating to some moment of New York political intrigue too arcane to even make his book (about Moses’ 1929 investigation of the City Trust Company). ‘This was a thing no one knows about—I had to decide to leave it out,’ he says, as though he still rues every such omission.” (From a profile of Robert Caro’s archive in Smithsonian)
1. As Simon Willison writes in the context of AI hallucinations, AI-generated errors in prose can only be spotted with “a critical eye, strong intuitions and well developed fact checking skills” whereas “with code you get a powerful form of fact checking for free. Run the code, see if it works.” This is part of why I am more optimistic than many others seem to be about the implications of AI for the humanities in the longer run. Our claims are inherently not falsifiable or demonstrable in the same way as math or code, yet at the same time, they are not airy, nonsensical claims ungrounded by evidence. It’s just that the evidence and data gathering involves physically going to places and materially operating within archives (developing “strong intuitions”). Additionally — and relatedly — a surprising amount of the work of history (and related disciplines like anthropology) rests on general knowledge that comes from spending time, not just in an archive, but in a culture. For my first book, I was studying the Portuguese drug trade while also walking around Lisbon and observing its existing pharmacies and looking at the early modern drug jars in the museums. My brain was also having the experience of learning the Portuguese language while doing so (noticing that the “drogaria” on the corner is where you buy soap, for instance).
2. What percentage of all written words in history are available as digital data? Numbers between 5% and 10% are sometimes thrown around. I suspect the percent of written words that are online is now a bit higher, but not much. I asked OpenAI’s Deep Research tool for a “well grounded, factual estimate drawing on existing work on this topic.” You can read the full report here. The estimates seem reasonable to me: “Rough calculations suggest on the order of 10–30% of all words ever written are accessible in digital form online” while perhaps “only 0.01% to 0.1%” of all words ever spoken are available online.