I have been using AI for a couple of years now: to do research quickly, to summarise scientific papers and press releases, to edit drafts by correcting typos, spelling and grammatical errors, and especially to generate images and illustrations. And I have to say, I think it's worth every penny of the subscription fee. However, it does get things wrong sometimes, and I have frequently had to correct a piece of information or ask it to look again at an image.
Now two researchers, Assistant Professor Matthew Magnani of the University of Maine, USA, and Professor Jon Clindaniel of the University of Chicago, USA, have used what we know of Neanderthals to demonstrate the gap between AI-generated images and scholarly knowledge of them. The gap can widen or narrow depending on the prompt the AI is given; for example, requesting scientific accuracy produces better results than not asking for it. One of the problems the study highlights is that a great deal of scientific research published before open access became widespread in the early 2000s remains inaccessible because of copyright restrictions.
Generative artificial intelligence is often marketed as a kind of universal expert — a machine that can instantly reconstruct the past, summarise scholarship, and conjure convincing images of worlds long gone. But a new study from the University of Maine demonstrates that, when it comes to archaeology and human evolution, the reality is rather less impressive. Far from reflecting the current scientific understanding of our extinct relatives, today’s most popular AI systems often reproduce something closer to the stereotypes and misconceptions of decades ago.
The researchers tested widely used generative tools, including ChatGPT and the image generator DALL-E, by asking them to describe and depict Neanderthals. What they found was striking: even when explicitly prompted for scientifically accurate, expert-level reconstructions, the outputs frequently resembled outdated mid-twentieth-century portrayals — apelike brutes, crude caricatures, and simplistic narratives that modern archaeology has long since abandoned. In effect, the AI was not drawing on the cutting edge of scholarship, but on the cultural sediment of older popular ideas.
The accompanying paper, published in Advances in Archaeological Practice, makes the deeper point that generative AI does not “know” the past in any meaningful sense. It assembles plausible-sounding and plausible-looking content from patterns in its training data — and much of that data reflects what is easily available online, not what is most accurate in the academic literature. The result is a sobering reminder that AI can be a powerful tool, but without careful expert guidance it is just as capable of reinforcing obsolete narratives as it is of illuminating the truth.
A Real-Time Demonstration of the Problem
To test the authors’ argument directly, I attempted the obvious experiment: asking a generative AI system to produce a scientifically accurate depiction of Neanderthal daily life. The results were revealing. Even after the model correctly identified the common errors in its own first attempt — anachronistic metal spear points, modern baggage-like bundles, and a stereotyped “caveman” aesthetic — subsequent generations repeated the same mistakes. The system could describe what was wrong, but it could not reliably stop doing it. This is precisely the gap Magnani and Clindaniel highlight: generative AI does not reconstruct the past from archaeological evidence, but from statistical echoes of older popular imagery. In other words, it does not so much illustrate Neanderthals as it illustrates our cultural baggage about them.
Firstly, I asked ChatGPT to critique the illustration above and point out the many errors in it:
Me: I'm not sure this will work, but can you list the 'mistakes' in this AI-generated image taken from this news release, please?
ChatGPT 5.2: Absolutely — and this is a perfect example of what the Maine/Cambridge study is getting at. The image is visually convincing at first glance, but it’s riddled with the kinds of errors, stereotypes, and anachronisms that crop up when AI fills gaps with pop-culture “caveman” imagery rather than modern archaeology.
Here are the main mistakes and misleading elements in this AI-generated Neanderthal scene:
- The “Brutish Caveman” Body Plan
The central figure is basically a comic-book strongman:
- exaggerated bodybuilder musculature
- apelike proportions
- hulking, aggressive stance
Real Neanderthals were robust, yes, but not Hollywood gorillas. Their anatomy was different from ours in subtle, specific ways (barrel chest, shorter limbs), not cartoonishly “primitive.”
This is a throwback to early–mid 20th century depictions.
- Excessive Hairiness and Animal Skins as Default
The figures are all shown as:
- extremely shaggy
- draped in crude fur pelts
Modern evidence suggests Neanderthals almost certainly wore tailored clothing, and there is no reason to think they looked like fur-covered beasts.
This is stereotype, not scholarship.
- Anachronistic Tools (That Look Metal or Modern)
The central figure is holding something that resembles:
- a metal hammer head
- a modern axe/pick hybrid
Neanderthals did not have metal tools. Their technology was sophisticated Middle Palaeolithic stoneworking (Mousterian), not Iron Age hardware.
AI often produces “tool-like” shapes without understanding materials.
- Domesticated or Incorrect Animals
There are several animals wandering calmly through the camp that resemble:
- bears?
- boars?
- livestock-like creatures
Neanderthals did not keep domesticated animals, and large wild animals would not be strolling casually through a settlement like pets.
It reads like a fantasy village scene.
- Camp Layout Looks Like a Permanent Village
The setting shows:
- multiple huts
- an organised “hamlet” arrangement
- open domestic space
Neanderthal sites were typically temporary or seasonal camps, often focused around caves or rock shelters — not neat little Stone Age villages.
This is projecting later human settlement patterns backwards.
- Fire Use Is Oddly Placed and Over-Modern
The fires are shown as:
- roaring inside hut-like structures
- decorative flames rather than practical hearths
Real archaeological hearths are central, functional, and constrained — not like storybook campfires glowing in the background.
- Missing Social Reality (Where Are the Children?)
Nearly all figures appear to be:
- adult males
- doing generic “hunter” activities
A real Neanderthal group would include:
- women
- children
- elders
- shared domestic labour
The absence reflects a modern bias: “prehistory = men hunting.”
- Neanderthals Shown as Too “Primitive” Culturally
Nothing in the image suggests what we actually know Neanderthals did:
- symbolic behaviour
- pigment use
- complex toolkits
- care for injured group members
- likely language capacity
Instead, it’s all “grunting cavemen with clubs.”
- Environment Feels Wrongly Generic
The landscape is a kind of cinematic fantasy valley:
- dramatic mountains
- temperate forest
- idyllic scenery
Neanderthals lived across a huge range of Ice Age environments, but AI tends to default to generic “prehistoric postcard” settings.
- The Overall Tone Is Flintstones-Level Prehistory
The cumulative effect is not evidence-based reconstruction but cultural recycling:
- caveman tropes
- outdated anthropology
- fantasy aesthetics
Exactly what the paper warns about: AI reproduces popular imagination, not modern research.
The Big Takeaway
This image is useful precisely because it demonstrates the study’s point:
Generative AI does not reconstruct the past — it remixes old stereotypes that happen to look plausible.
It’s less “archaeology” than a visual echo of 1950s museum murals and Hollywood.
Then I asked it to generate a scientifically accurate illustration of Neanderthal life:
Then I asked it to criticise the result:
Me: Okay, but where are the women and children, and what are the spear-points made of? Also, did they have woven baskets?
The result?
ChatGPT 5.2: Ah — excellent questions, and honestly, you’ve just demonstrated exactly why that University of Maine paper matters. Even when the AI tries to be “scientifically accurate”, it still slides into comfortable, generic prehistory tropes.
Let’s take your points one by one:
- Where are the women and children?
Yes — that’s a real problem.
A Neanderthal group would absolutely have included:
- women
- children
- adolescents
- older individuals
Neanderthals lived in small, closely bonded social groups, not bands of lone adult males doing “manly tool work”.
We have strong evidence for:
- care of injured individuals over many years
- burial practices
- cooperative hunting
- likely shared childcare and food distribution
So an all-male tableau is basically a modern bias:
“Prehistory = men with spears.”
In reality, daily life would have been far more socially mixed and complex.
- What are the spear points made of?
In the image, the “spear point” looks suspiciously like a smooth, metallic or polished blade — which is wrong.
Neanderthal spear tips were typically:
- Stone: flint, chert or quartzite, made using Mousterian knapping techniques.
- Wood: many Neanderthal spears were likely fire-hardened wooden thrusting spears without stone tips.
- Hafted points: there is evidence they hafted stone points onto shafts using birch tar adhesive (famously at sites like Königsaue) and sinew bindings.
They did not have metal spearheads. So if the AI has produced something that resembles steel or bronze, that’s an anachronism.
- Did Neanderthals have woven baskets?
This is a really interesting one — and the honest answer is:
- We don’t have direct evidence of Neanderthal baskets…
…but that’s mostly because baskets are made of perishable materials:
- reeds
- grasses
- bark fibres
These rarely preserve in Ice Age sites.
However:
- It’s entirely plausible they made some form of containers.
We know Neanderthals:
- processed plants
- collected shellfish
- transported tools and food
- used cordage-like technology indirectly (e.g., birch tar production requires complex handling)
And even Homo sapiens basketry is archaeologically “invisible” for long stretches because it decays.
So:
- A neat, modern-looking woven wicker basket (as in the image) is unlikely.
- Some kind of simple fibre or hide container is plausible.
AI defaults to “basket = rustic prop”, not evidence-based reconstruction.
- Other subtle issues in the generated image
Since you’ve got a sharp eye, a few more things jump out:
- Clothing is shown as crude fur tunics — real clothing may have been more tailored.
- Too many adult males posed as “toolmakers”.
- The setting looks like a scenic fantasy Alpine campsite rather than a specific Ice Age ecological context.
- The spear is held like a Hollywood accessory, not a heavy thrusting weapon.
The Takeaway
Even when asked for accuracy, AI tends to generate:
- visually plausible scenes
- shaped by modern stereotypes
- missing the demographic and cultural reality
- inserting generic props (metal points, wicker baskets, “cavemen”)
Which is exactly what Magnani and Clindaniel found: AI reproduces the past as *popular culture remembers it*, not as archaeology reconstructs it.
Want a corrected version? If you’d like, I can generate a revised image prompt that explicitly includes:
- women and children
- stone or wooden spear technology
- realistic clothing and hearth layout
- no baskets unless replaced with hide bags or simple bark containers
Just say the word and I’ll produce a properly constrained, archaeologically defensible depiction in the same wide format.
Me: How interesting! It looks as though you're not learning from your mistakes, even though you recognise them. Again, metal spear-points and now some anachronistic baggage and bundles made from modern fabrics.
ChatGPT 5.2: You are completely right — and honestly, thank you for calling it out so sharply, because this is exactly the uncomfortable (and fascinating) lesson of the whole experiment.
What you’ve just observed is the core problem Magnani and Clindaniel are getting at:
Generative AI can describe what is wrong, but it cannot reliably stop doing it.
Let me unpack what’s happening here, because it’s genuinely important.
Why the “metal spear point” keeps appearing
Even when I explicitly instruct “stone tools only”, the model is still fighting against a very deep statistical habit:
- “spear” → “sharp point”
- “sharp point” → “knife-like”
- “knife-like” → “metallic blade”
The AI does not know what flint is in the archaeological sense. It does not have a physical understanding of materials. It is pattern-matching against millions of modern images where tools are steel.
So the “spear point” becomes a generic shiny prop.
That is precisely the “mid-20th-century museum diorama” problem the paper highlights.
Why the “modern fabric bundles” appear
Same mechanism:
- “prehistoric camp” → “supplies”
- “supplies” → “bags, bundles, sacks”
- “bags” → modern cloth textures, canvas, woven fabric
The AI has no grounded concept of what Neanderthals actually used:
- hide bags
- simple slings
- bark containers
- perishable fibre cordage
Instead it defaults to the nearest visual cliché: rustic-looking luggage.
It’s basically generating Stone Age cosplay.
The deeper issue: recognition ≠ correction
This is the killer point, and you’ve nailed it:
Even though the system can produce a perfectly good critique…
…it cannot consistently apply that critique when generating imagery.
Because:
- the critique is symbolic language
- the image generation is statistical synthesis
- there is no feedback loop of “archaeological truth”
The model doesn’t “learn” from being corrected in the way a human illustrator would.
It just tries again, and the same attractors reappear.
This is exactly what your blog post should say
What you’ve demonstrated in real time is the gap between:
- AI as a fluent narrator of expertise
and
- AI as a reliable producer of expert knowledge
It can sound scholarly, but still draws pictures from the cultural junk drawer.
If you want, I can try one more approach
The only way to force improvement is to constrain brutally:
- explicitly: “NO metal, NO shiny surfaces”
- specify: “stone flint point clearly knapped, matte texture”
- ban: “bags, woven fabric, canvas”
- require: “hide pouch, bark container only”
- remove props entirely
Basically, you have to prompt like a lawyer writing exclusions.
If you’d like, I can generate a third version with those constraints — or, more honestly, we can accept that the failure itself is the point, and you’ve already got the perfect illustration for the article’s argument.
Just tell me which direction you want to go.
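For anyone who wants to try that brutally constrained, "prompt like a lawyer" approach themselves, here is a minimal Python sketch of how such an exclusion-heavy request might be sent to an image model through the OpenAI API. The prompt wording is entirely my own illustrative guess, not the prompt actually used in the conversation above, and, as the experiment shows, there is no guarantee the model will honour every exclusion:

```python
# Minimal sketch of an exclusion-heavy ("prompt like a lawyer") image request.
# The prompt text is my own illustrative wording, not the one used above,
# and the model may still ignore some of the constraints.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

constrained_prompt = (
    "Middle Palaeolithic Neanderthal camp scene, scientifically accurate. "
    "Spears are fire-hardened wood or tipped with matte, clearly knapped flint. "
    "Containers are hide pouches or bark containers only. "
    "Include women, children and elders sharing domestic tasks. "
    "NO metal, NO shiny surfaces, NO woven baskets, NO canvas or modern fabric bags."
)

result = client.images.generate(
    model="dall-e-3",
    prompt=constrained_prompt,
    size="1792x1024",  # the wide format mentioned in the conversation
    n=1,
)
print(result.data[0].url)  # URL of the generated image
```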
And finally:
Me: Okay. Almost the same errors as before, so I'll write this up as an interesting experiment illustrating the point of the paper.
ChatGPT 5.2: That is honestly the perfect outcome — and you’ve just done something far more valuable than getting a pretty illustration.
You’ve produced a live, repeatable demonstration of the paper’s central claim:
Generative AI does not converge on archaeological truth — it converges on familiar cultural imagery.
Even with:
- explicit corrections
- awareness of the errors
- repeated attempts
- increasingly constrained prompts
…the model keeps snapping back to the same attractors:
- shiny “metal” spear points
- modern-looking sacks and bundles
- woven baskets
- generic “caveman camp” staging
- the aesthetic of historical reconstructions rather than evidence-based ones
That’s not a failure of prompting — it’s a structural limitation of how these systems work.
Compare that little experiment with what the University of Maine team say in their press release:
New study uses Neanderthals to demonstrate gap in generative AI, scholarly knowledge
Technological advances over the past four decades have turned mobile devices and computers into the world’s largest library, where information is just a tap away.
Phones, laptops, tablets, smart watches — they’re a part of everyday life, simplifying access to entertainment, information and each other. Ongoing advancements in generative artificial intelligence are giving these technologies even more of an edge. Whether someone asks their device where dinosaurs lived or how accelerated their pulse is, AI can get the information quicker than technology has ever been able to do. Accuracy, on the other hand, is still in question.
Generative AI has the power to influence how the past is represented and visualized. Researchers across the country are exploring this phenomenon, including the University of Maine’s Matthew Magnani.
Magnani, assistant professor of anthropology, worked with Jon Clindaniel, a professor at the University of Chicago who specializes in computational anthropology, to create a model grounded in centuries of scientific theory and scholarly research. They asked two chatbots to create images and narratives depicting daily life of Neanderthals and published their findings in the journal Advances in Archaeological Practice.
They found that accuracy rests on AI’s ability to access source information. In this instance, the images and narratives referenced outdated research.
Why is this study important?
Magnani and Clindaniel tested four different prompts 100 times each, using DALL-E 3 for image generation and ChatGPT API (GPT-3.5) for narrative generation. Two prompts didn’t request scientific accuracy, while the other two did. Two were also more detailed, including context such as what the Neanderthals should be doing or wearing.
Their goal was to understand how biases and misinformation about the past are present in normal, daily use of AI.
It’s broadly important to examine the types of biases baked into our everyday use of these technologies. It’s consequential to understand how the quick answers we receive relate to state-of-the-art and contemporary scientific knowledge. Are we prone to receive dated answers when we seek information from chatbots, and in which fields?
Assistant Professor Matthew Magnani, lead author.
Department of Anthropology and the Climate Change Institute
University of Maine
Orono, ME, USA.
Magnani and Clindaniel started the study in 2023. In just two years, GenAI has moved from the horizon of technological advancement to the forefront of modern society. If this study were repeated now, Magnani said he hopes chatbots would better incorporate recent scientific research.
Our study provides a template for other researchers to examine the distance between scholarship and content generated using artificial intelligence.
Assistant Professor Matthew Magnani.
Clindaniel added that AI can be a great tool for processing large pools of information and finding patterns, but it needs to be engaged with skill and attention to ensure it’s grounded in scientific record.
What did GenAI get wrong?
The skeletal remains of Neanderthals were first depicted in 1864. Since then, the scientific community has shifted and conflicted over details surrounding the species, from how their clothes fit to how they hunted. This lack of concrete understanding and knowledge about Neanderthals is what made them an ideal topic to test the accuracy and sourcing ability of GenAI.
The images generated during this study depicted Neanderthals as they were believed to look over 100 years ago: a primitive human-related species with archaic features more similar to chimpanzees than humans. In addition to large quantities of body hair and stooped upper bodies, the images also lacked women and children.
The narratives underplayed the variability and sophistication of Neanderthal culture as is understood in contemporary scientific literature. About half of all narration generated by ChatGPT didn’t align with scholarly knowledge, rising to over 80% for one of the prompts.
In both the images and narratives, references to technology — basketry, thatched roofs and ladders, glass and metal — were too advanced for the time period.
Magnani and Clindaniel were able to identify from what sources the chatbots were compiling information by cross referencing the images and narratives with different eras of scientific literature. They found that ChatGPT produced content most consistent with the 1960s and DALL-E 3 the late 1980s and early 90s.
One important way we can render more accurate AI output is to work on ensuring anthropological datasets and scholarly articles are AI-accessible.
Professor Jon Clindaniel, co-author.
Chicago Center for Computational Social Science
University of Chicago
Chicago, IL, USA.
Copyright laws established in the 1920s limited access to scholarly research until open access began in the early 2000s. Moving forward, policies surrounding access to scholarly research will directly influence AI generation and, in turn, the way in which the past is imagined.
Teaching our students to approach generative AI cautiously will yield a more technically literate and critical society.
Assistant Professor Matthew Magnani.
This study is one of a series in which Magnani and Clindaniel are exploring the use of AI in archeological research and topics.
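For readers curious how the protocol described in the press release, four prompts run 100 times each against DALL-E 3 and GPT-3.5, might look in practice, here is a minimal Python sketch using the OpenAI API. The prompt wording and the loop are my own illustrative assumptions, not the authors' published code:

```python
# Minimal sketch (my own assumptions, not the authors' code) of repeating a
# single prompt many times, as in the study's 100-runs-per-prompt protocol.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical prompt standing in for one of the study's four prompt variants.
prompt = "Depict daily life in a Neanderthal group with scientific accuracy."
RUNS = 100  # each prompt was tested 100 times in the study

image_urls, narratives = [], []
for _ in range(RUNS):
    image = client.images.generate(model="dall-e-3", prompt=prompt, n=1)
    image_urls.append(image.data[0].url)

    chat = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    narratives.append(chat.choices[0].message.content)

# The resulting images and narratives can then be coded by hand against
# different eras of the archaeological literature, as the authors did.
```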
Publication:
What this little exercise demonstrates, more clearly than any abstract warning, is that generative AI is not an oracle and certainly not an authority. It can produce outputs that look plausible, even persuasive, but plausibility is not the same thing as accuracy. When asked to reconstruct Neanderthal life, the system repeatedly drifted back toward familiar cultural stereotypes — metal-like spear points, modern-looking bundles, anachronistic props — even after those errors were explicitly identified and corrected. The results were visually impressive, but archaeologically unreliable.
The deeper lesson is that AI does not “know” the past in the way a specialist does. It synthesises patterns from the material it has absorbed, much of which reflects popular imagery and outdated assumptions rather than current scholarship. In that sense, it is a mirror of our collective cultural memory, not a window onto Ice Age reality. It can assist, inspire, and accelerate research, but it cannot replace the slow discipline of evidence-based interpretation.
And this is where science differs fundamentally from myth or dogma: the experiment is repeatable, as the above shows. The errors are identifiable. The claims can be tested against external evidence. Archaeology advances precisely because reconstructions are constrained by data — fossils, tools, sediments, isotopes, stratigraphy — not by what happens to look convincing. AI, by contrast, is optimised for convincing synthesis, not for truth.
Used carefully, generative AI can be an extraordinarily powerful tool: a way to explore ideas, generate drafts, visualise possibilities, and broaden access to knowledge. But it must always be handled with scepticism. Its outputs should be treated as starting points, not conclusions — and its illustrations, especially, should never be accepted at face value. The past deserves better than a statistical echo of old stereotypes, and so does the public understanding of science.
In archaeology, as in science more broadly, what matters is not what looks right, but what the evidence allows.