How Generative AI Is Changing Our Vocabulary—And Maybe Our Thinking
When I worked on one of the early posts in this series, a data scientist I was collaborating with pointed out a pattern in Generative AI writing that I couldn’t get out of my head. Certain words show up again and again in model-generated text. “Delve” is the one that hooked me.
Words like “delve,” “underscore,” and “intricate” have become a kind of AI accent: small but reliable signals that a machine had a hand in the draft, at least for people who pay attention to these things.
Once I was alerted to it, I couldn’t stop seeing it. Delve kept jumping out at me, and I felt a small, smug satisfaction each time I noticed it. “Ah ha! You used ChatGPT…” It was like spotting a hidden watermark.
The irony is that I noticed all this because of my own writing. I had asked ChatGPT to tighten a paragraph in a draft, and the model quietly slipped in the word “delve.” When Jesse (the data scientist) reviewed the post for technical accuracy, he caught it immediately and called it out as the classic AI fingerprint.
As an aside, I felt a kind of moral squeamishness about the use of generative AI here, as if I were cutting a corner I shouldn’t—like being caught smuggling notes into an exam. I find this instinctive reaction interesting, especially since the whole point of these tools is productivity, and their value ostensibly lies in speeding up the scaffolding around the thinking rather than replacing the thinking itself. In that light, my hesitation feels almost old-fashioned, maybe even a little generational. I’m interested to know what others think.
Anyway, as my delve obsession grew, I started seeing these fingerprints at work in the podcasts and articles of journalists and writers I have followed for years—people who have spent decades honing their voices. Were they now using generative AI? The emerging evidence suggests the answer is no.
Linguistic Feedback Loops
Many authors, scholars and journalists are beginning to acknowledge that we are watching a linguistic feedback loop form: the stylistic traits overproduced by generative models circulate through the texts we read, are adopted into our own speech and writing, and then re-enter future training data—further amplifying those same patterns.
This is not unusual in itself; technological change has always shaped language, and language patterns shift in response to the communicative technologies and social environments in which we interact. Several features of generative AI output, however, are novel in this context, and they carry concerning implications for creativity and broader cognition.
How AI Acquired Its Accent
First of all, the giveaway words and phrases in LLM output are not random. They come from specific sources, and the backstory is quite interesting. Early models like GPT-3.5 and GPT-4 were trained on massive datasets scraped from the open web, including mountains of material from freelance platforms and content farms. These services churned out blog posts, SEO pieces, marketing copy, product descriptions, and anything else companies needed at scale. Because hiring writers in the U.S., the U.K., or Europe was expensive, much of this work went to highly educated, low-cost freelancers in countries such as India, Kenya, the Philippines, Bangladesh, and across Eastern Europe. The freelancers’ English was strong, but they wrote under rigid style guides and optimized everything for search engines. The result was polished, formulaic prose packed with buzzwords and smooth transitions. When LLMs absorbed this material, they absorbed the patterns too, which is why their default voice often sounds clean and professional but also generic and short on texture, originality, and nuance.
As a result, the output of large language models now often defaults to a predictable, over-polished style that favors clarity and balance at the expense of the natural shifts in tone and phrasing that give human expression its personality. And research is already seeing signs of this bleeding back into the way we communicate. A 2024 study by Yakura et al. analyzed hundreds of thousands of hours of podcasts and YouTube videos and found measurable spikes in LLM-associated words in human speech after the release of ChatGPT. It seems a generative AI linguistic drift had already begun by 2024, and ChatGPT has more than doubled its user numbers since then.
The emerging research gives us an early indication of how language may be shifting in response to generative AI. The Yakura et al. study is striking because the linguistic influence they identify does not appear only in scripted or AI-assisted text. It shows up in relatively spontaneous speech—in podcasts, lectures, and science and tech conversations—suggesting that people may be absorbing and reusing the phrasing patterns they encounter in AI-generated text even when speaking extemporaneously. In other words, AI is beginning to participate in shaping the shared pool of expression we draw from.
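To make the kind of measurement described above a little more concrete, here is a minimal sketch of how one might count the rate of “marker” words in a transcript. The word list and function names are my own illustrative assumptions, not the actual lexicon or methodology of the Yakura et al. study:

```python
import re

# Words often cited as LLM-associated markers. This is an illustrative
# list, not the study's actual lexicon.
MARKER_WORDS = {"delve", "underscore", "intricate", "meticulous"}

def marker_rate(transcript: str) -> float:
    """Return marker-word occurrences per 10,000 words of a transcript."""
    words = re.findall(r"[a-z']+", transcript.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in MARKER_WORDS)
    return hits / len(words) * 10_000

# Comparing rates across transcripts recorded before and after a model's
# release would reveal the kind of spike the study reports.
pre = marker_rate("we looked into the data and found a clear pattern")
post = marker_rate("let us delve into the intricate data to underscore a pattern")
```

A real analysis would of course control for topic, speaker, and baseline word-frequency trends, but the core signal is just this: a rising per-word rate of a fixed set of marker terms over time.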
Creativity, Effort, and the Risk of Flattened Expression
Wiederhold’s work helps explain why this might be happening. She notes that humans tend to follow the principle of least effort: we adopt the language that feels ready-to-hand, especially when it comes from tools we treat as authoritative. But creativity depends on deviation—those unexpected turns of phrase that break from the statistical average. Predictive models, by design, smooth out those deviations. They favor the safe, the familiar, and the stylistically “balanced.” If their output becomes the default texture of the language around us, we risk a narrowing of expressive range over time.
The Yakura study raises a related concern: cultural and epistemic homogenization. If LLM-shaped phrasing spreads widely, it may subtly shift what feels natural or reasonable to say. That matters because language is not neutral—it frames how we describe problems, articulate disagreement, and imagine alternatives. Even people who never use AI tools may find themselves influenced indirectly, simply by absorbing the linguistic patterns that others repeat.
Language Shapes Thought—So What Happens Next?
None of this is definitive, and the current research has limits. But the direction of travel is important. If human and machine language are becoming intertwined, then we should be much more intentional and strategic about what we optimize for—whether we want linguistic environments that cultivate diversity, friction, and originality, or ones quietly shaped by the statistical preferences of our tools, which seem to systematically favor the typical over the distinctive and produce fluent but fundamentally bland expression.
After all, what is at stake is not simply vocabulary: the speech patterns we normalize shape our ideas, our relationships, the social cues through which we understand one another, and the wider epistemic environment in which knowledge is formed. So perhaps the simplest discipline is this: to notice. To notice what we absorb, what we echo, and whether those patterns reflect our own thinking or the gravitational pull of the machine. I, for one, have never used the word delve since…
Alexandra Frye
Alexandra Frye edits the Technology & Society blog, where she brings philosophy into conversations about tech and AI. With a background in advertising and a master’s in philosophy focused on tech ethics, she now works as a responsible AI consultant and advocate.

Alexandra, your post is very interesting and quite timely. Over the past 2 years or so, I’ve noticed and paid attention to some of these same markers. I often teach 2 online classes and 2 on-ground classes. At some point during the semester, all students are required to upload an analysis paper or argumentative paper. Given that many students use AI as a means to write or develop their papers, I have found some of the same digital fingerprints that I had been calling impressions (thinking back to Hume). Your comments about noticing and/or paying attention to these shifts in language and usage, particularly in the academic arena, are constructive.
Thanks for your comments, LaChanda. Interesting that you notice these digital fingerprints. I think this stage of AI development is particularly challenging for schools and universities – for both professors and students. There are so many gray areas, so little established guidance, and even the tech models themselves are full of faults that are being fixed on the fly. It brings to mind the professor at a Texas university who made headlines in 2023 because he used ChatGPT to detect plagiarism in his students’ essays – ChatGPT claimed to have authored all of them, but it was wrong. With the advancing capabilities of Gen AI and the attempts to drive it into widespread use, it must be so hard to determine where the lines lie between acceptable and unacceptable use of Gen AI (e.g., assisting in research or essay prep), or even to detect when it has been used.
I’m a bit surprised that we’re not considering what is (to my lights) a fairly obvious theory for why certain words and linguistic patterns appear in certain kinds of writing. The Generative AI models you mention were trained (in part) on large data sets of academic writing, and in many cases without the authors’ consent.
Many of the supposed “tells” or “classic fingerprints” of AI are actually just indications of a particular style of academic writing. My own obsession with specific vocabulary–and the em-dash–far pre-dates these large language models. (And I am an author whose writing was scraped without permission!)
Some may call me supremely old-fashioned, but I am someone who has never used and will never use generative AI for my writing. This is because I think that the process of writing is the process of thinking, even at the scaffolding stage. So, I confess that I find it frustrating and disheartening when someone would smugly–and wrongly!–conclude that I used ChatGPT because they think they’ve spotted a secret, reliable watermark. It’s not that academics sound like AI–rather, AI sounds like us!
But why might someone seem to see certain words and linguistic patterns more often now? In terms of academic writing, I wonder how much of the frequency illusion is at play when people notice these supposed “tells”. (I’d venture the same may hold when examining the writing of people who have “spent decades honing their voices”.) I’ve actually used the supposed “reliable watermarks” less frequently over the years, but I’m curious about others’ writing. As for the linguistic flattening in the culture at large? Alas, I suspect that is because students (and the general public) were less inclined to copy the writing of academics qua academics but are often very eager to do so when it comes to AI! This could help explain how certain turns of phrase and punctuation marks have escaped academic containment.
Hi Amy. Thanks for taking the time to share your thoughts – it’s always nice to engage.
On the source of linguistic patterns, you are right that some academic material was included in the training data for these LLMs, and without consent – I’m sorry this happened to you. Empirical evidence suggests, though, that the characteristic “AI voice” doesn’t come primarily from academic work but from the overwhelming volume of SEO-optimised web writing – produced at industrial scale for search engines and for commercial ends – such as LinkedIn posts, email newsletters, podcasts, marketing materials, and website pages. That material vastly outweighs academic material in quantity and accessibility, which is why, statistically, its linguistic style dominates the model’s default style. Related to that point, when I wrote of spotting Gen AI tell words everywhere, I wasn’t so much referring to seeing them in academic work as in that kind of SEO-optimised material. These are the places where Gen AI is being most used to drive “productivity”. And as you allude to in your comments about the public’s inclination to copy academic writing vs AI – in terms of a linguistic feedback loop, these sources will also likely have a greater effect on the general public, given their accessibility compared with academic work, which is so often kept in a gated academic domain.
I must also, sincerely, add that it was not my intent to frustrate or dishearten. My “smug” comment was intended as a self-mocking, self-deprecating lead-in to the truth about how I discovered the whole ‘delve’ business in the first place.
Lastly, I couldn’t agree more that the “thinking” is in the writing itself – not just in how you construct, phrase, and organise sentences, but in the physical act of writing. I have 3 school-aged children, and I am highly concerned by the move away from handwriting towards keyboard-based writing in their schools. Substantial, and growing, research shows the importance of handwriting for multiple areas of cognition – deeper attention, strengthened memory, more effective organisation of ideas and concepts. And having done my first degree before personal computers (yikes), when I studied for my Master’s degree in 2019 I still keenly recognised and valued the benefit of writing by hand to my thinking process.