Every AI chatbot writes like itself, and forensic linguists can now prove it. New research into AI writing patterns reveals that models such as ChatGPT and Gemini develop what linguists call an idiolect: a distinctive stylistic signature as statistically measurable as a human author's fingerprint. For European universities, schools, and publishers already wrestling with AI-generated content, this finding reframes the entire detection debate.
The Science Behind AI Writing Fingerprints
Forensic linguists have long applied stylistic analysis to identify human authors in legal and academic contexts. The same techniques, applied to AI output, reveal that large language models exhibit consistent, distinguishable patterns across thousands of samples.
A recent comparative study analysed hundreds of essays on diabetes generated by both ChatGPT and Gemini using the Delta method, a forensic authorship technique first formalised by the computational stylistician John Burrows. Researchers calculated linguistic distances between writing samples with striking results:
- A 10% sample of ChatGPT essays scored 0.92 against ChatGPT's full dataset.
- The same ChatGPT sample scored 1.49 against Gemini's output, a sharp statistical separation.
- Gemini scored 0.84 against its own samples and 1.45 against ChatGPT's work.
Those Delta distances, where lower values indicate closer stylistic kinship, confirm that each system writes with a statistically identifiable voice. The patterns emerge not from deliberate programming but from architectural choices and the composition of training data.
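For readers curious about the mechanics: Burrows' Delta reduces each text to z-scored relative frequencies of the corpus's most common words, then averages the absolute differences between two texts' z-score profiles. A minimal Python sketch, with an illustrative toy corpus and function names of our own choosing (not the study's actual data or code):

```python
from collections import Counter
from statistics import mean, stdev

def rel_freqs(tokens, vocab):
    """Relative frequency of each vocabulary word in one text."""
    counts = Counter(tokens)
    return [counts[w] / len(tokens) for w in vocab]

def burrows_delta(text_a, text_b, corpus, n_mfw=30):
    """Burrows' Delta: mean absolute difference of z-scored word
    frequencies over the corpus's n_mfw most frequent words.
    Lower values indicate closer stylistic kinship."""
    vocab = [w for w, _ in
             Counter(t for doc in corpus for t in doc).most_common(n_mfw)]
    profiles = [rel_freqs(doc, vocab) for doc in corpus]
    mus = [mean(col) for col in zip(*profiles)]
    sigmas = [max(stdev(col), 1e-9) for col in zip(*profiles)]  # avoid /0
    za = [(f - m) / s for f, m, s in zip(rel_freqs(text_a, vocab), mus, sigmas)]
    zb = [(f - m) / s for f, m, s in zip(rel_freqs(text_b, vocab), mus, sigmas)]
    return mean(abs(x - y) for x, y in zip(za, zb))

# Toy corpus echoing the study's stylistic contrast (illustrative only)
corpus = [
    "blood glucose levels are elevated in individuals with diabetes".split(),
    "high blood sugar affects the way the body handles food".split(),
    "individuals with diabetes should monitor blood glucose levels".split(),
    "high blood sugar control is about the way you eat".split(),
]
print(burrows_delta(corpus[0], corpus[2], corpus))  # formal pair
print(burrows_delta(corpus[0], corpus[1], corpus))  # cross-register pair
```

The same pattern scales to the study's setup: pool all essays to fix the vocabulary and z-score parameters, then measure each sample against each model's aggregate profile.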
The differences surface most clearly in trigrams: three-word combinations that act as stylistic tells. ChatGPT gravitates towards formal, clinical phrasing such as "blood glucose levels," "individuals with diabetes," and "characterised by elevated." Gemini, by contrast, opts for more accessible expressions: "high blood sugar," "blood sugar control," and "the way for."
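Trigram tells like these are straightforward to surface: slide a three-word window over the text and count. A short sketch (the function name and sample sentence are ours, not the study's):

```python
from collections import Counter

def top_trigrams(text, k=5):
    """Return the k most frequent three-word sequences in a text."""
    words = text.lower().split()
    grams = (" ".join(words[i:i + 3]) for i in range(len(words) - 2))
    return Counter(grams).most_common(k)

sample = ("blood glucose levels rise after meals so individuals with "
          "diabetes must track blood glucose levels daily")
print(top_trigrams(sample, k=2))
```

Comparing the top-ranked trigrams from large samples of each model's output is what exposes the "blood glucose levels" versus "high blood sugar" split described above.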

European Voices on the Implications
The research lands at a moment when European institutions are actively building frameworks around AI-generated content. Professor Thierry Poibeau, a computational linguist at the CNRS Lattice laboratory in Paris, has argued that stylometric methods developed for human authorship attribution transfer naturally to AI systems, precisely because both humans and models are shaped by the data they absorb. The idiolect, in that sense, is not a metaphor but a measurable reality.
At the regulatory level, the EU AI Act's transparency obligations already require that users be informed when they are interacting with an AI system. Classifying AI output by model-specific stylistic signature adds a further forensic layer: not just "was this written by a machine?" but "which machine wrote it, and why does that matter for the assessment?" Lucilla Sioli, Director for Artificial Intelligence and Digital Industry at the European Commission, has emphasised that trustworthy AI requires traceability across the full content pipeline, a principle that idiolect research supports directly.
What This Means for EU and UK Educators
Academic integrity offices across Europe are already under pressure. Universities in the UK, Germany, and the Netherlands have all updated their assessment policies since the launch of ChatGPT, yet most detection tools remain blunt instruments that flag AI probability without identifying the specific model involved. Knowing which model produced a submission changes the conversation considerably.
Understanding that ChatGPT tends towards textbook formality while Gemini adopts a more conversational register provides examiners with contextual evidence. A philosophy essay that reads like a clinical protocol note is a different kind of anomaly from one that sounds like a knowledgeable friend explaining a concept over coffee. Both might be AI-generated; the stylistic signature points towards which system was likely used and, by extension, how the student deployed it.
The practical implications for educational institutions are significant:
- Academic integrity: Schools and universities can refine detection by recognising model-specific linguistic patterns, moving beyond binary AI-or-human verdicts.
- Assessment design: Examiners aware of each model's natural register can design tasks that are harder to complete without genuine engagement, regardless of which tool a student uses.
- Policy granularity: Institutions can distinguish between a student who used a formal-precision model for structural scaffolding and one who submitted verbatim chatbot output.
- Publisher and platform authentication: European media outlets and academic journals can trace submitted content to specific AI systems, strengthening editorial gatekeeping.
- Brand and tone alignment: Organisations procuring AI tools for communications can select models whose natural register matches their house style, reducing editorial overhead.
Do AI Idiolects Persist Across Languages?
Europe's multilingual landscape adds a layer of complexity that monolingual research cannot address. Does ChatGPT maintain its clinical precision when operating in French, Polish, or Finnish? Early evidence suggests models may develop language-specific stylistic sub-variants, potentially reflecting the cultural communication norms embedded in their multilingual training corpora. A model trained heavily on formal German academic prose may write differently in German than its English-language idiolect would predict.
This has direct relevance for EU institutions deploying AI across member states. A customer-facing chatbot calibrated for formal Dutch banking communication may behave stylistically differently when switched to Italian, and that difference may not be intentional. Idiolect research, extended to multilingual contexts, could help organisations audit and correct those inconsistencies before they reach end users.