
Reading Between the Lines: Identifying AI-Generated Text

  • Writer: TracerTick
  • May 9, 2025
  • 4 min read

“Believe nothing you hear, and only one half that you see.” Edgar Allan Poe offered that advice in the 19th century, yet it feels more relevant than ever today, when authenticity is no longer limited to identities, transactions, or data sources but extends into language itself. As generative AI (GenAI) continues to integrate further into our lives and careers, a new level of scrutiny often follows: Who authored this content? Is it accurate? And, perhaps most critically, can it be trusted?


The line between human-written and AI-generated content is becoming increasingly difficult to discern. In a matter of seconds, GenAI can produce seemingly conversational, accurate-sounding text, lowering the barrier to generating misinformation. Intentionally or not, threat actors and well-meaning peers alike can publish AI-written content that appears credible on the surface but is inaccurate, misleading, or entirely fabricated.


If you've used AI detectors before, you're probably familiar with some of the common ones: GPTZero, ZeroGPT, or Copyleaks. These tools typically rely on statistical models that evaluate language features like perplexity and burstiness. Perplexity measures how predictable or “surprising” a piece of text is, while burstiness refers to the variability in sentence length and structure. AI-generated writing tends to have low perplexity and uniform burstiness, making it statistically distinguishable from human writing.
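Both metrics are easy to approximate. Here's a minimal, illustrative sketch: burstiness as the spread of sentence lengths, and perplexity under a toy unigram model. Real detectors use full language models, so treat these numbers as directional only.

```python
import math
import statistics

def burstiness(text):
    """Spread (standard deviation) of sentence lengths in words.
    Human writing tends to vary more; AI output is often uniform."""
    sentences = [s.strip() for s in text.replace("!", ".").replace("?", ".").split(".") if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    return statistics.pstdev(lengths) if len(lengths) > 1 else 0.0

def perplexity(text, unigram_probs):
    """Perplexity under a toy unigram model: 2^(average negative log2-probability).
    Lower perplexity means more predictable text."""
    words = text.lower().split()
    floor = 1e-6  # probability assigned to out-of-vocabulary words
    log_probs = [math.log2(unigram_probs.get(w, floor)) for w in words]
    return 2 ** (-sum(log_probs) / len(log_probs))
```

A perfectly repetitive text scores a burstiness of zero, which is exactly the uniformity these detectors flag.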


However, these metrics are not foolproof. With prompt engineering, users can intentionally instruct large language models (LLMs) to output text with deliberate variability and unpredictability. Because of this, while statistical models focused on perplexity and burstiness remain somewhat effective, researchers are exploring new methods to counter and detect AI-generated content, ones that go beyond surface-level patterns and dig deeper into how language is really constructed.



1. Neural Network-based Detection


Researchers are starting to use neural networks designed specifically to spot patterns in text that human authors typically do not follow. These models are trained on large sets of both human-written and AI-written text, allowing them to learn the difference between the less predictable nature of human language and the more polished or repetitive style that often comes from GenAI.


In researching this blog post, I found that some even use transformer-based neural networks, the same architecture behind many LLMs. Not only do these systems look at surface stats like sentence length, but they also consider how ideas flow from one sentence to the next, whether the topic drifts too smoothly, or whether the phrasing feels too perfect to be written by a human. They're designed to catch the kinds of patterns that are easy to miss but meaningful when trying to tell who, or what, wrote something. Since many of these models use unsupervised learning, they're built to evolve alongside the AI they're trying to detect.
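A full transformer classifier is well beyond a blog post, but the underlying idea (learn a decision boundary over text features from labeled examples) can be sketched with a tiny logistic-regression classifier in pure Python. The two stylometric features and the training loop below are invented for illustration, not any real detector's design.

```python
import math

def features(text):
    """Two toy stylometric features: mean sentence length and vocabulary richness."""
    sentences = [s for s in text.split(".") if s.strip()]
    words = text.lower().split()
    mean_len = len(words) / max(len(sentences), 1)
    richness = len(set(words)) / max(len(words), 1)  # type-token ratio
    return [mean_len, richness]

def train(samples, labels, lr=0.1, epochs=500):
    """Logistic regression via gradient descent; label 1 = AI-generated."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            z = w[0] * x[0] + w[1] * x[1] + b
            p = 1 / (1 + math.exp(-z))      # predicted probability of "AI"
            err = p - y                      # gradient of the logistic loss
            w[0] -= lr * err * x[0]
            w[1] -= lr * err * x[1]
            b -= lr * err
    return w, b

def predict(text, w, b):
    """Probability that a text is AI-generated, per the trained toy model."""
    x = features(text)
    z = w[0] * x[0] + w[1] * x[1] + b
    return 1 / (1 + math.exp(-z))
```

Real detectors learn thousands of such signals automatically from raw text rather than two hand-picked features, but the train-on-labeled-examples loop is the same in spirit.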


2. Digital Watermarking for AI-Generated Text


A more common tactic that has made the news recently is embedding invisible "watermarks" into AI-generated content. Traditionally, digital watermarking has been used to protect images and videos. Now, it's being adapted to protect written content, where subtle, reversible whitespace is added during the generation process. To the naked eye, nothing is there. To detectors, it adds another layer that can be checked for clues about the legitimacy of a piece of text. More advanced use cases of watermarking provide a means of tracing generated content back to its original source, ensuring accountability and transparency.
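To make the whitespace idea concrete, here's a toy sketch that hides a bit pattern by inserting zero-width spaces after word gaps. Production schemes (such as the token-sampling watermarks LLM vendors are experimenting with) are far more robust than this; it's only meant to illustrate the principle of invisible-but-machine-readable marks.

```python
ZWSP = "\u200b"  # zero-width space: invisible in most renderers

def embed_watermark(text, bits):
    """Hide a bit string by optionally inserting a zero-width space
    after each word-separating space (1 = marked, 0 = plain)."""
    words = text.split(" ")
    out = []
    for i, word in enumerate(words[:-1]):
        bit = bits[i % len(bits)]  # repeat the pattern across all gaps
        out.append(word + " " + (ZWSP if bit == "1" else ""))
    out.append(words[-1])
    return "".join(out)

def extract_watermark(text):
    """Recover the bit pattern from the gaps between words."""
    bits = []
    idx = text.find(" ")
    while idx != -1:
        bits.append("1" if text[idx + 1 : idx + 2] == ZWSP else "0")
        idx = text.find(" ", idx + 1)
    return "".join(bits)
```

Stripping the zero-width characters recovers the original text exactly, which is what makes this kind of watermark "reversible" (and, unfortunately, also easy for a motivated attacker to remove).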


3. Fingerprint-based Detection Systems


Fingerprinting has been used for decades in investigations. Whether it's the way someone types on a keyboard or the gait of their walk, the goal is to tie a specific action or pattern to a specific individual. Researchers are applying the same method to detect AI-generated content. By constructing a "fingerprint" of a piece of text, a unique ID is created for the content based on its patterns. The goal is for researchers to build databases of known human-generated and AI-generated content, making it easier to detect or flag AI-generated text. These systems could be most useful for detecting long-form or high-volume AI-generated content, such as social media posts that might have been authored by bots masquerading as humans.


4. Hybrid Models


Arguably, the most effective detection strategies come from hybrid models, which combine multiple detection approaches to increase accuracy. For example, integrating neural network-based models with watermarking and fingerprinting can provide a more robust framework for classifying text as AI-generated.
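At its simplest, a hybrid detector can be a weighted combination of the individual detectors' scores. The detector names and weights in this sketch are made up; the one real design wrinkle it shows is handling a detector that can't return a verdict (e.g., no watermark present) by skipping it and renormalizing.

```python
def hybrid_score(scores, weights):
    """Weighted average of per-detector scores in [0, 1].
    Detectors that abstained (returned None) are skipped and the
    remaining weights are renormalized."""
    total, weight_sum = 0.0, 0.0
    for name, score in scores.items():
        if score is None:
            continue  # this detector had nothing to say
        w = weights.get(name, 1.0)
        total += w * score
        weight_sum += w
    return total / weight_sum if weight_sum else 0.0
```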



As with any evolving technology, no detection mechanism is perfect. While the methods discussed here are promising, they all come with disclaimers that they cannot guarantee the results of their analysis. With headlines showing that GenAI is constantly advancing, detection tools will need to evolve in parallel to keep up with newer, more sophisticated models. For now, we are left with the task of asking ourselves: How can we ensure the content we interact with is accurate, trustworthy, and responsible? Perhaps it's time to update Edgar Allan Poe's advice for the GenAI era.

Thanks for visiting,


– TracerTick


Disclaimer: All views, thoughts, and opinions expressed on this blog are my own and do not represent the views of my employer or any affiliated organizations. The content here is intended for educational and informational purposes only.
