Crafting an AI Text Discriminator: Insights Gleaned

In the wake of the LLM surge, the ability to tell human-written text from AI-generated text has become a much sought-after skill. In this post, I'll try to build an AI text detector from the ground up!

Prior Work

Since the advent of ChatGPT, there has been growing interest in distinguishing human-written prose from text generated by large language models (LLMs), especially in education. Despite numerous attempts, reliable detection tools have proven elusive. Even OpenAI ventured into this space with the release of an AI classifier, which was withdrawn only a few months later owing to its low accuracy. In their own words:

Our classifier is not fully reliable. In our evaluations on a challenging set of English texts, the classifier correctly identifies 26% of AI-generated text (true positives) as “likely AI-generated,” while incorrectly labeling human-written text as AI-generated 9% of the time (false positives).

At the moment, the most successful AI text detector is ZeroGPT, which combines a deep learning model with several heuristics, such as evaluating writing consistency and searching the internet for matching text.

Tackling the Challenge

One of the key reasons why detecting AI-generated text at a broad scale is so challenging is the vastness of the domain. For instance, ZeroGPT required large text corpora scraped from the internet, supplemented by synthetic datasets they built themselves, to train their deep learning model effectively. If you aim to build your own AI detection model, homing in on a specific use case (such as evaluating student essays) in a domain where you have particular knowledge and expertise will significantly improve your odds of success.

But what if you lack the requisite data to construct a robust machine learning model? Are there more economical methods for detecting AI-generated text?

Recent studies suggest that AI-generated text may exhibit detectable signals. For example, a Stanford paper published in April identified shifts in word frequency in scientific papers coinciding with the release of ChatGPT. Words like “pivotal,” “intricate,” “realm,” and “showcasing” were reportedly used more frequently by LLMs than by humans. However, deploying rule-based solutions (such as filters) based on these words has a glaring flaw: human language is deeply contextual (in gaming circles, for instance, “realm” is a common term!). Moreover, language is constantly evolving; new words enter our lexicon, and others fall out of favor. This dynamic nature suggests that machine learning, possibly integrated with tools like a paraphraser, will likely be necessary for detecting AI-generated text.
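As a toy illustration of why such word filters fall short, here is a minimal rule-based checker. The word list is just the examples cited above, and the threshold is an arbitrary value of mine, not something taken from the paper:

from collections import Counter
import re

# Words reported as over-represented in LLM output (illustrative subset)
LLM_MARKER_WORDS = {"pivotal", "intricate", "realm", "showcasing"}

def marker_word_rate(text: str) -> float:
    # Fraction of words in the text that are "LLM marker" words
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    counts = Counter(words)
    return sum(counts[w] for w in LLM_MARKER_WORDS) / len(words)

def naive_flag(text: str, threshold: float = 0.01) -> bool:
    # A naive rule: flag anything above an arbitrary threshold.
    # This breaks down quickly: a human blog post about gaming "realms"
    # would trip the filter, while a lightly edited LLM draft might not.
    return marker_word_rate(text) > threshold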

My goal was to experiment with constructing a text detector that leverages this signal. My strategy involved masking a certain percentage of tokens in the text, using a generative model to predict the masked tokens, and then comparing these predictions with the original tokens. The underlying theory is that human-written text tends to be more diverse than LLM-generated text, so the more precise the predictions, the greater the likelihood that the text is AI-generated. This approach closely mirrors the “perplexity” metric employed by ZeroGPT.
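For context, here is roughly how a perplexity-style score can be computed with an off-the-shelf causal language model; the choice of GPT-2 is mine purely for illustration, since ZeroGPT's internals aren't public:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# GPT-2 is used here only as an example scoring model
ppl_tokenizer = AutoTokenizer.from_pretrained("gpt2")
ppl_model = AutoModelForCausalLM.from_pretrained("gpt2")

def perplexity(text: str) -> float:
    # Lower perplexity means the model finds the text more predictable,
    # which detectors treat as a hint that it was machine-generated
    enc = ppl_tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        out = ppl_model(**enc, labels=enc["input_ids"])
    # out.loss is the mean negative log-likelihood per token
    return torch.exp(out.loss).item()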

The Code

Implementing this strategy poses several challenges, particularly in selecting the appropriate tokens to mask and seamlessly reintegrating the sequence for the LLM. One thing I quickly noticed is that the choice of tokens for masking significantly influences the perplexity score; for instance, articles like “a” and “the” are far easier for the model to predict than proper nouns. To mitigate this, my implementation filters out obvious stopwords before selecting tokens. Since the masking process is random, an improved approach could involve averaging results over multiple masking iterations (sketched after the implementation below) or employing a shifting window method, which allows the generative model to leverage local context when making predictions.

import nltk
import random
from transformers import pipeline
from nltk.tokenize import word_tokenize

class AIOrHumanScorer:
    """
    Evaluates text to determine if it was likely produced by a generative model.
    """

    def __init__(self, model: object, mask_filler="bert-base-uncased"):
        self.model = model
        self.mask_fill = pipeline("fill-mask", model=mask_filler)
        # Read the mask token from the tokenizer so fill-mask models with a
        # different mask token (e.g. RoBERTa's <mask>) also work
        self.mask_token = self.mask_fill.tokenizer.mask_token
        self.labels = ["human", "auto"]
        nltk.download("punkt")
        nltk.download("stopwords")
        self.stop_words = set(nltk.corpus.stopwords.words("english"))

    def _mask_fill(self, text: str, mask_ratio=0.15, max_tokens=512, random_state=42) -> tuple:
        """
        Computes a mask fill score for a given text sample by randomly masking words
        in the text and assessing how accurately a mask fill model predicts the masked words.
        Returns a tuple containing the original tokens and the predicted tokens.
        """

        # Truncate the text to ensure it remains within the token limit
        tokens = word_tokenize(text)[:max_tokens]

        # Randomly select words to mask, excluding stopwords and short or non-alphanumeric tokens
        random.seed(random_state)
        candidates = [
            (i, t) for i, t in enumerate(tokens)
            if t.lower() not in self.stop_words and t.isalnum() and len(t.strip()) > 1
        ]
        if len(candidates) == 0:
            raise ValueError("No valid tokens after stopword removal.")

        n_mask = max(1, int(len(candidates) * mask_ratio))

        # Mask the selected target words, matching by position rather than value
        # so that repeated words are not masked at the wrong spot
        targets = sorted(random.sample(candidates, n_mask), key=lambda x: x[0])
        masked_tokens = [t[1] for t in targets]
        target_positions = {t[0] for t in targets}
        masked_text = ""
        for i, token in enumerate(tokens):
            if i in target_positions:
                masked_text += self.mask_token + " "
            else:
                masked_text += token + " "

        # Obtain the top mask fill prediction for each masked position
        preds = self.mask_fill(masked_text, tokenizer_kwargs={"truncation": True})
        if len(masked_tokens) == 1:
            preds = [preds]  # the pipeline returns a flat list when there is a single mask
        fill_preds = [p[0]["token_str"] for p in preds]
        return masked_tokens, fill_preds

    def score(self, text: str) -> float:
        """
        Returns a score indicating the likelihood that the text was generated by a
        generative model, based on the mask fill score.
        """

        # Fraction of masked tokens the fill-mask model recovered exactly
        # (case-insensitive, ignoring leading whitespace in predicted tokens)
        true_tokens, pred_tokens = self._mask_fill(text)
        matches = sum(1 for t, p in zip(true_tokens, pred_tokens) if t.lower() == p.strip().lower())
        return matches / len(true_tokens)

This code forms the backbone of a system designed to assess whether a piece of text is more likely to have been produced by a human or an AI model. The method revolves around masking certain tokens in the text, predicting those masked tokens using a generative model, and then comparing the predictions with the original tokens. The degree to which the model’s predictions match the original text provides insight into whether the text is AI-generated. By focusing on non-trivial tokens and using techniques such as stopword filtering and potentially multiple masking iterations, the accuracy of this detection method can be refined.
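As a rough sketch of the averaging idea mentioned above, the snippet below re-runs the random masking with different seeds and averages the match rates; the iteration count and sample text are arbitrary choices of mine:

def averaged_score(scorer: AIOrHumanScorer, text: str, n_iterations: int = 5) -> float:
    # Repeat the random masking with different seeds and average the match rate,
    # smoothing out the luck of which words happened to be masked
    scores = []
    for seed in range(n_iterations):
        true_tokens, pred_tokens = scorer._mask_fill(text, random_state=seed)
        matches = sum(1 for t, p in zip(true_tokens, pred_tokens) if t.lower() == p.strip().lower())
        scores.append(matches / len(true_tokens))
    return sum(scores) / len(scores)

scorer = AIOrHumanScorer(model=None)  # the model argument is unused by the mask-fill score
print(averaged_score(scorer, "An example essay paragraph to evaluate goes here."))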

Exploring Model Variability

The code utilizes the bert-base-uncased model from HuggingFace, but it’s worth noting that different models can yield varying results. This opens up an intriguing avenue for experimentation—testing different models to identify which type of AI (e.g., LLaMA, Mistral, GPT, etc.) is responsible for generating the text.
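A quick way to experiment, assuming the checkpoints below from the HuggingFace Hub (the scorer reads the mask token from the pipeline's tokenizer, so RoBERTa-style <mask> models should also work):

sample_text = "Large language models have become a pivotal tool across many realms of modern writing."

# Each checkpoint has different training data and vocabulary, so the
# recovered-token rate (and therefore the score) will shift with the choice
for checkpoint in ["bert-base-uncased", "roberta-base", "distilroberta-base"]:
    scorer = AIOrHumanScorer(model=None, mask_filler=checkpoint)
    print(checkpoint, scorer.score(sample_text))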

Evaluation

To assess the effectiveness of this approach, I used a dataset of labeled essays from Kaggle, each marked as either human-written or AI-generated. Since the score method outputs a value between 0 and 1, I did some quick tuning to pick a binary classification threshold, and used the ROC curve and its area under the curve (AUC) to evaluate overall performance.
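Roughly, the evaluation looked like the following; the file name and column names are placeholders for whichever labeled Kaggle dataset you use, not the actual dataset schema:

import pandas as pd
from sklearn.metrics import roc_auc_score, roc_curve

# Placeholder file and column names; adjust to the labeled essay dataset you use
df = pd.read_csv("labeled_essays.csv")  # columns: "text", "generated" (1 = AI, 0 = human)

scorer = AIOrHumanScorer(model=None)
df["score"] = df["text"].apply(scorer.score)

# AUC summarizes how well the scores rank AI text above human text, across all thresholds
print("AUC:", roc_auc_score(df["generated"], df["score"]))

# Pick the threshold that best separates the two classes on this data (Youden's J)
fpr, tpr, thresholds = roc_curve(df["generated"], df["score"])
best_threshold = thresholds[(tpr - fpr).argmax()]
df["predicted"] = (df["score"] >= best_threshold).astype(int)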

ROC curve

The evaluation results against this dataset indicate that the accuracy is marginally better than random guessing. However, the performance is still far from reliable, and I wouldn’t rely on this method to uphold academic integrity without further refinement.

evaluation results

Future Work

To address the increasing presence of AI-generated content on the internet, the development of advanced models and tools for detecting such content is essential. While heuristics like perplexity and writing consistency show potential, relying solely on a one-size-fits-all solution is unlikely to be sufficient. In the future, we will need the capability to create domain-specific models tailored to particular detection tasks. These models must also be adaptable and capable of evolving, given that natural language is deeply contextual and constantly changing over time.

Bonus

I used ChatGPT a little bit to format my code, but the content of this post is my own—just for fun, here’s what ZeroGPT has to say about it.

ZeroGPT test

About the author: ballaerika1985@gmail.com