
Bittensor SN32



Decentralized AI Detection

⛏️ Mining Docs        |        🧑‍🏫 Validating Docs

Introduction

Our subnet incentivizes the development of distributed solutions aimed at identifying LLM-generated content.

Given the rapid growth of LLM-generated text (ChatGPT alone reportedly outputs about 100 billion words per day, compared to the roughly 100 trillion produced by all humans), we believe that the ability to accurately identify AI-generated text will become increasingly necessary.

Problem

With the recent surge in LLMs, many cases have appeared where we genuinely want to know whether a text was generated by an AI or written by a human. Let's explore some scenarios that highlight the potential and significance of LLM detection.

  • For ML-engineers. Whether you’re sourcing training data, developing a foundational LLM, or fine-tuning on your own data, you need to ensure generated text does not make it into your training set. We can help.
  • For teachers. While tools like ChatGPT offer numerous benefits for the educational sector, they also present opportunities for students to cheat on assignments and exams. Therefore, it is crucial to differentiate between responses authored by genuine students and those generated by LLMs.
  • For bloggers. Recently, many bloggers have faced a wave of AI-generated comments on their social networks. These comments are rarely meaningful, yet they attract the audience's attention and promote unrelated products. With our subnet, you can easily identify which comments are AI-generated and automatically ban them.

And many more, like:

  • For writers. By utilizing an LLM detection system, writers can assess their text segment by segment to identify sections that appear machine-generated. This enables them to refine these areas to enhance the overall human-like quality of their writing.
  • For recruiting. Have you also noticed receiving far more applications with lower candidate quality? AI has enabled people to spam hiring teams with artificially written cover letters and assessments. We help you find the candidates who care about your mission and your quality standards.
  • For cyber security. Scammers can leverage LLMs to quickly and easily create realistic and personalized phishing emails. We can help you determine the provenance of any document or email you’re reviewing.

As you can see, there are many areas where AI detection can be very helpful. We believe that creating an LLM-detection subnet not only provides people with a useful tool at a fair price, but also encourages competition to build better and smarter ways to spot AI-generated content.

Vision and Roadmap

We've outlined our project objectives and end goals in the Vision & Roadmap document.

Miners

For the baseline we implemented a deberta-v3-large model, fine-tuned on 500k human- and AI-generated texts. Overall accuracy on our validation set is about 93%.
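For illustration, a miner's classification step could look like the following sketch, using the Hugging Face transformers pipeline. The checkpoint path is a placeholder, and the label names depend on how the model was fine-tuned; the actual weights ship with this repository.

```python
from transformers import pipeline

# Placeholder path -- substitute the fine-tuned deberta-v3-large weights
# distributed with this repository.
detector = pipeline("text-classification", model="<path-to-finetuned-deberta-v3-large>")

result = detector("The mitochondria is the powerhouse of the cell.")
print(result)  # e.g. [{'label': 'human', 'score': 0.97}] -- labels depend on the checkpoint
```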

Validators

For validation we use two types of data, balanced in a 1:1 proportion.

Human-written texts

To gather human-written validation data we use the Pile dataset.

The Pile is an 825 GiB diverse, open-source language-modelling dataset that combines 22 smaller, high-quality datasets. It includes web-crawled data as well as finance, medical, legal, arXiv, and GitHub sources, spanning many different topics.
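A minimal sketch of sampling human-written texts, assuming the Pile is streamed from the Hugging Face hub (the monology/pile-uncopyrighted mirror is used here purely for illustration; the validator may obtain the dataset differently):

```python
from datasets import load_dataset

# Stream the Pile so the full 825 GiB never has to be downloaded at once.
pile = load_dataset("monology/pile-uncopyrighted", split="train", streaming=True)

for i, sample in enumerate(pile):
    print(sample["text"][:120].replace("\n", " "), "...")
    if i == 2:  # show just a few samples
        break
```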

AI-generated texts

For AI-generated text collection, we need to obtain prompts and then generate texts based on them. Since the human texts are sampled from the Pile, we generate the AI samples from the same data source, so that the only difference between the two classes is whether a human or an AI wrote the text.

So, as a prompt we take a random Pile sample, use its beginning as the opening of the text, and ask an LLM to generate a completion for it.

We use Ollama (see its GitHub repository) to run large language models and generate completions for these prompts. As LLMs we use 15 SOTA models, including llama3, starling-lm:7b-beta, mixtral, command-r, mistral, gemma:7b, neural-chat, zephyr:7b-beta and others.

We also randomly select generation parameters for LLM during validation to make the dataset more diverse.
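Putting this together, a hedged sketch of the generation step might look like the following, using the ollama Python client. The model subset, prompt-length cutoff, and parameter ranges here are illustrative assumptions, not the subnet's exact values.

```python
import random

import ollama  # Python client; assumes an Ollama server is running locally

MODELS = ["llama3", "mixtral", "mistral", "gemma:7b", "zephyr:7b-beta"]  # subset of the 15

def generate_ai_sample(pile_text: str, rng: random.Random) -> str:
    # Use the beginning of a real Pile sample as the prompt...
    words = pile_text.split()
    prompt = " ".join(words[: rng.randint(10, 50)])  # illustrative cutoff range
    # ...and ask a randomly chosen model, with randomized generation
    # parameters, to complete it.
    resp = ollama.generate(
        model=rng.choice(MODELS),
        prompt=prompt,
        options={
            "temperature": rng.uniform(0.2, 1.2),
            "top_p": rng.uniform(0.8, 1.0),
        },
    )
    return resp["response"]
```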

Data augmentation to prevent cheating

To keep miners from simply memorizing the Pile dataset and to make the task more robust to overfitting, we apply augmentations to both the AI-generated and the human-written texts. First, we select a random sequence of consecutive sentences from a given text. Then, in one or two random places, we either introduce a misspelling (drawn from about 10 character-based augmentations) or remove a random adjective.

These augmentations prevent miners from precomputing hashes over the Pile dataset and using them to check whether a given text appears in the human data.
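A simplified sketch of this augmentation, covering the sentence-span selection and character-level misspellings (adjective removal requires a POS tagger and is omitted here; the specific character operations are assumptions):

```python
import random
import re

def augment(text: str, rng: random.Random) -> str:
    # Keep only a random run of consecutive sentences (naive splitter).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    start = rng.randrange(len(sentences))
    end = rng.randrange(start, len(sentences)) + 1
    span = " ".join(sentences[start:end])

    # Introduce a misspelling in one or two random places.
    chars = list(span)
    for _ in range(rng.choice([1, 2])):
        i = rng.randrange(len(chars))
        op = rng.choice(["swap", "drop", "duplicate", "replace"])  # char-based augs
        if op == "swap" and i + 1 < len(chars):
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
        elif op == "drop":
            del chars[i]
        elif op == "duplicate":
            chars.insert(i, chars[i])
        else:
            chars[i] = rng.choice("abcdefghijklmnopqrstuvwxyz")
    return "".join(chars)

print(augment("The quick brown fox jumps. It was very fast. Nobody saw it.", random.Random(0)))
```

Because every text is re-sampled and perturbed this way, the exact string a miner sees never matches any Pile entry byte-for-byte.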

Reward counting

Based on the article "Detecting LLM-Generated Text in Computing Education", we decided to divide our reward into three parts:

F1 score

We decided to use it instead of classic accuracy because it better represents model quality, especially on binary classification tasks.

False Positive score

FP_score = 1 - FP / len(samples), where FP is the number of human-written texts misclassified as AI-generated. For example, if 4 of 100 samples are human texts wrongly flagged as AI, FP_score = 1 - 4/100 = 0.96.

It is usually more important not to mistakenly classify human-written text as AI-generated than the other way around. It is preferable to tolerate a few extra instances of student cheating, or to read a few AI-generated emails, than to wrongly penalize a real student or miss an important letter.

AP score

AP summarizes a precision-recall curve by calculating the weighted mean of precisions achieved at each threshold. This allows us to evaluate the quality of the model's ranking.

The final reward is the average of these three values.
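A compact sketch of how these three components combine, using scikit-learn for F1 and AP; the 0.5 threshold and the label convention (1 = AI-generated) are assumptions for illustration:

```python
import numpy as np
from sklearn.metrics import average_precision_score, f1_score

def reward(y_true, y_prob, threshold=0.5):
    """Three-part reward: mean of F1 score, FP score, and AP."""
    y_true = np.asarray(y_true)                      # 1 = AI-generated, 0 = human
    y_pred = (np.asarray(y_prob) > threshold).astype(int)

    f1 = f1_score(y_true, y_pred)

    # False positives: human-written texts wrongly flagged as AI-generated.
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fp_score = 1 - fp / len(y_true)

    # AP: weighted mean of precision over the precision-recall curve.
    ap = average_precision_score(y_true, y_prob)

    return (f1 + fp_score + ap) / 3
```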
