
Pretending to be a human author, ChatGPT and other abuses cause concern, an article summarizes AI-generated text detection methods

WBOY
Release: 2023-04-14 14:10:03

Recent advances in natural language generation (NLG) have significantly improved the diversity, controllability, and quality of text generated by large language models. One notable example is OpenAI's ChatGPT, which has demonstrated strong performance in tasks such as answering questions and writing emails, papers, and code. However, this newfound ability to generate text efficiently also raises concerns about detecting and preventing the misuse of large language models for phishing, disinformation, and academic dishonesty. For example, New York public schools banned ChatGPT over concerns that students would use it to write homework, and the media have warned about fake news generated by large language models. These concerns have seriously hindered the application of natural language generation in important fields such as media and education.

There has been increasing discussion recently about whether and how to correctly detect text generated by large language models. This article provides a comprehensive technical introduction to existing detection methods.


  • Paper address: https://github.com/datamllab/The-Science-of-LLM-generated-Text-Detection
  • Related research address: https://github.com/datamllab/awsome-LLM-generated-text-detection/tree/main

Existing methods can be roughly divided into two categories: black-box detection and white-box detection.


Overview of detecting text generated by large language models

  • Black-box detection usually has only API-level access to the large language model. This type of approach therefore relies on collecting text samples from humans and machines to train a classification model;
  • White-box detection has full access to the large language model and can track and detect generated text by controlling the model's generation behavior or by adding a watermark to the generated text.

In practice, black-box detectors are usually built by third parties, such as GPTZero, while white-box detectors are usually built by large language model developers.


Taxonomy of detection methods for text generated by large language models

Black box detection

Black-box detection generally has three steps: data collection, feature selection, and model building.

To collect human-written text, one option is to recruit professionals, but this is time-consuming and labor-intensive and does not scale to large data sets. A more efficient option is to reuse existing human-written text, such as Wikipedia entries edited by various experts, or text from media platforms such as Reddit.

Features are generally divided into statistical features, linguistic features, and factual features. Statistical features check whether text generated by a large language model differs from human text on common text statistics, such as TF-IDF and Zipf's law. Linguistic features capture properties such as part-of-speech tags, dependency parses, and sentiment. Finally, large language models often generate factually incorrect statements, so fact verification can also provide signals that distinguish generated text from human text.
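As an illustration of the statistical features mentioned above, the following is a minimal sketch in plain Python, not the survey's implementation. It computes a Zipf's-law slope and TF-IDF weights for tokenized text. The function names (tokenize, zipf_slope, tfidf) and the simple regex tokenizer are assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def zipf_slope(tokens):
    """Least-squares slope of log(frequency) against log(rank).

    Text that follows Zipf's law closely has a slope near -1.
    """
    counts = sorted(Counter(tokens).values(), reverse=True)
    if len(counts) < 2:
        return 0.0
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def tfidf(docs):
    """Return one {term: tf-idf weight} dict per tokenized document."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in tf.items()
        })
    return weights
```

A detector would compute such values for both human and machine samples and pass them to a classifier; whether any single statistic separates the two classes depends on the data set.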

Existing classification models include traditional machine learning models, such as SVM. More recent research tends to use language models such as BERT and RoBERTa as the backbone and achieves higher detection performance.
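To make the model-building step concrete, here is a minimal sketch of a binary classifier trained on feature vectors. It uses logistic regression written in plain Python as a simple stand-in, not the SVM or BERT/RoBERTa detectors named above. The toy feature vectors and labels are invented for illustration and are not real measurements.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(features[0])
    n = len(features)
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    """Return 1 (machine-generated) or 0 (human-written)."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0


# Invented toy feature vectors, for illustration only (not real measurements).
toy_features = [[0.2, 0.1], [0.3, 0.2], [0.8, 0.9], [0.9, 0.7]]
toy_labels = [0, 0, 1, 1]  # 0 = human-written, 1 = machine-generated
weights, bias = train_logistic(toy_features, toy_labels)
```

In a real detector the feature vectors would come from the feature-selection step, and the classifier would be evaluated on held-out text rather than on its own training samples.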


There are clear differences between the two texts. The human-written text is from Chalkbeat New York.

White box detection

White-box detection is generally assumed to be provided by the developers of the large language model. Unlike black-box detection, white-box detection has full access to the model, so it can embed a watermark by changing the model's output and later detect that watermark.

Current white-box methods can be divided into post-hoc watermarks and inference-time watermarks:

  • A post-hoc watermark adds hidden information to the text after the large language model has generated it, so that the text can be detected later;
  • An inference-time watermark changes the token sampling mechanism of the large language model. When generating each token, the model selects the next token based on the probabilities of all tokens and a preset sampling strategy. A watermark can be added during this selection.
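The inference-time idea can be sketched with a toy "green list" scheme, one known family of sampling-based watermarks. The article does not say which scheme it has in mind, so this choice is an assumption. The stand-in model below has uniform logits over a hypothetical 50-token vocabulary, and the names VOCAB, GAMMA, DELTA, green_list, generate, and z_score are all made up for this example.

```python
import hashlib
import math
import random

VOCAB = [f"tok{i}" for i in range(50)]  # hypothetical toy vocabulary
GAMMA = 0.5   # fraction of the vocabulary placed on the green list
DELTA = 4.0   # logit bonus given to green tokens


def green_list(prev_token):
    """Derive a pseudo-random green list from the previous token."""
    digest = hashlib.sha256(prev_token.encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return set(rng.sample(VOCAB, int(GAMMA * len(VOCAB))))


def generate(length, rng, watermark=True):
    """Sample tokens from a stand-in model with uniform logits."""
    tokens = ["tok0"]  # fixed start token
    for _ in range(length):
        green = green_list(tokens[-1])
        logits = [DELTA if (watermark and tok in green) else 0.0 for tok in VOCAB]
        probs = [math.exp(logit) for logit in logits]  # unnormalized softmax
        tokens.append(rng.choices(VOCAB, weights=probs, k=1)[0])
    return tokens


def z_score(tokens):
    """Standard score of the green-token count against the no-watermark null."""
    n = len(tokens) - 1
    hits = sum(tokens[i] in green_list(tokens[i - 1]) for i in range(1, len(tokens)))
    return (hits - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))
```

Detection needs only the hashing rule, not the model: a detector recomputes each green list and flags text whose z-score is far above what unwatermarked text would produce. A real system would apply the bonus to the model's actual logits instead of uniform ones.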


Inference time watermark

The authors' concerns

(1) For black-box detection, data collection is a critical step, but it can easily introduce bias. For example, existing data sets mainly cover a few tasks such as question answering and story generation, which introduces topic bias. In addition, text generated by large models often has a fixed style or format. Black-box classifiers often pick up these biases as their main classification features, which reduces the robustness of detection.

As the capabilities of large language models improve, the gap between generated text and human text will keep shrinking, so the detection accuracy of black-box models will keep falling. White-box detection is therefore the more promising direction for the future.

(2) Existing detection methods assume that the large language model is owned by a company, so all users access the company's model service through an API. This many-to-one relationship makes it practical to deploy a detection system. But if the company open-sources a large language model, almost all existing detection methods would become ineffective.

For black-box detection, because users can fine-tune their models and change the style or format of model output, black-box detection cannot find common detection features.

White-box detection may be a solution: companies can add a watermark to a model before open-sourcing it. However, users can also fine-tune the model and change its token sampling mechanism to remove the watermark. There is currently no watermarking technology that can guard against these potential threats.

source:51cto.com