The purpose of DetectGPT is to determine whether a piece of text was generated by a specific LLM, such as GPT-3. To classify a passage x, DetectGPT first generates minor perturbations of the passage, ~xi, using a generic pre-trained model (e.g., T5). It then compares the log probability of the original sample x with that of each perturbed sample ~xi under the model of interest. If the average log ratio is high, the sample is likely to have come from the source model.
ChatGPT is a hot topic, and there is ongoing discussion about whether it is possible to detect that an article was generated by a large language model (LLM). DetectGPT defines a new curvature-based criterion for judging whether a passage was generated by a given LLM. It does not require training a separate classifier, collecting a dataset of real or generated passages, or explicitly watermarking the generated text. It uses only log probabilities computed by the model of interest and random perturbations of the passage produced by another general-purpose pre-trained language model (e.g., T5).
DetectGPT identifies and exploits the tendency of machine-generated passages x ~ pθ(·) (left) to lie in regions of negative curvature of log pθ(x), where nearby samples have lower model log probability on average. In contrast, human-written text x ~ p_real(·) (right) tends not to occupy regions with significant negative log-probability curvature.
DetectGPT is based on the hypothesis that samples from the source model pθ usually lie in regions of negative curvature of the log-probability function of pθ, unlike human-written text. If we apply a small perturbation to a piece of text x ~ pθ to obtain ~x, the quantity log pθ(x) − log pθ(~x) should be relatively large for machine-generated samples compared to human-written text. Building on this hypothesis, first consider a perturbation function q(·|x) that gives a distribution over ~x, slightly modified versions of x with similar meaning (x is typically a passage of roughly paragraph length). For example, q(·|x) might be the result of simply asking a human to rewrite one of the sentences of x while preserving its meaning. Using this notion of a perturbation function, the perturbation discrepancy d(x; pθ, q) can be defined as:

d(x; pθ, q) = log pθ(x) − E_{~x ~ q(·|x)} log pθ(~x)   (1)
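To make the perturbation function concrete, here is a minimal sketch of a T5-based mask-filling q(·|x), assuming the Hugging Face transformers API. The span length and mask fraction follow the setup described in the next section (2-word spans, roughly 15% of words masked); the helper details are illustrative and not the paper's exact implementation.

```python
# A minimal sketch of a T5 mask-filling perturbation function q(.|x), assuming
# the Hugging Face transformers API. Span length and mask fraction follow the
# setup described in the next section; everything else is illustrative.
import random
import re
from transformers import T5ForConditionalGeneration, T5Tokenizer

t5_tok = T5Tokenizer.from_pretrained("t5-3b")   # as in the paper; a smaller T5 also works for a demo
t5 = T5ForConditionalGeneration.from_pretrained("t5-3b")

def perturb(text: str, span_len: int = 2, mask_frac: float = 0.15) -> str:
    words = text.split()
    n_spans = max(1, int(mask_frac * len(words) / span_len))
    # Choose non-overlapping span starts and replace each span with a T5 sentinel.
    candidates = list(range(0, max(1, len(words) - span_len), span_len))
    starts = sorted(random.sample(candidates, min(n_spans, len(candidates))))
    masked, i, s = [], 0, 0
    while i < len(words):
        if s < len(starts) and i == starts[s]:
            masked.append(f"<extra_id_{s}>")
            i += span_len
            s += 1
        else:
            masked.append(words[i])
            i += 1
    masked_text = " ".join(masked)
    # Let T5 propose fills for the masked spans, then splice them back in.
    ids = t5_tok(masked_text, return_tensors="pt").input_ids
    out = t5.generate(ids, do_sample=True, top_p=0.95, max_new_tokens=64)
    decoded = t5_tok.decode(out[0], skip_special_tokens=False)
    fills = [f.replace("<pad>", "").replace("</s>", "").strip()
             for f in re.split(r"<extra_id_\d+>", decoded)[1:]]
    for j, fill in enumerate(fills[: len(starts)]):
        masked_text = masked_text.replace(f"<extra_id_{j}>", fill, 1)
    return masked_text
```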
This leads to Hypothesis 4.1: after a passage is rewritten, the average drop in its log probability under the model (i.e., the perturbation discrepancy) is consistently larger for model-generated passages than for human-written ones. If q(·|x) is a mask-filling model (such as T5) rather than a human rewriter, Hypothesis 4.1 can be tested with an automated, scalable empirical procedure.

## 2. DetectGPT: the automatic test

For real data, 500 news articles from the XSum dataset were used; model samples were generated by prompting four different LLMs with the first 30 tokens of each XSum article. Perturbations were applied with T5-3B, masking randomly sampled 2-word spans until 15% of the words in the article were masked. The expectation in equation (1) was approximated with 100 samples from T5. The results show a clear difference between the distributions of perturbation discrepancies for human-written articles and model samples: model samples tend to have a large perturbation discrepancy. Based on these results, a piece of text can be classified as generated by model pθ or not simply by thresholding the perturbation discrepancy. Normalizing the perturbation discrepancy by the standard deviation of the observations used to estimate E_{~x ~ q(·|x)} log pθ(~x) gives better detection, typically increasing AUROC by around 0.020, so the normalized version of the perturbation discrepancy was used in the experiments. The paper also gives pseudocode for the DetectGPT detection procedure; a rough Python sketch of it is given below, after the curvature discussion.

The perturbation discrepancy may be useful, but what it measures is not immediately clear, so the authors interpret it in the next section using curvature.

## 3. Interpreting the perturbation discrepancy as curvature

The perturbation discrepancy approximates a measure of the local curvature of the log-probability function near the candidate passage; more specifically, it is proportional to the negative trace of the Hessian of the log-probability function. This part of the paper contains a lot of detail that is not reproduced here (see the original paper if interested); it can be roughly summarized as follows: sampling in semantic space ensures that all samples stay close to the data manifold, since the log probability is expected to drop whenever perturbation tokens are added at random, so the objective can be interpreted as approximately restricting the curvature measure to the data manifold.
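As referenced above, the detection procedure can be sketched roughly as follows in Python, assuming the Hugging Face transformers API, the `perturb` function from the earlier sketch, and an illustrative threshold; this is a rough rendering of the procedure, not the paper's exact implementation.

```python
# A rough sketch of the DetectGPT procedure: sample k perturbations, compare
# log probabilities under the model of interest, and threshold the normalized
# perturbation discrepancy. `perturb` is the T5 sketch above; the model name
# and threshold epsilon are illustrative.
import numpy as np
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")      # the model of interest p_theta
lm = AutoModelForCausalLM.from_pretrained("gpt2")

@torch.no_grad()
def log_prob(text: str) -> float:
    # Total log probability of the passage under p_theta.
    ids = tok(text, return_tensors="pt").input_ids
    loss = lm(ids, labels=ids).loss              # mean negative log-likelihood per token
    return -loss.item() * (ids.shape[1] - 1)

def perturbation_discrepancy(x: str, k: int = 100) -> float:
    # Equation (1), normalized by the std of the perturbed log probabilities.
    lp = np.array([log_prob(perturb(x)) for _ in range(k)])
    return (log_prob(x) - lp.mean()) / lp.std()

def detect(x: str, epsilon: float = 1.0) -> bool:
    # A large normalized discrepancy suggests x was generated by p_theta.
    return perturbation_discrepancy(x) > epsilon
```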
Each experiment uses 150 to 500 examples for evaluation. Machine-generated text is produced by prompting the model with the first 30 tokens of the corresponding real text. Performance is evaluated with AUROC.
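As a sketch of how such an evaluation can be run, assuming scikit-learn and the `perturbation_discrepancy` scorer from the sketch above (the passages shown are placeholders):

```python
# Score human-written and machine-generated passages with the normalized
# perturbation discrepancy and measure detection quality with AUROC.
from sklearn.metrics import roc_auc_score

human_texts = ["..."]   # e.g., real XSum articles (placeholders)
model_texts = ["..."]   # e.g., LLM continuations of their first 30 tokens

scores = [perturbation_discrepancy(t) for t in human_texts + model_texts]
labels = [0] * len(human_texts) + [1] * len(model_texts)
print("AUROC:", roc_auc_score(labels, scores))
```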
It can be seen that DetectGPT gives the strongest average detection performance on XSum stories (AUROC improved by 0.1) and SQuAD Wikipedia contexts (AUROC improved by 0.05).
For 14 of the 15 dataset and model combinations, DetectGPT provides the most accurate detection performance, with an average improvement in AUROC of 0.06.
Supervised machine-generated text detection models trained on large datasets of real and generated text perform as well as DetectGPT, or even better, on in-distribution text (top row). The zero-shot methods are also applied to new domains (bottom row), such as PubMed medical text and German news data from WMT16.
Evaluated on 200 samples from each dataset, the supervised detectors perform similarly to DetectGPT on in-distribution data such as English news, but they perform significantly worse than the zero-shot approach on English scientific writing and fail completely on German writing.
DetectGPT’s average AUROC for GPT-3 is comparable to supervised models trained specifically for machine-generated text detection.
150 examples were drawn from each of the PubMedQA, XSum, and WritingPrompts datasets. Two pre-trained RoBERTa-based detector models were compared with DetectGPT and a probability-threshold baseline. DetectGPT provides detection that is competitive with the stronger supervised models.
This part examines whether the detectors can still detect machine-generated text that has been edited by humans. Manual revision was simulated by replacing 5-word spans of the text with samples from T5-3B until r% of the text had been replaced. DetectGPT maintains a detection AUROC above 0.8 even when nearly a quarter of the text in the model sample has been replaced, and it shows the strongest detection performance across all revision levels.
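A rough sketch of this simulated revision, reusing the `perturb` function from the earlier sketch (the exact replacement policy here is illustrative):

```python
# Simulated human revision: replace random 5-word spans with T5 mask-fill
# output until roughly a fraction r of the words have been swapped.
def revise(text: str, r: float, span_len: int = 5) -> str:
    n_words = len(text.split())
    n_swaps = max(1, int(r * n_words / span_len))
    for _ in range(n_swaps):
        # Each call masks and refills one span_len-word span (see perturb above).
        text = perturb(text, span_len=span_len, mask_frac=span_len / n_words)
    return text
```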