DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature
Eric Mitchell, Yoonho Lee, Chelsea Finn, et al.
26 Jan 2023
Large Language Models
GPT-3, ChatGPT, and PaLM are large language models (LLMs) of worldwide fame, and they have also become a source of deep concern. These LLMs have greatly improved the quality of natural language processing, but their use has raised problems from several perspectives: difficulty in assessing students, harm to student learning, and the proliferation of inaccurate news articles. As LLMs spread, the need to distinguish human-written text from machine-generated text keeps growing.
DetectGPT, The Zero-Shot Machine-Generated Text Detection
To address these problems, the paper proposes "DetectGPT", a method that detects machine-generated text using the curvature of the model's log probability function. Unlike most prior work, which classifies a candidate sample with a supervised model, DetectGPT makes its decision in a zero-shot manner. According to the paper, DetectGPT outperforms the existing zero-shot detection methods and stays highly competitive with supervised models, especially on 'typical' texts such as news articles.
Beyond its solid performance relative to past work, the paper suggests that DetectGPT generalizes much more easily to new languages and models than the stronger supervised models do. This is because a zero-shot method is never trained on a detection dataset and so cannot over-fit to one, while supervised models can. In short, DetectGPT is competitive with state-of-the-art detection models on in-distribution data and outperforms the supervised models on new, out-of-distribution data.
> Zero-Shot
In this paper, the term "zero-shot" means that no human-written or machine-generated samples are used as references when calculating the perturbation discrepancy (that is, when detecting machine-written text).
DetectGPT is a zero-shot method in a white-box setting: it has access to neither human-written nor machine-generated example passages, nor to the architecture and parameters of the model. The only information DetectGPT can obtain is the log probability that the source model (from now on, p_theta) assigns to a sample.
DetectGPT's detection procedure is based on the hypothesis that samples from the source model p_theta (that is, machine-generated text) tend to lie in regions of negative curvature of the log probability function of p_theta. To test whether a candidate passage satisfies this curvature-based criterion, the paper defines the perturbation discrepancy (from now on, d(x, p, q), where q is the perturbation function) and shows that it is proportional to a measure of the local curvature of the log probability function. Hence the hypothesis can be rewritten as:
Hypothesis
If q produces samples on the data manifold, d(x, p, q) is positive with high probability for samples x ~ p_theta. For human-written text, d(x, p, q) tends toward zero for all x.
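For reference, the perturbation discrepancy used in the hypothesis can be written out as below. This follows the definition as given in the paper, where p stands for the source model p_theta and \tilde{x} is a perturbed passage drawn from the perturbation function q. The second line, an average over k sampled perturbations, is how the expectation would typically be estimated; k is a free parameter, not a value taken from this post.

```latex
d(x, p_\theta, q) \;\triangleq\; \log p_\theta(x) \;-\; \mathbb{E}_{\tilde{x} \sim q(\cdot \mid x)} \log p_\theta(\tilde{x})
\;\approx\; \log p_\theta(x) \;-\; \frac{1}{k} \sum_{i=1}^{k} \log p_\theta(\tilde{x}_i),
\qquad \tilde{x}_i \sim q(\cdot \mid x)
```

A passage that sits at a local peak of the log probability loses probability under almost any small rewrite, so d is positive there.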
Based on the hypothesis, DetectGPT executes three major steps: (1) perturb the candidate passage with a pre-trained mask-filling model such as T5, (2) score the original and perturbed passages and calculate the perturbation discrepancy from their log probabilities, and (3) compare the discrepancy with a threshold. Lastly, the paper empirically tests the hypothesis and finds that it holds across a diverse body of LLMs.
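The three steps can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the source model is replaced by a toy unigram model (`toy_log_prob`) and T5 by a toy word-replacement function (`make_toy_perturb`), and all function names, the vocabulary, the number of perturbations, and the threshold are assumptions made for this example.

```python
import math
import random
import statistics
from typing import Callable

# Toy stand-in for the source model: a unigram model over four words (assumption).
TOY_WORD_PROB = {"the": 0.5, "cat": 0.3, "sat": 0.15, "zebra": 0.05}


def toy_log_prob(text: str) -> float:
    """Log probability of `text` under the toy unigram model."""
    return sum(math.log(TOY_WORD_PROB[word]) for word in text.split())


def make_toy_perturb(seed: int = 0) -> Callable[[str], str]:
    """Toy stand-in for T5: replace one random word with a random vocabulary word."""
    rng = random.Random(seed)
    vocab = sorted(TOY_WORD_PROB)

    def perturb(text: str) -> str:
        words = text.split()
        words[rng.randrange(len(words))] = rng.choice(vocab)
        return " ".join(words)

    return perturb


def perturbation_discrepancy(
    text: str,
    log_prob: Callable[[str], float],
    perturb: Callable[[str], str],
    n_perturbations: int = 20,
) -> float:
    """Steps 1 and 2: perturb the passage, score everything, return the gap."""
    perturbed = [perturb(text) for _ in range(n_perturbations)]
    mean_perturbed = statistics.mean(log_prob(p) for p in perturbed)
    return log_prob(text) - mean_perturbed


def is_machine_generated(
    text: str,
    log_prob: Callable[[str], float],
    perturb: Callable[[str], str],
    threshold: float,
    n_perturbations: int = 20,
) -> bool:
    """Step 3: flag the passage when the discrepancy exceeds the threshold."""
    d = perturbation_discrepancy(text, log_prob, perturb, n_perturbations)
    return d > threshold


if __name__ == "__main__":
    perturb = make_toy_perturb(seed=0)
    for passage in ["the the the the", "the cat sat", "zebra zebra zebra zebra"]:
        d = perturbation_discrepancy(passage, toy_log_prob, perturb)
        print(f"{passage!r}: discrepancy = {d:.3f}")
```

In the toy model, "the the the the" is the most probable four-word passage, so every perturbation can only lower its log probability and the discrepancy cannot be negative. In a real setting, `log_prob` would query the suspected source LLM and `perturb` would call a mask-filling model.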
> Perturbation (in ML)
In machine learning, perturbation means adding noise, usually to the training data but sometimes to the learned parameters. The paper perturbs the candidate passage itself in order to detect whether the passage is machine-generated. For example, T5 is used in this paper as a reasonable perturbation function for English.
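To make the idea of perturbing a passage with a mask-filling model more concrete, here is a rough sketch of the masking half only. It replaces short spans of words with T5-style sentinel tokens (`<extra_id_0>`, `<extra_id_1>`, ...), which a mask-filling model would then fill in with new words. The function name `mask_spans`, the span length of 2, and the 15% mask rate are assumptions chosen for illustration, and the filling step is left out.

```python
import random
from typing import List, Optional


def mask_spans(
    text: str,
    span_length: int = 2,
    mask_fraction: float = 0.15,
    seed: Optional[int] = None,
) -> str:
    """Replace random, non-overlapping word spans with T5-style sentinel tokens."""
    rng = random.Random(seed)
    words = text.split()
    if len(words) < span_length:
        return text  # too short to mask anything

    # Number of spans so that roughly `mask_fraction` of the words are masked.
    n_spans = max(1, int(mask_fraction * len(words) / span_length))

    taken = [False] * len(words)
    starts = set()
    attempts = 0
    while len(starts) < n_spans and attempts < 100 * n_spans:
        attempts += 1
        start = rng.randrange(len(words) - span_length + 1)
        if any(taken[start:start + span_length]):
            continue  # overlaps an earlier span, try again
        for i in range(start, start + span_length):
            taken[i] = True
        starts.add(start)

    # Rebuild the passage, emitting one sentinel per masked span, left to right.
    out: List[str] = []
    sentinel = 0
    i = 0
    while i < len(words):
        if i in starts:
            out.append(f"<extra_id_{sentinel}>")
            sentinel += 1
            i += span_length
        else:
            out.append(words[i])
            i += 1
    return " ".join(out)


if __name__ == "__main__":
    passage = "the quick brown fox jumps over the lazy dog near the quiet river bank"
    print(mask_spans(passage, seed=0))
```

The masked string is what would be handed to the mask-filling model. Its fills, substituted back for the sentinels, give one perturbed version of the passage.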
> Data Manifold
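The manifold hypothesis says that high-dimensional data actually lie on or near a lower-dimensional subspace, the data manifold. Reducing the dimension of the data, or transforming it into a latent space that expresses its features better, is a way of finding this manifold. In this paper, a perturbation that stays on the data manifold is one that still reads as natural text.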
A Social Offender, ChatGPT
These days, social media and news articles report on the concerns that people have about ChatGPT. ChatGPT is known as a program that answers every question users ask, and many fear that others will abuse it, for example by writing essays or reports on behalf of students, or by solving problems in tasks that are meant to assess a person's own skills.
The question is whether "abuse" is the proper term for using this program. For example, some years ago Wolfram Alpha emerged, a seemingly almighty program that provides solutions to a wide range of mathematical problems. Many people who worked with math felt apprehensive: they could lose their jobs, and math itself could lose its beauty. However, Wolfram Alpha has raised the quality of math and extended it in various dimensions. Math is a language that interprets and describes the details of science. Similarly, natural language is the language of human beings. ChatGPT is a technology we should make use of, not fear. Like Wolfram Alpha, ChatGPT could widen the beauty of communication between human beings. Of course, people need some time to adapt, but making use of ChatGPT should not be mistaken for abuse.