[Submitted on 6 Apr 2023 (v1), last revised 18 Apr 2023 (this version, v2)]

Abstract: The rapid adoption of generative language models has brought about
substantial advancements in digital communication, while simultaneously raising
concerns regarding the potential misuse of AI-generated content. Although
numerous detection methods have been proposed to differentiate between AI-generated and
human-generated content, the fairness and robustness of these detectors remain
underexplored. In this study, we evaluate the performance of several
widely used GPT detectors on writing samples from native and non-native
English writers. Our findings reveal that these detectors consistently
misclassify non-native English writing samples as AI-generated, whereas native
writing samples are accurately identified. Furthermore, we demonstrate that
simple prompting strategies can not only mitigate this bias but also
effectively bypass GPT detectors, suggesting that GPT detectors may
unintentionally penalize writers with constrained linguistic expressions. Our
results call for a broader conversation about the ethical implications of
deploying ChatGPT content detectors and caution against their use in evaluative
or educational settings, particularly when they may inadvertently penalize or
exclude non-native English speakers from the global discourse.

Submission history

From: Weixin Liang


[v1] Thu, 6 Apr 2023 01:51:15 UTC (1,273 KB)
[v2] Tue, 18 Apr 2023 22:59:26 UTC (1,564 KB)