Intrinsic Dimension Estimation for Robust Detection of AI-Generated Texts

Authors: Eduard Tulchinskii, Kristian Kuznetsov, Laida Kushnareva, Daniil Cherniavskii, Sergey Nikolenko, Evgeny Burnaev, Serguei Barannikov, Irina Piontkovskaya

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5 Experiments
Researcher Affiliation | Collaboration | Skolkovo Institute of Science and Technology, Russia; AI Foundation and Algorithm Lab, Russia; Artificial Intelligence Research Institute (AIRI), Russia; CNRS, Université Paris Cité, France; St. Petersburg Department of the Steklov Institute of Mathematics, Russia
Pseudocode | Yes | Appendix B: Algorithm for computing the PHD (a sketch of this estimator follows the table)
Open Source Code | Yes | We release code and data: github.com/ArGintum/GPTID
Open Datasets | Yes | Our main dataset of human texts is Wiki40b [Guo et al., 2020].
Dataset Splits | Yes | We split data into train / validation / test sets in proportion 80%/10%/10%.
Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU types, memory) used for running the experiments are provided in the paper.
Software Dependencies | No | The paper mentions the 'scikit-dimension library [Bac et al., 2021]' but does not provide a specific version number. Other software mentioned consists of models (RoBERTa-base, XLM-R) or general tools without versioning. (A minimal usage sketch follows the table.)
Experiment Setup | Yes | We consider consistent text samples of medium size, with length 300 tokens; ... In our experiments, we use RoBERTa-base [Liu et al., 2019] for English and XLM-R [Goyal et al., 2021] for other languages. ... Finally, we construct a simple single-feature classifier for artificial text detection with PHD as the feature, training a logistic regression on some dataset of real and generated texts. ... We split data into train / validation / test sets in proportion 80%/10%/10%. ... Appendix B (Algorithm for computing the PHD) states: 'k = 8 is a good trade-off between speed of computation and variance of PHD estimation for our data (our sets of points vary between 50 and 510 in size). As for n̂, we always used n̂ = 40. ... For all our experiments we took J = 7.' (A sketch of this pipeline follows the table.)
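
Sketch 1 (PHD estimator). The Pseudocode row refers to Appendix B of the paper, which gives an MST-based algorithm for the persistent homology dimension (PHD). Below is a minimal, illustrative Python sketch of that general scheme: regress the log of the total 0-dimensional persistence against the log of the subsample size and convert the slope into a dimension. The function names and the default sampling constants (min_points, n_sizes, n_reps) are ours and only loosely mirror the paper's n̂, k and J; this is not the authors' implementation (see github.com/ArGintum/GPTID for that).

```python
# Minimal sketch of MST-based persistent homology dimension (PHD) estimation.
# Sampling schedule and constants are illustrative, not the paper's exact values.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform
from sklearn.linear_model import LinearRegression


def total_persistence(points: np.ndarray, alpha: float = 1.0) -> float:
    """Sum of MST edge lengths raised to alpha (0-dim total persistence E_0^alpha)."""
    dists = squareform(pdist(points))
    mst = minimum_spanning_tree(dists).toarray()
    return float(np.sum(mst[mst > 0] ** alpha))


def phd_estimate(points: np.ndarray, alpha: float = 1.0,
                 min_points: int = 40, n_sizes: int = 8, n_reps: int = 7,
                 seed: int = 0) -> float:
    """Estimate the PHD of a point cloud.

    E_0^alpha(n) scales roughly as n^{(d - alpha) / d} for intrinsic dimension d,
    so a log-log regression of total persistence on subsample size n gives a
    slope m from which d = alpha / (1 - m).
    """
    rng = np.random.default_rng(seed)
    n = len(points)
    sizes = np.unique(np.linspace(min_points, n, n_sizes, dtype=int))
    xs, ys = [], []
    for size in sizes:
        for _ in range(n_reps):
            idx = rng.choice(n, size=size, replace=False)
            xs.append(np.log(size))
            ys.append(np.log(total_persistence(points[idx], alpha)))
    slope = LinearRegression().fit(np.array(xs)[:, None], ys).coef_[0]
    return alpha / (1.0 - slope)
```

For alpha = 1 the total persistence is just the length of the minimal spanning tree, and the regression-plus-conversion step follows the scaling law E_0^alpha(n) ~ n^{(d - alpha)/d} used in PHD-style estimators.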
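
Sketch 2 (scikit-dimension). The Software Dependencies row notes that baseline intrinsic dimension estimators come from the scikit-dimension library [Bac et al., 2021], with no version pinned. A minimal usage sketch; the MLE estimator and the random stand-in data are chosen only for illustration, not taken from the paper.

```python
# Minimal scikit-dimension usage sketch (no version is pinned in the paper;
# the MLE estimator below is illustrative rather than the paper's exact baseline).
import numpy as np
import skdim

X = np.random.default_rng(0).normal(size=(500, 768))  # stand-in for token embeddings
estimator = skdim.id.MLE().fit(X)
print("MLE intrinsic dimension estimate:", estimator.dimension_)
```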
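
Sketch 3 (single-feature detector). The Experiment Setup row describes a logistic regression trained on PHD as the only feature, computed over RoBERTa-base token embeddings of roughly 300-token passages, with an 80%/10%/10% split. The sketch below reuses phd_estimate from Sketch 1 and is an illustrative reconstruction, not the authors' code: the helper load_detection_dataset is hypothetical, and using the last hidden layer as the embedding point cloud is an assumption.

```python
# Illustrative single-feature detection pipeline: PHD of RoBERTa-base token
# embeddings as the only input to a logistic regression.
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base")


def text_phd(text: str) -> float:
    """PHD of the last-layer token embeddings of a ~300-token passage."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=300)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # shape: (seq_len, 768)
    return phd_estimate(hidden.numpy())


# Hypothetical loader: should return ~300-token passages and binary labels
# (0 = human-written, 1 = model-generated).
texts, labels = load_detection_dataset()

features = np.array([[text_phd(t)] for t in texts])
X_train, X_tmp, y_train, y_tmp = train_test_split(
    features, labels, test_size=0.2, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, random_state=0)  # overall 80%/10%/10%

clf = LogisticRegression().fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```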