Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Position: Fundamental Limitations of LLM Censorship Necessitate New Approaches
Authors: David Glukhov, Ilia Shumailov, Yarin Gal, Nicolas Papernot, Vardan Papyan
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We present fundamental limitations of verifying the semantic properties of LLM outputs and identifying compositional threats, illustrating inherent challenges of current approaches to censoring LLM outputs. Specifically, we demonstrate that semantic censorship can be perceived as an undecidable problem, and semantic properties of LLM outputs can become impossible to verify when the LLM is capable of providing "encrypted" outputs. We further show challenges of censorship can extend beyond just semantic censorship, as attackers can reconstruct impermissible outputs from a collection of permissible ones. Consequently, we call for a reevaluation of the problem of censorship and its goals, stressing the need for new definitions and approaches to censorship. In addition, we provide an initial attempt toward achieving this goal through syntactic censorship, drawing from a security perspective to design censorship methods that can provide guarantees. |
| Researcher Affiliation | Academia | ¹University of Toronto & Vector Institute; ²University of Oxford. Correspondence to: David Glukhov <EMAIL>. |
| Pseudocode | No | The paper describes algorithms in prose but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not mention providing open-source code for the methodology it describes. |
| Open Datasets | No | The paper is theoretical and does not conduct experiments involving training on datasets that it would need to make publicly available. |
| Dataset Splits | No | The paper is theoretical and does not involve dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper does not specify any hardware used for its theoretical analysis or demonstrations. |
| Software Dependencies | No | The paper discusses LLMs like GPT-4-turbo but does not list specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with hyperparameters or training configurations. |
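The compositional threat summarized in the Research Type row (attackers reconstructing impermissible outputs from a collection of permissible ones) can be illustrated with a toy sketch. This example is not from the paper; the keyword filter and the fragment-splitting scheme are hypothetical stand-ins for a semantic censor and an attacker's decomposition strategy.

```python
# Toy illustration (hypothetical, not the paper's method) of the
# "compositional threat": each fragment individually passes a naive
# keyword censor, yet the fragments recombine into the blocked string.

BLOCKED_KEYWORDS = {"password"}


def naive_censor_permits(text: str) -> bool:
    """Return True if the text passes a simple keyword-based filter."""
    return not any(kw in text.lower() for kw in BLOCKED_KEYWORDS)


secret = "password"
# Split the impermissible string into short fragments, each of which
# contains no blocked keyword and therefore passes the filter.
fragments = [secret[i:i + 3] for i in range(0, len(secret), 3)]

every_piece_permitted = all(naive_censor_permits(f) for f in fragments)
recombined_permitted = naive_censor_permits("".join(fragments))

print(every_piece_permitted)   # each fragment passes the censor
print(recombined_permitted)    # the recombined output does not
```

The sketch shows why output-by-output filtering gives no guarantee against an adversary who can aggregate multiple permissible responses, which is the motivation the paper gives for reevaluating censorship definitions.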