Position: Fundamental Limitations of LLM Censorship Necessitate New Approaches
Authors: David Glukhov, Ilia Shumailov, Yarin Gal, Nicolas Papernot, Vardan Papyan
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We present fundamental limitations of verifying the semantic properties of LLM outputs and identifying compositional threats, illustrating inherent challenges of current approaches to censoring LLM outputs. Specifically, we demonstrate that semantic censorship can be perceived as an undecidable problem, and that semantic properties of LLM outputs can become impossible to verify when the LLM is capable of providing "encrypted" outputs. We further show that the challenges of censorship extend beyond semantic censorship alone, as attackers can reconstruct impermissible outputs from a collection of permissible ones. Consequently, we call for a reevaluation of the problem of censorship and its goals, stressing the need for new definitions and approaches to censorship. In addition, we provide an initial attempt toward achieving this goal through syntactic censorship, drawing from a security perspective to design censorship methods that can provide guarantees. |
| Researcher Affiliation | Academia | ¹University of Toronto & Vector Institute; ²University of Oxford. Correspondence to: David Glukhov <david.glukhov@mail.utoronto.ca>. |
| Pseudocode | No | The paper describes algorithms in prose but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not mention providing open-source code for the methodology it describes. |
| Open Datasets | No | The paper is theoretical and does not conduct experiments involving training on datasets that it would need to make publicly available. |
| Dataset Splits | No | The paper is theoretical and does not involve dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper does not specify any hardware used for its theoretical analysis or demonstrations. |
| Software Dependencies | No | The paper discusses LLMs like GPT-4-turbo but does not list specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with hyperparameters or training configurations. |
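The paper's "encrypted outputs" argument can be illustrated with a toy sketch. The censor, blocklist, and Caesar-shift scheme below are hypothetical illustrations, not from the paper: a censor that inspects output text cannot flag content that has been trivially transformed, even though the user can recover the original content, which is the intuition behind the impossibility claim.

```python
# Hypothetical sketch of the "encrypted outputs" argument: a naive
# string-matching censor passes a shifted version of blocked content,
# even though the recipient can trivially invert the transformation.

BLOCKLIST = {"impermissible"}  # illustrative blocklist, not from the paper

def permissive_censor(text: str) -> bool:
    """Return True if the output passes a naive text-level check."""
    return not any(word in text.lower() for word in BLOCKLIST)

def caesar_shift(text: str, k: int = 3) -> str:
    """Shift alphabetic characters by k positions (a toy 'encryption')."""
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord('a') if ch.islower() else ord('A')
            out.append(chr((ord(ch) - base + k) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)

secret = "impermissible instructions"
encoded = caesar_shift(secret)

assert not permissive_censor(secret)         # plaintext is caught
assert permissive_censor(encoded)            # same content, shifted, passes
assert caesar_shift(encoded, -3) == secret   # recipient recovers it
```

Any output-inspection rule of this form faces the same issue: the semantic property being checked is a function of the decoded content, which is not available to a censor that only sees the transformed text. This motivates the paper's shift toward syntactic censorship, which constrains the form of outputs rather than attempting to verify their meaning.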