Quantifying and Analyzing Entity-Level Memorization in Large Language Models
Authors: Zhenhong Zhou, Jiuyang Xiang, Chaomeng Chen, Sen Su
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments based on the proposed, probing language models' ability to reconstruct sensitive entities under different settings. |
| Researcher Affiliation | Academia | Zhenhong Zhou¹, Jiuyang Xiang², Chaomeng Chen¹, Sen Su¹* (¹Beijing University of Posts and Telecommunications; ²University of Michigan) |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | The GPT-Neo model family (Black et al. 2021) includes a set of causal language models (CLMs) that are trained on The Pile datasets (Gao et al. 2020) and available in four sizes... The Enron email dataset (Klimt and Yang 2004), a subset of The Pile dataset, encompasses over 500,000 emails from approximately 150 users of the Enron Corporation. |
| Dataset Splits | No | The paper describes the datasets used (The Pile, Enron) and how entities were extracted and processed, but it does not specify explicit training, validation, or test dataset splits in terms of percentages, sample counts, or references to predefined splits used for reproducibility. |
| Hardware Specification | No | The paper does not specify any particular hardware used for running the experiments (e.g., CPU, GPU models, memory, or cloud instance types). |
| Software Dependencies | No | The paper does not explicitly state its software stack; Python is implied by common ML research practice, but no version numbers are given for Python, PyTorch, TensorFlow, or any other libraries, frameworks, or solvers. |
| Experiment Setup | Yes | In our experimental setup, we use the greedy decoding strategy by default to generate the output with the minimum perplexity (PPL), which is then utilized for evaluating the model's entity memorization capabilities... The prefix length is a crucial parameter of soft prompts. (A minimal greedy-decoding sketch is given below the table.) |
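
The sketch below illustrates the decoding setting quoted in the "Experiment Setup" row: greedy decoding of a continuation from a prefix, followed by a perplexity score under the same model. It is a minimal sketch assuming the Hugging Face `transformers` API; the `EleutherAI/gpt-neo-125m` checkpoint and the prompt string are illustrative assumptions, not the authors' code or data.

```python
# Minimal sketch (not the authors' code): greedy decoding with a GPT-Neo
# checkpoint via Hugging Face transformers, mirroring the default setting
# described above (greedy decoding, perplexity-based evaluation).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "EleutherAI/gpt-neo-125m"  # smallest GPT-Neo size; the paper evaluates several sizes

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

prefix = "Please contact John Doe at"  # hypothetical prefix, not an entity from the paper

inputs = tokenizer(prefix, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=32,
        do_sample=False,  # greedy decoding: take the argmax token at every step
        num_beams=1,
    )

continuation = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(continuation)

# Perplexity of the generated sequence under the same model
with torch.no_grad():
    loss = model(output_ids, labels=output_ids).loss
print(f"PPL: {torch.exp(loss).item():.2f}")
```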