Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026].
Better Language Model Inversion by Compactly Representing Next-Token Distributions
Authors: Murtaza Nazir, Matthew Finlayson, John Morris, Xiang Ren, Swabha Swayamdipta
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 Experimental setup: We generally follow the experimental settings originally proposed for L2T and O2P for fair comparisons [21, 34]. We initialize our inverter as a pre-trained T5-base model [25]. For our target models, we use variants of Llama 2 7B (for comparison with baselines) and Llama 3.1 8B. 6 Experiments: Table 1 compares the in-distribution performance of PILS with baselines, reporting both the mean and the standard error of the mean for each metric on 2M Instructions. PILS surpasses all previous methods on every metric by a considerable margin. Notably, we achieve 51% exact match recovery of hidden prompts for Llama 2 Chat, where the best previous method (L2T) could only recover 23% exactly. |
| Researcher Affiliation | Academia | Murtaza Nazir Matthew Finlayson University of Southern California John X. Morris Cornell University Xiang Ren University of Southern California Swabha Swayamdipta University of Southern California Correspondence to EMAIL and EMAIL |
| Pseudocode | No | The paper describes the methods in prose, such as in Section 4 "Language model inversion from compressed logprobs", but does not present any explicitly labeled pseudocode or algorithm blocks. The architectural overview in Figure 1 also does not constitute pseudocode. |
| Open Source Code | Yes | Our code is available at https://github.com/Dill-Lab/PILS. |
| Open Datasets | Yes | For training, we use the 2M Instructions dataset [21] as hidden prompts to our target model. We evaluate our inverters on a held-out set from 2M Instructions and two out-of-distribution (ood) test sets: Alpaca Code [6] and Anthropic Helpful/Harmless (HH) [4, 11]. We also report system prompt inversion on Awesome GPT Prompts [2], and GPT Store [18]. |
| Dataset Splits | Yes | For the Awesome dataset, fine-tuning used a learning rate of 1e-4 for 100 epochs, while the Store dataset used a learning rate of 5e-5 for 50 epochs. The T5-base inverter was subsequently fine-tuned for system prompt inversion using the Awesome (50 training/103 testing samples) and Store (50 training/29 testing samples) datasets from Zhang et al. [34]. |
| Hardware Specification | Yes | Main inversion training was conducted on four NVIDIA RTX A6000 GPUs, which takes about 1 week to complete. System prompt inverter fine-tuning utilized a single NVIDIA RTX A6000 GPU and takes about 10 hours to complete. |
| Software Dependencies | No | Section D, "Implementation details", states "All work utilized PyTorch and Hugging Face transformers." However, it does not specify any version numbers for these software components. |
| Experiment Setup | Yes | D.1 Main inverter training: Key training parameters included a learning rate of 2e-4, a batch size of 250, and the AdamW optimizer with default settings. A 3200-step linear warmup was used, after which the learning rate remained constant. Training ran for 100 epochs (Llama-3.1-8B-Instruct was trained for 50 epochs), using bfloat16 mixed precision. D.2 System prompt inverter fine-tuning: Common fine-tuning parameters across both datasets included a batch size of 50, the AdamW optimizer with default settings, and bfloat16 precision. For the Awesome dataset, fine-tuning used a learning rate of 1e-4 for 100 epochs, while the Store dataset used a learning rate of 5e-5 for 50 epochs. |
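
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch, which may help when comparing them against the released code. This is an illustration only, not the authors' implementation: the names `TrainConfig`, `MAIN_INVERTER`, `AWESOME_FINETUNE`, `STORE_FINETUNE`, and `lr_at_step` are hypothetical, the warmup is assumed to ramp linearly from zero, and the fine-tuning runs are assumed to use no warmup because the quoted text does not mention one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters quoted in the table (field names are hypothetical)."""
    learning_rate: float
    batch_size: int
    epochs: int
    warmup_steps: int = 0  # assumption: 0 when the source states no warmup


# D.1 main inverter training (Llama-3.1-8B-Instruct used 50 epochs instead).
MAIN_INVERTER = TrainConfig(learning_rate=2e-4, batch_size=250,
                            epochs=100, warmup_steps=3200)

# D.2 system prompt inverter fine-tuning.
AWESOME_FINETUNE = TrainConfig(learning_rate=1e-4, batch_size=50, epochs=100)
STORE_FINETUNE = TrainConfig(learning_rate=5e-5, batch_size=50, epochs=50)


def lr_at_step(step: int, cfg: TrainConfig) -> float:
    """Linear warmup followed by a constant learning rate.

    Assumes the ramp starts at zero on step 0 and reaches the full
    learning rate at step == cfg.warmup_steps.
    """
    if cfg.warmup_steps > 0 and step < cfg.warmup_steps:
        return cfg.learning_rate * step / cfg.warmup_steps
    return cfg.learning_rate


if __name__ == "__main__":
    for step in (0, 800, 1600, 3200, 10_000):
        print(step, lr_at_step(step, MAIN_INVERTER))
```

In a PyTorch training loop, a schedule of this shape is usually attached to the optimizer through a scheduler object rather than computed by hand; the function above only makes the quoted warmup description concrete.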