Self-conditioning Pre-Trained Language Models
Authors: Xavier Suau Cuadros, Luca Zappella, Nicholas Apostoloff
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5. Experimental Analysis |
| Researcher Affiliation | Industry | Xavier Suau¹, Luca Zappella¹, Nicholas Apostoloff¹ (¹Apple). Correspondence to: Xavier Suau <xsuaucuadros@apple.com>. |
| Pseudocode | Yes | Appendix A: PyTorch code implementing the do(c, k) intervention (Listing 1: Python code). A hedged re-implementation sketch follows the table. |
| Open Source Code | Yes | Code available at https://github.com/apple/ml-selfcond |
| Open Datasets | Yes | We construct our concept dataset leveraging the One Sec dataset (Scarlini et al., 2019) |
| Dataset Splits | No | The paper uses pre-trained models (GPT2) and a 'concept dataset' to identify expert units, but does not define explicit train/validation/test splits for the method itself, as it does not involve re-training or fine-tuning of the language model. |
| Hardware Specification | Yes | According to the benchmark in the Transformers repository, the average inference time for GPT2 for sentences of 128 tokens is 16 ms on GPU (single V100 GPU, 16 GB VRAM) and 67 ms on CPU (Intel Xeon @ 2.3 GHz CPU with 32 vCPUs). A timing sketch follows the table. |
| Software Dependencies | No | The paper mentions using PyTorch and Huggingface Transformers, but does not specify their version numbers. |
| Experiment Setup | Yes | In all our experiments the decoding strategy is top-n sampling with n = 10, as in Yang & Klein (2021). ... The presence of a concept is induced by increasing k from 0 to 300 for our approach, increasing the λ parameter from 1 to 12 for FUDGE, and increasing the stepsize from 0 to 1 for PPLM-BoW. A decoding sketch follows the table. |
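
The Pseudocode row quotes the paper's Listing 1, which implements the do(c, k) intervention in PyTorch. Below is a minimal sketch of how such an intervention can be wired up with forward hooks; the hooked submodule, the expert-list format, and the forced values are assumptions for illustration, not the authors' implementation (the actual code is in the ml-selfcond repository).

```python
# Minimal sketch: force selected units to fixed values via PyTorch forward hooks,
# in the spirit of the do(c, k) intervention. The hooked module, the expert format,
# and the forced value are assumptions; see the authors' ml-selfcond repo for the
# actual implementation.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer


def add_expert_hooks(experts):
    """experts: {module: [(unit_index, forced_value), ...]} (hypothetical format)."""
    handles = []
    for module, units in experts.items():
        idxs = torch.tensor([i for i, _ in units])
        vals = torch.tensor([v for _, v in units])

        def hook(mod, inputs, output, idxs=idxs, vals=vals):
            # Overwrite the selected units at every token position.
            output[..., idxs.to(output.device)] = vals.to(output)
            return output

        handles.append(module.register_forward_hook(hook))
    return handles  # call h.remove() on each handle to undo the intervention


tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

# Illustrative values only: force unit 42 of the first block's MLP output to 5.0.
handles = add_expert_hooks({model.transformer.h[0].mlp: [(42, 5.0)]})

inputs = tokenizer("The movie was", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # forward pass runs with the intervention active
```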
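
The Hardware Specification row cites average inference times for 128-token inputs. The snippet below is a rough timing sketch under assumed settings (batch size 1, random token ids, a few warm-up passes, simple averaging); it is not the exact protocol of the Transformers benchmark.

```python
# Rough timing sketch for a 128-token GPT-2 forward pass. Warm-up, batch size, and
# averaging choices are assumptions, not the Transformers benchmark protocol.
import time
import torch
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

input_ids = torch.randint(0, model.config.vocab_size, (1, 128), device=device)

with torch.no_grad():
    for _ in range(3):  # warm-up passes
        model(input_ids)
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    runs = 20
    for _ in range(runs):
        model(input_ids)
    if device == "cuda":
        torch.cuda.synchronize()
    elapsed = (time.perf_counter() - start) / runs

print(f"Average forward pass for 128 tokens on {device}: {elapsed * 1000:.1f} ms")
```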
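
The Experiment Setup row reports top-n sampling with n = 10 as the decoding strategy. A minimal sketch using the Huggingface generate API is shown below; the prompt, output length, and other generation arguments are assumptions, not the paper's exact settings.

```python
# Minimal sketch of top-n sampling with n = 10 via Huggingface generate.
# Prompt and max_new_tokens are illustrative assumptions.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=True,   # sample instead of greedy decoding
    top_k=10,         # restrict sampling to the 10 most likely tokens (n = 10)
    max_new_tokens=50,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```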