Self-conditioning Pre-Trained Language Models

Authors: Xavier Suau Cuadros, Luca Zappella, Nicholas Apostoloff

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "5. Experimental Analysis" (section title)
Researcher Affiliation | Industry | "Xavier Suau¹, Luca Zappella¹, Nicholas Apostoloff¹. ¹Apple. Correspondence to: Xavier Suau <xsuaucuadros@apple.com>."
Pseudocode | Yes | "A. PyTorch code implementing the do(c, k) intervention. Listing 1: Python code" (a hedged sketch of such an intervention follows the table)
Open Source Code | Yes | Code available at https://github.com/apple/ml-selfcond
Open Datasets | Yes | "We construct our concept dataset leveraging the OneSec dataset (Scarlini et al., 2019)"
Dataset Splits | No | The paper uses pre-trained models (GPT2) and a concept dataset to identify expert units, but does not define explicit train/validation/test splits for the method itself, as it does not involve re-training or fine-tuning of the language model.
Hardware Specification | Yes | "According to the benchmark in the Transformers repository, the average inference time for GPT2 for sentences of 128 tokens is 16ms on GPU (single V100 GPU, 16GB VRAM) and 67ms on CPU (Intel Xeon @ 2.3GHz with 32 vCPUs)." (see the timing sketch after the table)
Software Dependencies | No | The paper mentions using PyTorch and Hugging Face Transformers, but does not specify their version numbers.
Experiment Setup | Yes | "In all our experiments the decoding strategy is top-n sampling with n = 10, as in Yang & Klein (2021). ... The presence of a concept is induced by increasing k from 0 to 300 for our approach, increasing the λ parameter from 1 to 12 for FUDGE, and increasing the stepsize from 0 to 1 for PPLM-BoW." (see the decoding sketch after the table)
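
To make the Pseudocode row concrete, here is a minimal sketch (not the paper's Listing 1) of a do(c, k)-style intervention: forcing a chosen set of feed-forward units in GPT2 to fixed values via PyTorch forward hooks. The layer index, unit indices, and target values below are placeholders; in the paper they would come from ranking units by their expertise for a concept c and keeping the top k, and the exact hook point is an assumption of this sketch.

```python
import torch
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# Hypothetical intervention spec: {layer index: (unit indices, values to force)}.
intervention = {
    5: (torch.tensor([13, 421, 1987]), torch.tensor([2.1, 1.7, 3.0])),
}

hooks = []
for layer_idx, (units, values) in intervention.items():
    # GPT-2's FFN expansion lives in transformer.h[i].mlp.c_fc; hooking its
    # output (before the nonlinearity) is an assumption of this sketch.
    module = model.transformer.h[layer_idx].mlp.c_fc

    def make_hook(units, values):
        def hook(mod, inputs, output):
            output[..., units] = values  # clamp the chosen units at every position
            return output
        return hook

    hooks.append(module.register_forward_hook(make_hook(units, values)))

# Run a forward pass with the intervention active (generation works the same way).
with torch.no_grad():
    model(input_ids=torch.tensor([[50256]]))

for h in hooks:
    h.remove()  # removing the hooks restores the unconditioned model
```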
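For the Experiment Setup row, the following hedged sketch shows the decoding side only: assuming "top-n sampling with n = 10" means keeping the 10 most probable tokens at each step, it corresponds to `top_k=10` sampling in Hugging Face `generate`. Sweeping k from 0 to 300 would then amount to registering and removing intervention hooks (as in the do(c, k) sketch above) around each call to `generate`.

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

inputs = tokenizer("Once upon a time", return_tensors="pt")
out = model.generate(
    **inputs,
    do_sample=True,                        # stochastic decoding
    top_k=10,                              # top-n sampling with n = 10
    max_new_tokens=40,
    pad_token_id=tokenizer.eos_token_id,   # silence the missing-pad-token warning
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```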
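The latency figures in the Hardware Specification row come from the benchmark in the Transformers repository; the sketch below is only a rough wall-clock approximation of that kind of measurement (average forward-pass time for 128-token inputs to GPT2), and the resulting numbers will depend on the hardware and software versions used.

```python
import time
import torch
from transformers import GPT2LMHeadModel

device = "cuda" if torch.cuda.is_available() else "cpu"
model = GPT2LMHeadModel.from_pretrained("gpt2").to(device).eval()

# Random 128-token batch, matching the sentence length cited in the paper.
batch = torch.randint(0, model.config.vocab_size, (1, 128), device=device)

with torch.no_grad():
    for _ in range(3):                      # warm-up iterations
        model(input_ids=batch)
    if device == "cuda":
        torch.cuda.synchronize()
    runs = 20
    start = time.perf_counter()
    for _ in range(runs):
        model(input_ids=batch)
    if device == "cuda":
        torch.cuda.synchronize()
    print(f"avg inference time: {(time.perf_counter() - start) / runs * 1e3:.1f} ms")
```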