Self-conditioning Pre-Trained Language Models

Authors: Xavier Suau Cuadros, Luca Zappella, Nicholas Apostoloff

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "5. Experimental Analysis" (section title)
Researcher Affiliation | Industry | "Xavier Suau¹, Luca Zappella¹, Nicholas Apostoloff¹. ¹Apple. Correspondence to: Xavier Suau <xsuaucuadros@apple.com>."
Pseudocode | Yes | "A. PyTorch code implementing the do(c, k) intervention. Listing 1: Python code" (a hedged sketch of such an intervention follows the table)
Open Source Code | Yes | Code available at https://github.com/apple/ml-selfcond
Open Datasets | Yes | "We construct our concept dataset leveraging the OneSec dataset (Scarlini et al., 2019)"
Dataset Splits | No | The paper uses pre-trained models (GPT2) and a concept dataset to identify expert units, but does not define explicit train/validation/test splits for the method itself, as it does not involve re-training or fine-tuning of the language model.
Hardware Specification | Yes | "According to the benchmark in the Transformers repository, the average inference time for GPT2 for sentences of 128 tokens is 16ms on GPU (single V100 GPU, 16GB VRAM) and 67ms on CPU (Intel Xeon @ 2.3GHz with 32 vCPUs)." (see the timing sketch after the table)
Software Dependencies | No | The paper mentions using PyTorch and Hugging Face Transformers, but does not specify their version numbers.
Experiment Setup | Yes | "In all our experiments the decoding strategy is top-n sampling with n = 10, as in Yang & Klein (2021). ... The presence of a concept is induced by increasing k from 0 to 300 for our approach, increasing the λ parameter from 1 to 12 for FUDGE, and increasing the stepsize from 0 to 1 for PPLM-BoW." (see the decoding sketch after the table)
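
To make the Pseudocode row concrete, here is a minimal sketch (not the paper's Listing 1) of a do(c, k)-style intervention: forcing a chosen set of feed-forward units in GPT2 to fixed values via PyTorch forward hooks. The layer index, unit indices, and target values below are placeholders; in the paper they would come from ranking units by their expertise for a concept c and keeping the top k, and the exact hook point is an assumption of this sketch.

```python
import torch
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# Hypothetical intervention spec: {layer index: (unit indices, values to force)}.
intervention = {
    5: (torch.tensor([13, 421, 1987]), torch.tensor([2.1, 1.7, 3.0])),
}

hooks = []
for layer_idx, (units, values) in intervention.items():
    # GPT-2's FFN expansion lives in transformer.h[i].mlp.c_fc; hooking its
    # output (before the nonlinearity) is an assumption of this sketch.
    module = model.transformer.h[layer_idx].mlp.c_fc

    def make_hook(units, values):
        def hook(mod, inputs, output):
            output[..., units] = values  # clamp the chosen units at every position
            return output
        return hook

    hooks.append(module.register_forward_hook(make_hook(units, values)))

# Run a forward pass with the intervention active (generation works the same way).
with torch.no_grad():
    model(input_ids=torch.tensor([[50256]]))

for h in hooks:
    h.remove()  # removing the hooks restores the unconditioned model
```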
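For the Experiment Setup row, the following hedged sketch shows the decoding side only: assuming "top-n sampling with n = 10" means keeping the 10 most probable tokens at each step, it corresponds to `top_k=10` sampling in Hugging Face `generate`. Sweeping k from 0 to 300 would then amount to registering and removing intervention hooks (as in the do(c, k) sketch above) around each call to `generate`.

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

inputs = tokenizer("Once upon a time", return_tensors="pt")
out = model.generate(
    **inputs,
    do_sample=True,                        # stochastic decoding
    top_k=10,                              # top-n sampling with n = 10
    max_new_tokens=40,
    pad_token_id=tokenizer.eos_token_id,   # silence the missing-pad-token warning
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```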
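The latency figures in the Hardware Specification row come from the benchmark in the Transformers repository; the sketch below is only a rough wall-clock approximation of that kind of measurement (average forward-pass time for 128-token inputs to GPT2), and the resulting numbers will depend on the hardware and software versions used.

```python
import time
import torch
from transformers import GPT2LMHeadModel

device = "cuda" if torch.cuda.is_available() else "cpu"
model = GPT2LMHeadModel.from_pretrained("gpt2").to(device).eval()

# Random 128-token batch, matching the sentence length cited in the paper.
batch = torch.randint(0, model.config.vocab_size, (1, 128), device=device)

with torch.no_grad():
    for _ in range(3):                      # warm-up iterations
        model(input_ids=batch)
    if device == "cuda":
        torch.cuda.synchronize()
    runs = 20
    start = time.perf_counter()
    for _ in range(runs):
        model(input_ids=batch)
    if device == "cuda":
        torch.cuda.synchronize()
    print(f"avg inference time: {(time.perf_counter() - start) / runs * 1e3:.1f} ms")
```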