Connecting Pre-trained Language Model and Downstream Task via Properties of Representation

Authors: Chenwei Wu, Holden Lee, Rong Ge

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We propose and empirically validate the existence of an anchor vector in the representation space, and show that this assumption, together with properties of the downstream task, guarantees performance transfer. ... In Sections 4.2 and F, this is not true for recent large-scale pre-trained language models. ... Figure 1a plots the mean squared approximation error of the log bulk partition function. ... More experiments and discussions are provided in Section F.
Researcher Affiliation | Academia | Chenwei Wu (Duke University, cwwu@cs.duke.edu), Holden Lee (Johns Hopkins University, hlee283@jhu.edu), Rong Ge (Duke University, rongge@cs.duke.edu)
Pseudocode | No | The paper includes mathematical derivations and proofs (e.g., Theorem 1, Theorem 2, Lemma 1), but it does not contain any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any statement about releasing open-source code, nor does it provide a link to a code repository.
Open Datasets | Yes | We use WikiText-2 [Merity et al., 2016] as the text corpus.
Dataset Splits | No | The paper mentions using 'the first 1/4 of WikiText-2 [Merity et al., 2016] as the input text' and calculating perplexities, but it does not specify explicit train/validation/test splits for reproducibility. (A perplexity sketch appears after the table.)
Hardware Specification | No | The paper mentions using 'large-scale language models' like GPT-2 and OPT, but it does not specify any particular hardware (e.g., GPU models, CPU types, or memory) used for running the experiments.
Software Dependencies | No | The paper mentions using 'GPT-2 [Radford et al., 2019]' and 'OPT [Zhang et al., 2022]' models. However, it does not specify any programming languages, libraries, or software dependencies with version numbers used for the experimental setup.
Experiment Setup | Yes | The hidden representations we use in this experiment are the last hidden states of these models, i.e., the output of the penultimate layer. The dimension of the hidden representations ranges from 768 to 2048, and the number of tokens is about 70k. We choose the bulk words to be all the words except those having top-k probabilities and compute the optimal anchor vector using the closed-form least-squares solution. In our experiments, we use the mean squared error (MSE) to measure the approximation quality. (A code sketch of this computation appears after the table.)
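
The Experiment Setup row describes a concrete computation: take the last hidden states of a pre-trained causal language model, define the bulk as all words outside the top-k, fit an anchor vector to the log bulk partition function with the closed-form least-squares solution, and score the fit with MSE. Below is a minimal sketch of that computation, assuming a Hugging Face GPT-2 checkpoint (the paper also uses OPT), an illustrative value of k, and a single placeholder input string rather than the roughly 70k WikiText-2 tokens reported in the paper; these choices are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch of the anchor-vector measurement described in the
# "Experiment Setup" row. Model choice, k, and the placeholder input are
# assumptions for illustration; they are not specified like this in the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"   # paper uses GPT-2 and OPT models (hidden size 768-2048)
k = 100               # bulk = all words except the top-k most probable (k assumed)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

text = "Placeholder input; the paper uses WikiText-2 (about 70k tokens)."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# Last hidden states: the output of the penultimate layer, i.e. the
# representations fed to the LM head. Shape: (seq_len, d).
hidden = out.hidden_states[-1].squeeze(0)
logits = out.logits.squeeze(0)            # (seq_len, vocab_size)

# Log bulk partition function per position: log-sum-exp of the logits over
# every word except the top-k.
bulk_logits = logits.clone()
topk_idx = logits.topk(k, dim=-1).indices
bulk_logits.scatter_(1, topk_idx, float("-inf"))
log_z_bulk = torch.logsumexp(bulk_logits, dim=-1)   # (seq_len,)

# Closed-form least-squares anchor vector v minimizing ||H v - log Z_bulk||^2.
# In the paper's setting, hidden states from ~70k tokens are stacked into H,
# so the system is overdetermined (unlike this toy single-sentence example).
H = hidden.double()
y = log_z_bulk.double().unsqueeze(1)
v = torch.linalg.lstsq(H, y).solution               # (d, 1)

# Mean squared approximation error, the quantity reported in Figure 1a.
mse = ((H @ v - y) ** 2).mean().item()
print(f"anchor-vector approximation MSE: {mse:.6f}")
```

Whether the anchor vector includes an intercept term, or whether output biases enter the bulk partition function, is not stated in the rows above, so the sketch omits both.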
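
The Dataset Splits row quotes the paper's use of the first 1/4 of WikiText-2 as input text together with perplexity measurements. The following sketch shows one way such a perplexity could be computed; the choice of the train split, the character-level truncation to one quarter, and the non-overlapping chunking are assumptions made here, since (per the row above) the paper does not specify them.

```python
# Sketch of a perplexity measurement on the first 1/4 of WikiText-2.
# Split choice, truncation by characters, and chunking are assumptions.
import math
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

raw = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
text = "\n\n".join(raw["text"])
text = text[: len(text) // 4]          # "first 1/4" interpreted by characters

ids = tokenizer(text, return_tensors="pt").input_ids[0]
max_len = model.config.n_positions     # 1024 for GPT-2; OPT uses max_position_embeddings
nlls, n_tokens = [], 0

with torch.no_grad():
    for start in range(0, ids.size(0) - 1, max_len):
        chunk = ids[start : start + max_len + 1]   # inputs plus next-token targets
        if chunk.size(0) < 2:
            break
        out = model(chunk[:-1].unsqueeze(0))
        loss = torch.nn.functional.cross_entropy(
            out.logits[0], chunk[1:], reduction="sum"
        )
        nlls.append(loss)
        n_tokens += chunk.size(0) - 1

print("perplexity:", math.exp(torch.stack(nlls).sum().item() / n_tokens))
```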