Language Through a Prism: A Spectral Approach for Multiscale Language Representations
Authors: Alex Tamkin, Dan Jurafsky, Noah Goodman
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that signal processing provides a natural framework for separating structure across scales, enabling us to 1) disentangle scale-specific information in existing embeddings and 2) train models to learn more about particular scales. Concretely, we apply spectral filters to the activations of a neuron across an input, producing filtered embeddings that perform well on part of speech tagging (word-level), dialog speech acts classification (utterance-level), or topic classification (document-level), while performing poorly on the other tasks. We evaluate the content of these filtered representations through probing experiments [39, 40, 41]. For each dataset below, we encode each training example with a fixed, pretrained BERT-Base cased model [26]. We then apply a spectral filter along each dimension and train a softmax classifier to perform a particular task using each filtered representation. |
| Researcher Affiliation | Academia | Alex Tamkin Stanford University Dan Jurafsky Stanford University Noah Goodman Stanford University |
| Pseudocode | Yes | Figure 3: (b) Spectral filters are simple to incorporate into existing models. Python-style code for a low-pass filter over representations. (An illustrative sketch of such a filter appears below the table.) |
| Open Source Code | No | The paper references an external PyTorch library: "https://github.com/zh217/torch-dct" (footnote 8). However, it does not state that the authors' own implementation of the described method is open source, nor does it provide a link to such code. |
| Open Datasets | Yes | We use the Penn Treebank dataset [42]. We use the Switchboard Dialog Speech Acts corpus [43, 44, 45]. We use the 20 Newsgroups dataset [46]. We train on the WikiText-103 dataset [48] for 50k steps at a batch size of 8 with default parameters for Adam. |
| Dataset Splits | No | The paper mentions using a validation set for early stopping: "We use early stopping with a patience of one, decaying the learning rate by a factor of 2 when successive epochs do not produce a decrease in validation loss." However, it does not provide specific details about the size or percentage of this validation split for any of the datasets used. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or cloud computing instance specifications. |
| Software Dependencies | No | The paper mentions using "an external PyTorch library for computing and backpropagating through the DCT and IDCT" (footnote 8, referencing https://github.com/zh217/torch-dct). However, it does not specify version numbers for PyTorch or any other key software components used in the experiments. |
| Experiment Setup | Yes | We train our probing models for a maximum of 30 epochs, using the Adam optimizer [47] with default parameters. We train on the WikiText-103 dataset [48] for 50k steps at a batch size of 8 with default parameters for Adam. (A sketch of this probing setup appears below the table.) |
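
The low-pass filter referenced in the Pseudocode row (Figure 3(b)) can be reconstructed from the paper's description: take a DCT of each neuron's activations across the tokens of an input, zero out the high-frequency components, and invert the transform. The sketch below is a minimal illustration assuming the torch-dct library cited in footnote 8; the function and variable names (`low_pass_filter`, `keep`, `reps`) are illustrative and not the authors' released code.

```python
# Hedged sketch of a low-pass spectral filter over token representations,
# in the spirit of the paper's Figure 3(b). Uses the torch-dct library the
# paper cites (https://github.com/zh217/torch-dct).
import torch
import torch_dct  # pip install torch-dct


def low_pass_filter(embeddings: torch.Tensor, keep: int) -> torch.Tensor:
    """Keep only the `keep` lowest-frequency DCT components along the sequence axis.

    embeddings: (seq_len, hidden_dim) activations for one input, e.g. from BERT-Base.
    keep:       number of low-frequency components retained per hidden dimension.
    """
    # torch_dct.dct operates over the last dimension, so move the sequence
    # axis there: (hidden_dim, seq_len).
    x = embeddings.transpose(0, 1)
    freq = torch_dct.dct(x, norm='ortho')                  # DCT per neuron across tokens
    mask = torch.zeros_like(freq)
    mask[:, :keep] = 1.0                                   # zero out high-frequency bands
    filtered = torch_dct.idct(freq * mask, norm='ortho')   # back to token space
    return filtered.transpose(0, 1)                        # (seq_len, hidden_dim)


# Example: a BERT-style sequence of 128 tokens with 768-dimensional activations,
# keeping only the 4 lowest-frequency components (roughly document-scale content).
reps = torch.randn(128, 768)
doc_scale = low_pass_filter(reps, keep=4)
```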
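
The Experiment Setup and Dataset Splits rows together describe the probe training recipe: a softmax classifier over frozen, filtered features, Adam with default parameters, at most 30 epochs, early stopping with a patience of one, and the learning rate halved when validation loss fails to decrease. Below is a minimal sketch of that loop under the assumption that the filtered embeddings are already computed and wrapped in standard PyTorch data loaders; the function name and loader variables are hypothetical, not taken from the paper.

```python
# Hedged sketch of the probing setup: a softmax (linear) classifier trained on
# frozen, filtered features with default Adam, at most 30 epochs, patience-1
# early stopping, and a 2x learning-rate decay when validation loss stalls.
import torch
import torch.nn as nn


def train_probe(train_loader, val_loader, feat_dim, n_classes, max_epochs=30):
    probe = nn.Linear(feat_dim, n_classes)       # softmax probe via CrossEntropyLoss
    opt = torch.optim.Adam(probe.parameters())   # default Adam hyperparameters
    loss_fn = nn.CrossEntropyLoss()
    best_val, patience = float('inf'), 1

    for epoch in range(max_epochs):
        probe.train()
        for feats, labels in train_loader:       # feats: filtered embeddings
            opt.zero_grad()
            loss_fn(probe(feats), labels).backward()
            opt.step()

        probe.eval()
        with torch.no_grad():
            val = sum(loss_fn(probe(f), y).item() for f, y in val_loader)

        if val < best_val:
            best_val, patience = val, 1          # improvement: reset patience
        elif patience > 0:
            patience -= 1
            for g in opt.param_groups:           # decay learning rate by a factor of 2
                g['lr'] /= 2
        else:
            break                                # early stop after repeated stalls
    return probe
```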