Mapping the Timescale Organization of Neural Language Models
Authors: Hsiang-Yun Sherry Chien, Jinhan Zhang, Christopher Honey
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We applied tools developed in neuroscience to map the processing timescales of individual units within a word-level LSTM language model. This timescale-mapping method assigned long timescales to units previously found to track long-range syntactic dependencies. Additionally, the mapping revealed a small subset of the network (less than 15% of units) with long timescales and whose function had not previously been explored. We next probed the functional organization of the network by examining the relationship between the processing timescale of units and their network connectivity. We identified two classes of long-timescale units: controller units composed a densely interconnected subnetwork and strongly projected to the rest of the network, while integrator units showed the longest timescales in the network, and expressed projection profiles closer to the mean projection profile. Ablating integrator and controller units affected model performance at different positions within a sentence, suggesting distinctive functions of these two sets of units. Finally, we tested the generalization of these results to a character-level LSTM model and models with different architectures. In summary, we demonstrated a model-free technique for mapping the timescale organization in recurrent neural networks, and we applied this method to reveal the timescale and functional organization of neural language models. |
| Researcher Affiliation | Academia | Hsiang-Yun Sherry Chien, Jinhan Zhang & Christopher J. Honey, Department of Psychological and Brain Sciences, Johns Hopkins University, Baltimore, MD 21218, USA. {sherry.chien,jzhan205,chris.honey}@jhu.edu |
| Pseudocode | No | The paper only describes steps in regular paragraph text without structured formatting. |
| Open Source Code | Yes | The code and dataset to reproduce the experiment can be found at https://github.com/sherrychien/LSTM_timescales |
| Open Datasets | Yes | We evaluated the internal representations generated by a pre-trained word-level LSTM language model (WLSTM, Gulordava et al., 2018) as well as a pre-trained character-level LSTM model (CLSTM, Hahn & Baroni, 2019) as they processed sentences sampled from the 427804-word (1965719-character) novel corpus: Anna Karenina by Leo Tolstoy (Tolstoy, 2016), translated from Russian to English by Constance Garnett. |
| Dataset Splits | No | The paper uses pre-trained models and evaluates on specified corpora and test sets, but it does not provide explicit training/validation/test dataset splits (percentages or counts) for the data used in their experiments, nor does it specify how the 'Anna Karenina' corpus was split by them for their specific analysis. |
| Hardware Specification | No | The paper does not specify any hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | As far as possible, we applied similar parameters in the GRU as were used for the LSTM by Gulordava et al. (2018): the same Wikipedia training corpus, the same loss function (i.e. cross-entropy loss), and the same hyperparameters, except for a learning rate initialized to 0.1, which we found better suited for training the GRU. The GRU model also had two layers, with 650 hidden units in each layer. |
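
The Research Type row above summarizes the core procedure: assign each recurrent unit a processing timescale, then probe long-timescale unit groups (e.g. by ablation). The sketch below illustrates only the per-unit timescale assignment, and it substitutes a simple autocorrelation criterion for the paper's own context-manipulation procedure; the `hidden_states` array, the 1/e threshold, and the synthetic toy data are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch (not the paper's exact method): assign each hidden unit a
# "timescale", defined here as the first lag at which its autocorrelation
# over a long text drops below 1/e.
import numpy as np

def unit_timescales(hidden_states, max_lag=200):
    """hidden_states: array of shape (T, n_units) collected while a network
    processes a long text. Returns one timescale (in timesteps) per unit."""
    x = hidden_states - hidden_states.mean(axis=0)
    var = (x ** 2).mean(axis=0) + 1e-12
    acf = np.stack([(x[:-k] * x[k:]).mean(axis=0) / var
                    for k in range(1, max_lag + 1)])     # (max_lag, n_units)
    below = acf < np.exp(-1.0)
    # argmax finds the first True along the lag axis; fall back to max_lag
    return np.where(below.any(axis=0), below.argmax(axis=0) + 1, max_lag)

# Toy check: synthetic "units" with known timescales of 2, 10, and 40 steps
rng = np.random.default_rng(0)
taus_true = np.array([2.0, 10.0, 40.0])
T = 20000
h = np.zeros((T, taus_true.size))
for t in range(1, T):
    h[t] = np.exp(-1.0 / taus_true) * h[t - 1] + rng.standard_normal(taus_true.size)
print(unit_timescales(h))   # lags close to the true timescales
```

In the paper, the resulting per-unit timescales are then related to network connectivity and to ablation effects at different sentence positions; those steps are not sketched here.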
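The Experiment Setup row specifies a two-layer GRU with 650 hidden units per layer, trained with cross-entropy loss and a learning rate initialized to 0.1. Below is a minimal PyTorch sketch of such a configuration; the embedding size, vocabulary size, batch and sequence shapes, and the choice of plain SGD are assumptions not stated in the excerpt.

```python
# Hedged sketch of a two-layer, 650-unit GRU language model; hyperparameters
# not given in the excerpt (embedding size, vocab size, optimizer) are guesses.
import torch
import torch.nn as nn

class GRULanguageModel(nn.Module):
    def __init__(self, vocab_size, emb_size=650, hidden_size=650, n_layers=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_size)
        self.gru = nn.GRU(emb_size, hidden_size, n_layers, batch_first=True)
        self.decoder = nn.Linear(hidden_size, vocab_size)

    def forward(self, tokens, hidden=None):
        emb = self.embedding(tokens)               # (batch, seq, emb)
        out, hidden = self.gru(emb, hidden)        # (batch, seq, hidden)
        return self.decoder(out), hidden           # next-word logits

vocab_size = 10000                                 # placeholder vocabulary size
model = GRULanguageModel(vocab_size)               # 2 layers x 650 hidden units
criterion = nn.CrossEntropyLoss()                  # same loss as the LSTM
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # lr initialized to 0.1

# One illustrative training step on random token ids
tokens = torch.randint(0, vocab_size, (8, 35))     # (batch, seq_len)
logits, _ = model(tokens[:, :-1])                  # predict each next token
loss = criterion(logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1))
loss.backward()
optimizer.step()
```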