Capturing Structural Locality in Non-parametric Language Models

Authors: Frank F. Xu, Junxian He, Graham Neubig, Vincent Josua Hellendoorn

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on two different domains, Java source code and Wikipedia text, demonstrate that locality features improve model efficacy over models without access to these features, with interesting differences. We also perform an analysis of how and where locality features contribute to improved performance and why the traditionally used contextual similarity metrics alone are not enough to grasp the locality structure.
Researcher Affiliation | Academia | School of Computer Science, Carnegie Mellon University. {fangzhex,junxianh,gneubig}@cs.cmu.edu, vhellendoorn@cmu.edu
Pseudocode | No | No pseudocode or algorithm blocks are explicitly labeled or presented in a structured format.
Open Source Code | Yes | The source code package, containing a README document on how to reproduce the results and analysis as well as the experiment scripts, is available in the paper's supplementary material.
Open Datasets | Yes | WIKITEXT-103 is a standard language modeling benchmark (Merity et al., 2016) consisting of natural language text from English Wikipedia. It contains a 250K token, word-level vocabulary, with 103M tokens in the training set and 250K tokens in both the validation and test sets. [...] https://blog.einstein.ai/the-wikitext-long-term-dependency-language-modeling-dataset/. JAVA GITHUB is a programming language corpus containing Java source code from Github (Allamanis & Sutton, 2013) that is widely used in source code modeling (Hellendoorn & Devanbu, 2017; Karampatsis et al., 2020). [...] https://zenodo.org/record/3628665.
Dataset Splits | Yes | WIKITEXT-103 [...] with 103M tokens in the training set and 250K tokens in both the validation and test sets. [...] JAVA GITHUB [...] It contains 1.44B tokens from 13,362 projects in the training split, 3.83M tokens from 36 projects in the validation split and 5.33M tokens from 38 projects in the test split. The splits are separated by whole projects.
Hardware Specification | Yes | All experiments are conducted on a single machine with a 48-core CPU and 8 NVIDIA V100 32GB GPUs.
Software Dependencies | No | The paper mentions using a pre-trained model ("For WIKITEXT-103 we use the pretrained model provided by (Khandelwal et al., 2020)"), but does not specify software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | Yes | "In our experiments, we follow Khandelwal et al. (2020) in setting the interpolation factor λ to 0.25. [...] To optimize the parameters, we use the Adam (Kingma & Ba, 2014) optimizer with a learning rate of 0.0001 on the validation set for 200 epochs." and "train an LM with the exact architecture and optimization described by Baevski & Auli (2018): a decoder-only Transformer (Vaswani et al., 2017), with 1024-dimensional hidden states for the WIKITEXT-103 dataset and 512 for JAVA GITHUB."
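
The interpolation factor λ = 0.25 quoted in the experiment setup refers to the kNN-LM-style mixture of Khandelwal et al. (2020), where the parametric LM distribution and the nearest-neighbor distribution are linearly combined. The following Python snippet is a minimal sketch of that combination only, not the authors' released code; the function and variable names (interpolate_knn_lm, p_lm, p_knn) and the toy numbers are illustrative assumptions.

```python
import numpy as np

def interpolate_knn_lm(p_lm: np.ndarray, p_knn: np.ndarray, lam: float = 0.25) -> np.ndarray:
    """Combine base-LM and kNN next-token distributions, kNN-LM style.

    p_lm:  next-token probabilities from the parametric (Transformer) LM.
    p_knn: next-token probabilities derived from the nearest-neighbor datastore.
    lam:   interpolation factor; the paper follows Khandelwal et al. (2020)
           in setting it to 0.25.
    """
    # p(w | context) = lam * p_knn(w | context) + (1 - lam) * p_lm(w | context)
    return lam * p_knn + (1.0 - lam) * p_lm

# Toy usage over a 4-word vocabulary (illustrative numbers, not from the paper).
p_lm = np.array([0.5, 0.2, 0.2, 0.1])
p_knn = np.array([0.1, 0.6, 0.2, 0.1])
print(interpolate_knn_lm(p_lm, p_knn))  # -> [0.4 0.3 0.2 0.1]
```

Since both inputs are valid probability distributions and the weights sum to one, the interpolated output is again a valid distribution, which is why a single scalar λ suffices as the tuning knob in this setup.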