Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval
Authors: Uri Alon, Frank Xu, Junxian He, Sudipta Sengupta, Dan Roth, Graham Neubig
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we present RETOMATON (retrieval automaton), which approximates the datastore search, based on (1) saving pointers between consecutive datastore entries, and (2) clustering of entries into states. This effectively results in a weighted finite automaton built on top of the datastore, instead of representing the datastore as a flat list. The creation of the automaton is unsupervised, and a RETOMATON can be constructed from any text collection: either the original training corpus or from another domain. Traversing this automaton at inference time, in parallel to the LM inference, reduces its perplexity by up to 1.85, or alternatively saves up to 83% of the nearest neighbor searches over kNN-LM (Khandelwal et al., 2020) without hurting perplexity. Our code and trained models are available at https://github.com/neulab/retomaton. (A hedged construction sketch appears below the table.) |
| Researcher Affiliation | Collaboration | Uri Alon¹, Frank F. Xu¹, Junxian He¹, Sudipta Sengupta², Dan Roth³, Graham Neubig¹. ¹Language Technologies Institute, Carnegie Mellon University; ²Amazon AWS; ³AWS AI Labs. {ualon,fangzhex,junxianh,gneubig}@cs.cmu.edu, {sudipta,drot}@amazon.com |
| Pseudocode | No | The paper describes the algorithms and processes in text and with diagrams, but does not include formal pseudocode blocks or algorithm listings. |
| Open Source Code | Yes | Our code and trained models are available at https://github.com/neulab/retomaton. |
| Open Datasets | Yes | Following Khandelwal et al. (2020), we use WIKITEXT-103 (Merity et al., 2017), which is a standard benchmark for autoregressive language modeling, having 103M/250K/250K tokens from Wikipedia in its training/validation/test sets, respectively. |
| Dataset Splits | Yes | Following Khandelwal et al. (2020), we use WIKITEXT-103 (Merity et al., 2017), which is a standard benchmark for autoregressive language modeling, having 103M/250K/250K tokens from Wikipedia in its training/validation/test sets, respectively. |
| Hardware Specification | Yes | We ran all experiments on 32 CPU cores, and RTX 3090 or V100 GPUs. |
| Software Dependencies | No | We base our experiments on the original kNN-LM implementation that uses the FAISS (Johnson et al., 2019) library to perform kNN search. We also use FAISS for the one-time k-means clustering. The paper names FAISS but does not list specific library or framework versions. (A toy FAISS usage sketch appears below the table.) |
| Experiment Setup | Yes | Hyperparameters: We used the same settings as the baseline implementations without any special tuning of our model, and always matched the settings to conduct a fair evaluation. We saved half-precision (fp16) datastore keys, as in He et al. (2021). For WIKITEXT-103, which creates a datastore of 103M entries, we use k-means clustering with k_clust = 1M. For Law-MT, which creates a datastore of 19M entries, we use k_clust = 200K, which maintains an average cluster size of 100 in both datasets. |
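
The Research Type row quotes the two ingredients of RETOMATON: pointers between consecutive datastore entries and clustering of entries into states, which together form a weighted finite automaton over the datastore. The sketch below illustrates that construction and one pointer-following step under simplifying assumptions; the function names, the flat NumPy/list representation, and the "follow pointers, else re-search" heuristic are illustrative only, not the authors' implementation (see the linked repository for the real code).

```python
# Illustrative sketch of RETOMATON-style states and pointers over a kNN-LM
# datastore. Assumptions (not from the paper): function names, flat NumPy
# arrays, and the simple fallback heuristic in follow_or_search.
import numpy as np
import faiss  # the paper uses FAISS for the one-time k-means clustering


def build_states_and_pointers(keys: np.ndarray, k_clust: int):
    """Cluster datastore keys into automaton states (one-time k-means) and
    record, for each entry i, a pointer to entry i + 1, i.e. the entry whose
    context extends entry i's context by one corpus token."""
    n, d = keys.shape
    kmeans = faiss.Kmeans(d, k_clust, niter=10)
    kmeans.train(keys.astype(np.float32))
    _, assignments = kmeans.index.search(keys.astype(np.float32), 1)
    state_of_entry = assignments.ravel()      # (n,) cluster id per entry;
    # in the paper, clustering lets many entries act as one automaton state.

    next_pointer = np.arange(1, n + 1)        # entry i -> entry i + 1
    next_pointer[-1] = -1                     # last entry has no successor
    return state_of_entry, next_pointer


def follow_or_search(active_entries, generated_token, vals, next_pointer):
    """One traversal step run in parallel to LM decoding: keep the active
    entries whose stored target token matches the token the LM just emitted,
    advance their pointers, and fall back to a full kNN search otherwise."""
    surviving = [i for i in active_entries if vals[i] == generated_token]
    advanced = [next_pointer[i] for i in surviving if next_pointer[i] != -1]
    needs_knn_search = len(advanced) == 0     # empty => perform a fresh search
    return advanced, needs_knn_search
```

Every step that keeps at least one pointer alive avoids one nearest-neighbor query, which is how the paper can save up to 83% of kNN searches without hurting perplexity.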
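
The Software Dependencies and Experiment Setup rows mention FAISS for kNN search, fp16 datastore keys, and cluster counts of k_clust = 1M for WikiText-103 (~103M entries) and k_clust = 200K for Law-MT (~19M entries). The toy sketch below shows the datastore/search side at a scaled-down size; the flat L2 index, key dimensionality, neighbor count, and random data are assumptions for illustration only.

```python
# Toy, self-contained sketch of a FAISS-backed kNN-LM datastore.
# Assumptions: flat L2 index, key dimension, k, and the tiny sizes below
# (the real WikiText-103 datastore holds ~103M fp16 keys, clustered with
# k_clust = 1M; Law-MT holds ~19M keys with k_clust = 200K).
import numpy as np
import faiss

N, D, K = 10_000, 64, 8                           # toy: entries, key dim, neighbors

# Keys are stored in half precision (fp16) to save space, as in He et al.
# (2021); FAISS expects float32 when adding and searching.
keys_fp16 = np.random.randn(N, D).astype(np.float16)
vals = np.random.randint(0, 50_000, size=N)       # value = next-token id per key

index = faiss.IndexFlatL2(D)                      # assumed index type
index.add(keys_fp16.astype(np.float32))

query = np.random.randn(1, D).astype(np.float32)  # current LM context vector
dists, ids = index.search(query, K)               # ids index into the datastore
neighbor_tokens = vals[ids[0]]                    # kNN-LM interpolates over these
```

In RETOMATON, the pointer-following step in the previous sketch replaces many of these index.search calls.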