Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval
Authors: Uri Alon, Frank Xu, Junxian He, Sudipta Sengupta, Dan Roth, Graham Neubig
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we present RETOMATON (retrieval automaton), which approximates the datastore search, based on (1) saving pointers between consecutive datastore entries, and (2) clustering of entries into states. This effectively results in a weighted finite automaton built on top of the datastore, instead of representing the datastore as a flat list. The creation of the automaton is unsupervised, and a RETOMATON can be constructed from any text collection: either the original training corpus or from another domain. Traversing this automaton at inference time, in parallel to the LM inference, reduces its perplexity by up to 1.85, or alternatively saves up to 83% of the nearest neighbor searches over kNN-LM (Khandelwal et al., 2020) without hurting perplexity. Our code and trained models are available at https://github.com/neulab/retomaton. (A hedged construction sketch appears below the table.) |
| Researcher Affiliation | Collaboration | Uri Alon¹, Frank F. Xu¹, Junxian He¹, Sudipta Sengupta², Dan Roth³, Graham Neubig¹. ¹Language Technologies Institute, Carnegie Mellon University; ²Amazon AWS; ³AWS AI Labs. {ualon,fangzhex,junxianh,gneubig}@cs.cmu.edu, {sudipta,drot}@amazon.com |
| Pseudocode | No | The paper describes the algorithms and processes in text and with diagrams, but does not include formal pseudocode blocks or algorithm listings. |
| Open Source Code | Yes | Our code and trained models are available at https://github.com/neulab/retomaton. |
| Open Datasets | Yes | Following Khandelwal et al. (2020), we use WIKITEXT-103 (Merity et al., 2017), which is a standard benchmark for autoregressive language modeling, having 103M/250K/250K tokens from Wikipedia in its training/validation/test sets, respectively. |
| Dataset Splits | Yes | Following Khandelwal et al. (2020), we use WIKITEXT-103 (Merity et al., 2017), which is a standard benchmark for autoregressive language modeling, having 103M/250K/250K tokens from Wikipedia in its training/validation/test sets, respectively. |
| Hardware Specification | Yes | We ran all experiments on 32 CPU cores, and RTX 3090 or V100 GPUs. |
| Software Dependencies | No | We base our experiments on the original kNN-LM implementation that uses the FAISS (Johnson et al., 2019) library to perform kNN search. We also use FAISS for the one-time k-means clustering. The paper names FAISS but does not list specific library or framework versions. (A toy FAISS usage sketch appears below the table.) |
| Experiment Setup | Yes | Hyperparameters: We used the same settings as the baseline implementations without any special tuning of our model, and always matched the settings to conduct a fair evaluation. We saved half-precision (fp16) datastore keys, as in He et al. (2021). For WIKITEXT-103, which creates a datastore of 103M entries, we use k-means clustering with k_clust = 1M. For Law-MT, which creates a datastore of 19M entries, we use k_clust = 200K, which maintains an average cluster size of 100 in both datasets. |
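
The Research Type row quotes the two ingredients of RETOMATON: pointers between consecutive datastore entries and clustering of entries into states, which together form a weighted finite automaton over the datastore. The sketch below illustrates that construction and one pointer-following step under simplifying assumptions; the function names, the flat NumPy/list representation, and the "follow pointers, else re-search" heuristic are illustrative only, not the authors' implementation (see the linked repository for the real code).

```python
# Illustrative sketch of RETOMATON-style states and pointers over a kNN-LM
# datastore. Assumptions (not from the paper): function names, flat NumPy
# arrays, and the simple fallback heuristic in follow_or_search.
import numpy as np
import faiss  # the paper uses FAISS for the one-time k-means clustering


def build_states_and_pointers(keys: np.ndarray, k_clust: int):
    """Cluster datastore keys into automaton states (one-time k-means) and
    record, for each entry i, a pointer to entry i + 1, i.e. the entry whose
    context extends entry i's context by one corpus token."""
    n, d = keys.shape
    kmeans = faiss.Kmeans(d, k_clust, niter=10)
    kmeans.train(keys.astype(np.float32))
    _, assignments = kmeans.index.search(keys.astype(np.float32), 1)
    state_of_entry = assignments.ravel()      # (n,) cluster id per entry;
    # in the paper, clustering lets many entries act as one automaton state.

    next_pointer = np.arange(1, n + 1)        # entry i -> entry i + 1
    next_pointer[-1] = -1                     # last entry has no successor
    return state_of_entry, next_pointer


def follow_or_search(active_entries, generated_token, vals, next_pointer):
    """One traversal step run in parallel to LM decoding: keep the active
    entries whose stored target token matches the token the LM just emitted,
    advance their pointers, and fall back to a full kNN search otherwise."""
    surviving = [i for i in active_entries if vals[i] == generated_token]
    advanced = [next_pointer[i] for i in surviving if next_pointer[i] != -1]
    needs_knn_search = len(advanced) == 0     # empty => perform a fresh search
    return advanced, needs_knn_search
```

Every step that keeps at least one pointer alive avoids one nearest-neighbor query, which is how the paper can save up to 83% of kNN searches without hurting perplexity.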
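
The Software Dependencies and Experiment Setup rows mention FAISS for kNN search, fp16 datastore keys, and cluster counts of k_clust = 1M for WikiText-103 (~103M entries) and k_clust = 200K for Law-MT (~19M entries). The toy sketch below shows the datastore/search side at a scaled-down size; the flat L2 index, key dimensionality, neighbor count, and random data are assumptions for illustration only.

```python
# Toy, self-contained sketch of a FAISS-backed kNN-LM datastore.
# Assumptions: flat L2 index, key dimension, k, and the tiny sizes below
# (the real WikiText-103 datastore holds ~103M fp16 keys, clustered with
# k_clust = 1M; Law-MT holds ~19M keys with k_clust = 200K).
import numpy as np
import faiss

N, D, K = 10_000, 64, 8                           # toy: entries, key dim, neighbors

# Keys are stored in half precision (fp16) to save space, as in He et al.
# (2021); FAISS expects float32 when adding and searching.
keys_fp16 = np.random.randn(N, D).astype(np.float16)
vals = np.random.randint(0, 50_000, size=N)       # value = next-token id per key

index = faiss.IndexFlatL2(D)                      # assumed index type
index.add(keys_fp16.astype(np.float32))

query = np.random.randn(1, D).astype(np.float32)  # current LM context vector
dists, ids = index.search(query, K)               # ids index into the datastore
neighbor_tokens = vals[ids[0]]                    # kNN-LM interpolates over these
```

In RETOMATON, the pointer-following step in the previous sketch replaces many of these index.search calls.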