Just Add Functions: A Neural-Symbolic Language Model

Authors: David Demeter, Doug Downey

AAAI 2020, pp. 7634-7642

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We explore the effectiveness of this approach on numbers and geographic locations, and show that NSLMs significantly reduce perplexity in small-corpus language modeling, and that the performance improvement persists for rare tokens even on much larger corpora. ... Our primary experimental results are shown in Tables 2 and 3.
Researcher Affiliation | Collaboration | David Demeter, Northwestern University, Evanston, IL, USA, ddemeter@u.northwestern.edu; Doug Downey, Allen Institute for Artificial Intelligence, Seattle, WA, USA, dougd@allenai.org
Pseudocode | No | Table 1 is titled 'NSLM Construction Algorithm' and lists general steps, but it is not presented in a structured pseudocode format with programming constructs like variables, loops, or conditional logic.
Open Source Code | No | The paper does not provide any explicit statement about releasing its source code, nor does it include links to a code repository or mention code availability in supplementary materials.
Open Datasets | Yes | To evaluate numbers on the Wikitext corpora... To evaluate geographic locations on the Wikitext corpora, multi-word named entities appearing in the GeoNames data set (GeoNames 2019) are chunked together to form single tokens. (See the chunking sketch below the table.)
Dataset Splits | Yes | Step size, ensembling factor λ_cache and temperature θ_cache were set to 500, 0.25 and 0.75, respectively, after tuning on the validation set. (See the cache ensembling sketch below the table.)
Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as GPU models, CPU types, or cloud computing instance specifications, beyond general mentions of training models.
Software Dependencies | No | The paper does not specify version numbers for any software dependencies, libraries, or frameworks used in the implementation or experimentation.
Experiment Setup | Yes | Thus, we adopt a standard language model architecture as our primary baseline, an RNN with LSTM cells and hyper-parameters corresponding to medium 650 dimensional models (Zaremba, Sutskever, and Vinyals 2014). ... During training, the softmax is computed using the full vocabulary, except for the Wikitext-103 model which uses a sampled-softmax (Jean et al. 2015) with a sampling rate of 2,500. ... Step size, ensembling factor λ_cache and temperature θ_cache were set to 500, 0.25 and 0.75, respectively, after tuning on the validation set. (See the baseline model sketch below the table.)
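
The Open Datasets entry notes that multi-word geographic named entities from GeoNames are chunked into single tokens. The paper excerpt does not describe the matching procedure, so the greedy longest-match below is only an illustrative sketch; the function name, the joiner convention, and the sample gazetteer entries are assumptions, not the authors' code.

```python
# Illustrative sketch only: merge multi-word place names into single tokens.
# The greedy longest-match strategy here is an assumption.

def chunk_geo_entities(tokens, entity_phrases, max_len=5, joiner="_"):
    """Greedily merge known multi-word entity phrases into single tokens."""
    phrase_set = {tuple(p.lower().split()) for p in entity_phrases}
    out, i = [], 0
    while i < len(tokens):
        match = None
        # Try the longest candidate span first.
        for span in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = tuple(t.lower() for t in tokens[i:i + span])
            if candidate in phrase_set:
                match = span
                break
        if match:
            out.append(joiner.join(tokens[i:i + match]))
            i += match
        else:
            out.append(tokens[i])
            i += 1
    return out

# Hypothetical gazetteer entries for illustration:
gazetteer = ["New York City", "Evanston", "Lake Michigan"]
print(chunk_geo_entities("He moved to New York City in 1990".split(), gazetteer))
# ['He', 'moved', 'to', 'New_York_City', 'in', '1990']
```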
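
The Dataset Splits and Experiment Setup entries quote cache hyper-parameters: step size 500, ensembling factor λ_cache = 0.25, and temperature θ_cache = 0.75. The paper excerpt gives no code, so the sketch below follows the standard continuous-cache formulation (Grave et al. 2017) under the assumption that the step size is the cache window; the function and variable names are placeholders.

```python
# A minimal sketch of neural-cache ensembling with the reported settings.
import numpy as np

def cache_ensemble_probs(p_model, hidden, cache_states, cache_words,
                         vocab_size, lam=0.25, theta=0.75):
    """Mix the model's softmax with a cache distribution.

    p_model      : (V,) softmax over the vocabulary from the base LM
    hidden       : (d,) current hidden state
    cache_states : (T, d) hidden states for the last T (<= 500) positions
    cache_words  : (T,) ids of the tokens observed at those positions
    """
    if len(cache_words) == 0:
        return p_model
    # Similarity of the current state to each cached state, scaled by theta.
    scores = np.exp(theta * (cache_states @ hidden))
    p_cache = np.zeros(vocab_size)
    np.add.at(p_cache, cache_words, scores)   # accumulate mass per word id
    p_cache /= p_cache.sum()
    # Linear interpolation between the base model and the cache.
    return (1.0 - lam) * p_model + lam * p_cache
```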
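
The Experiment Setup entry states that the primary baseline is an LSTM language model with 650-dimensional ("medium") hyper-parameters following Zaremba, Sutskever, and Vinyals (2014). A minimal PyTorch sketch of such a baseline is shown below; the layer count and dropout rate follow that medium configuration rather than the excerpt, and all class and parameter names are assumptions. The Wikitext-103 sampled softmax (2,500 samples) mentioned in the excerpt is not reproduced here.

```python
# A minimal sketch of a 650-dimensional LSTM language model baseline.
import torch
import torch.nn as nn

class LSTMLanguageModel(nn.Module):
    def __init__(self, vocab_size, dim=650, num_layers=2, dropout=0.5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.lstm = nn.LSTM(dim, dim, num_layers=num_layers,
                            dropout=dropout, batch_first=True)
        self.drop = nn.Dropout(dropout)
        self.proj = nn.Linear(dim, vocab_size)   # full-vocabulary softmax

    def forward(self, tokens, state=None):
        x = self.drop(self.embed(tokens))         # (B, T, 650)
        out, state = self.lstm(x, state)          # (B, T, 650)
        logits = self.proj(self.drop(out))        # (B, T, V)
        return logits, state

# Training would minimize cross-entropy over the full vocabulary, matching
# the setup quoted above for the smaller Wikitext corpora.
```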