The Goldilocks Principle: Reading Children's Books with Explicit Memory Representations

Authors: Felix Hill, Antoine Bordes, Sumit Chopra, Jason Weston

ICLR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We compare a range of state-of-the-art models, each with a different way of encoding what has been previously read. We show that models which store explicit representations of long-term contexts outperform state-of-the-art neural language models at predicting semantic content words, although this advantage is not observed for syntactic function words. Interestingly, we find that the amount of text encoded in a single memory representation is highly influential to the performance: there is a sweet-spot, not too big and not too small, between single words and full sentences that allows the most meaningful information in a text to be effectively retained and recalled. Further, the attention over such window-based memories can be trained effectively through self-supervision. We then assess the generality of this principle by applying it to the CNN QA benchmark, which involves identifying named entities in paraphrased summaries of news articles, and achieve state-of-the-art performance." (A window-memory sketch appears after this table.)
Researcher Affiliation | Collaboration | Felix Hill, Antoine Bordes, Sumit Chopra & Jason Weston, Facebook AI Research, 770 Broadway, New York, USA. felix.hill@cl.cam.ac.uk, {abordes,spchopra,jase}@fb.com
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | "For the lexical memory we use the code available at https://github.com/facebook/MemNN." (This refers to code for the lexical-memory variant, not to a release of the paper's full methodology.)
Open Datasets | Yes | "The CBT is built from books that are freely available thanks to Project Gutenberg." ... "The dataset can be downloaded from http://fb.ai/babi/."
Dataset Splits | Yes | Training / Validation / Test: number of books 98 / 5 / 5; number of questions (context+query) 669,343 / 8,000 / 10,000; average words in contexts 465 / 435 / 445; average words in queries 31 / 27 / 29; distinct candidates 37,242 / 5,485 / 7,108; vocabulary size 53,628.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | "All models were implemented using the Torch library (see torch.ch)." ... "We trained an n-gram language model using the KenLM toolkit (Heafield et al., 2013)." ... "(based on output from the POS tagger and named-entity-recogniser in the Stanford CoreNLP Toolkit (Manning et al., 2014))." (No version numbers are provided for these software components.)
Experiment Setup | Yes | "Optimal hyper-parameter values on CBT: Embedding model (context+query): p = 300, λ = 0.01. ... MemNNs (window memory + self-sup.): n = all, b = 5, λ = 0.01, p = 300." (See the configuration sketch after this table.)
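To make the "window memory" idea quoted in the Research Type row concrete, here is a minimal Python sketch, not the authors' Torch implementation: each memory is a fixed-size window of tokens centred on an occurrence of a candidate answer, and the query is matched against those windows. The function names, the bag-of-words scoring, the random untrained embeddings, and the toy context/query are all assumptions for illustration only.

```python
# Minimal sketch of window-based memories (assumed reading of the paper's description;
# not the authors' code). Each candidate occurrence in the context yields one memory:
# the b-token window around it. The query is scored against each window by a simple
# bag-of-words dot product.
import numpy as np

rng = np.random.default_rng(0)
EMB_DIM = 300   # p = 300 in the reported hyper-parameters
WINDOW = 5      # b = 5 in the reported hyper-parameters

def embed(tokens, table):
    """Bag-of-words embedding: sum of word vectors, created on demand (untrained)."""
    for t in tokens:
        if t not in table:
            table[t] = rng.normal(scale=0.1, size=EMB_DIM)
    return np.sum([table[t] for t in tokens], axis=0)

def window_memories(context_tokens, candidates, b=WINDOW):
    """One memory per occurrence of a candidate word: the b-token window around it."""
    half = b // 2
    memories = []
    for i, tok in enumerate(context_tokens):
        if tok in candidates:
            window = context_tokens[max(0, i - half): i + half + 1]
            memories.append((tok, window))
    return memories

def answer(context_tokens, query_tokens, candidates, table):
    """Return the candidate whose window memory best matches the query (dot product)."""
    q = embed(query_tokens, table)
    scored = [(float(embed(win, table) @ q), cand)
              for cand, win in window_memories(context_tokens, candidates)]
    return max(scored)[1] if scored else None

# Toy usage (invented data, not from the CBT). With random, untrained embeddings the
# prediction is arbitrary; in the paper the embeddings and attention are trained, and
# self-supervision marks the windows containing the true answer as attention targets.
# That training loop is omitted here.
table = {}
context = "the cat sat on the mat while the dog slept by the door".split()
query = "the XXXXX slept by the door".split()
print(answer(context, query, {"cat", "dog", "mat", "door"}, table))
```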
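For reference, the optimal hyper-parameter values quoted in the Experiment Setup row can be transcribed into a plain configuration mapping. The dictionary keys, and the reading of p as the embedding dimension, b as the window size, and n as the number of memories kept, are assumptions; only the numeric values come from the paper's reported settings.

```python
# Reported optimal hyper-parameters on CBT, transcribed for reference.
# Key names and symbol interpretations (in comments) are assumptions.
CBT_OPTIMAL_HYPERPARAMS = {
    "embedding_model_context_query": {
        "p": 300,        # assumed: embedding dimension
        "lambda": 0.01,  # the paper's λ
    },
    "memnn_window_memory_self_sup": {
        "n": "all",      # assumed: number of memories kept
        "b": 5,          # assumed: window size, matching the window-memory description
        "lambda": 0.01,
        "p": 300,
    },
}

if __name__ == "__main__":
    for model, params in CBT_OPTIMAL_HYPERPARAMS.items():
        print(model, params)
```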