Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Dialogue Learning With Human-in-the-Loop
Authors: Jiwei Li, Alexander H. Miller, Sumit Chopra, Marc'Aurelio Ranzato, Jason Weston
ICLR 2017 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper we explore this direction in a reinforcement learning setting where the bot improves its question-answering ability from feedback a teacher gives following its generated responses. We build a simulator that tests various aspects of such learning in a synthetic environment, and introduce models that work in this regime. Finally, real experiments with Mechanical Turk validate the approach. |
| Researcher Affiliation | Industry | Jiwei Li, Alexander H. Miller, Sumit Chopra, Marc'Aurelio Ranzato, Jason Weston; Facebook AI Research, New York, USA; EMAIL |
| Pseudocode | No | The paper describes algorithms (RBI, REINFORCE, FP) but does not present them in a pseudocode block or a clearly labeled algorithm figure. |
| Open Source Code | Yes | Code and data are available at https://github.com/facebook/MemNN/tree/master/HITL. |
| Open Datasets | Yes | Following Weston (2016), we use (i) the single supporting fact problem from the bAbI datasets (Weston et al., 2015)...; and (ii) the WikiMovies dataset (Weston et al., 2015)... |
| Dataset Splits | Yes | We use the same train/valid/test splits. [...] hyperparameters are tuned on a similarly sized validation set. |
| Hardware Specification | No | No specific hardware details (like GPU models, CPU types, memory) are mentioned in the paper. |
| Software Dependencies | No | The paper mentions the 'MemN2N' model, but it does not specify versions of programming languages, libraries, or frameworks (e.g., Python 3.x, TensorFlow x.x, PyTorch x.x). |
| Experiment Setup | Yes | In order to make this work in the online setting which requires exploration to find the correct answer, we employ an ε-greedy strategy: the learner makes a prediction using its own model (the answer assigned the highest probability) with probability 1 − ε, otherwise it picks a random answer with probability ε. [...] We use batch size to refer to how many dialogue episodes the current model is used to collect feedback before updating its parameters. |
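
The ε-greedy strategy quoted in the Experiment Setup row can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `epsilon_greedy`, the `probs` list of per-answer model probabilities, and the `rng` parameter are all hypothetical names chosen for illustration.

```python
import random


def epsilon_greedy(probs, epsilon, rng=random):
    """Return the index of the answer the learner gives.

    probs   -- hypothetical list of model probabilities, one per candidate answer
    epsilon -- exploration rate in [0, 1]
    rng     -- source of randomness (assumed to offer random() and randrange())
    """
    # With probability epsilon, explore: pick a candidate uniformly at random.
    if rng.random() < epsilon:
        return rng.randrange(len(probs))
    # Otherwise exploit: pick the answer assigned the highest probability.
    return max(range(len(probs)), key=probs.__getitem__)


if __name__ == "__main__":
    # Illustrative probabilities only; these are not values from the paper.
    scores = [0.1, 0.7, 0.2]
    print(epsilon_greedy(scores, epsilon=0.0))
    print(epsilon_greedy(scores, epsilon=0.5, rng=random.Random(0)))
```

With `epsilon=0.0` the sketch always returns the model's top-scoring answer; larger values trade some of that exploitation for random exploration. The uniform random draw may also land on the top-scoring answer, which is one common reading of "picks a random answer" and is an assumption here.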