Dialogue Learning With Human-in-the-Loop
Authors: Jiwei Li, Alexander H. Miller, Sumit Chopra, Marc'Aurelio Ranzato, Jason Weston
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper we explore this direction in a reinforcement learning setting where the bot improves its question-answering ability from feedback a teacher gives following its generated responses. We build a simulator that tests various aspects of such learning in a synthetic environment, and introduce models that work in this regime. Finally, real experiments with Mechanical Turk validate the approach. |
| Researcher Affiliation | Industry | Jiwei Li, Alexander H. Miller, Sumit Chopra, Marc'Aurelio Ranzato, Jason Weston Facebook AI Research, New York, USA {jiwel,ahm,spchopra,ranzato,jase}@fb.com |
| Pseudocode | No | The paper describes the RBI, REINFORCE, and FP algorithms in prose but does not present them in a pseudocode block or a clearly labeled algorithm figure. (A hedged sketch of RBI and REINFORCE follows the table.) |
| Open Source Code | Yes | Code and data are available at https://github.com/facebook/MemNN/tree/master/HITL. |
| Open Datasets | Yes | Following Weston (2016), we use (i) the single supporting fact problem from the bAbI datasets (Weston et al., 2015)...; and (ii) the WikiMovies dataset (Weston et al., 2015)... |
| Dataset Splits | Yes | We use the same train/valid/test splits. [...] hyperparameters are tuned on a similarly sized validation set. |
| Hardware Specification | No | No specific hardware details (like GPU models, CPU types, memory) are mentioned in the paper. |
| Software Dependencies | No | The paper mentions the MemN2N model, but it does not specify versions of programming languages, libraries, or frameworks (e.g., Python 3.x, TensorFlow x.x, PyTorch x.x). |
| Experiment Setup | Yes | In order to make this work in the online setting which requires exploration to find the correct answer, we employ an ϵ-greedy strategy: the learner makes a prediction using its own model (the answer assigned the highest probability) with probability 1 − ϵ, otherwise it picks a random answer with probability ϵ. [...] We use batch size to refer to how many dialogue episodes the current model is used to collect feedback before updating its parameters. (See the sketch immediately after this table.) |
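
To make the quoted setup concrete, here is a minimal Python sketch of the ϵ-greedy exploration and batched-update loop described in the Experiment Setup row. The `model.predict`, `model.update`, and `teacher.feedback` calls are hypothetical placeholders standing in for the paper's MemN2N interface, which the released code defines differently.

```python
import random

def epsilon_greedy_answer(model, dialogue, candidates, epsilon):
    """With probability 1 - epsilon exploit the model's top-scoring answer;
    with probability epsilon explore a uniformly random candidate."""
    if random.random() < epsilon:
        return random.choice(candidates)             # explore
    scores = model.predict(dialogue, candidates)     # hypothetical scoring call
    return max(candidates, key=lambda a: scores[a])  # exploit

def online_loop(model, teacher, episodes, batch_size=32, epsilon=0.25):
    """Collect `batch_size` dialogue episodes with the current model before
    each parameter update, matching the paper's notion of batch size."""
    buffer = []
    for dialogue, candidates in episodes:
        answer = epsilon_greedy_answer(model, dialogue, candidates, epsilon)
        reward = teacher.feedback(dialogue, answer)  # e.g. +1 if correct
        buffer.append((dialogue, answer, reward))
        if len(buffer) == batch_size:
            model.update(buffer)                     # hypothetical update step
            buffer.clear()
```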
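Since the paper names RBI (reward-based imitation), REINFORCE, and FP (forward prediction) without pseudocode, the sketch below illustrates the first two on a toy linear softmax policy over a fixed answer set. This is an illustration under stated assumptions, not the paper's MemN2N implementation: the linear scorer, `reward_fn`, and the hyperparameter values are placeholders, and FP is omitted because it depends on the textual-feedback channel. The `baseline` argument stands in for the variance-reduction baseline that REINFORCE implementations typically subtract.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_FEATURES, NUM_ANSWERS = 8, 4
W = np.zeros((NUM_FEATURES, NUM_ANSWERS))  # toy linear policy in place of MemN2N

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def reinforce_step(x, reward_fn, lr=0.1, baseline=0.0):
    """REINFORCE: sample an answer from the policy, observe the teacher's
    reward, and ascend (r - b) * grad log p(a | x)."""
    global W
    p = softmax(x @ W)                   # distribution over candidate answers
    a = rng.choice(NUM_ANSWERS, p=p)     # sampled answer
    r = reward_fn(a)                     # teacher feedback, e.g. 0/1
    grad_log_p = np.outer(x, np.eye(NUM_ANSWERS)[a] - p)
    W += lr * (r - baseline) * grad_log_p

def rbi_step(x, reward_fn, lr=0.1, epsilon=0.25):
    """Reward-based imitation: act epsilon-greedily, then take a supervised
    (cross-entropy) step only on answers the teacher rewarded."""
    global W
    p = softmax(x @ W)
    a = int(rng.integers(NUM_ANSWERS)) if rng.random() < epsilon else int(p.argmax())
    if reward_fn(a) > 0:                 # imitate only positively rewarded turns
        grad_log_p = np.outer(x, np.eye(NUM_ANSWERS)[a] - p)
        W += lr * grad_log_p             # standard supervised softmax gradient
```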