Residual Loss Prediction: Reinforcement Learning With No Incremental Feedback

Authors: Hal Daumé III, John Langford, Amr Sharaf

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimentally, we show the efficacy of RESLOPE on four benchmark reinforcement learning problems and three bandit structured prediction problems (§5.1), comparing to several reinforcement learning algorithms: Reinforce, Proximal Policy Optimization and Advantage Actor-Critic.
Researcher Affiliation | Collaboration | Hal Daumé III (University of Maryland & Microsoft Research NYC, me@hal3.name); John Langford (Microsoft Research NYC, jcl@microsoft.com); Amr Sharaf (University of Maryland, amr@cs.umd.edu)
Pseudocode | Yes | Algorithm 1: RESIDUAL LOSS PREDICTION (RESLOPE) with single deviations (a hedged sketch of this single-deviation loop follows the table).
Open Source Code | Yes | The code is available at https://github.com/hal3/macarico and https://github.com/hal3/reslope.
Open Datasets | Yes | We perform experiments on the three tasks described in detail in Appendix G: English Part of Speech Tagging, English Dependency Parsing and Chinese Part of Speech Tagging. ... English POS Tagging: we conduct POS tagging experiments over the 45 Penn Treebank (Marcus et al., 1993) tags.
Dataset Splits | Yes | We measure performance in terms of average cumulative loss on the online examples as well as on a held-out evaluation dataset.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU or GPU models) used for the experiments.
Software Dependencies | No | We implement our models on top of the DyNet neural network optimization package (Neubig et al., 2017). ... We optimize all parameters of the model using the Adam optimizer (Kingma & Ba, 2014)... The paper mentions software packages like DyNet and Adam but does not provide specific version numbers for them.
Experiment Setup | Yes | We optimize all parameters of the model using the Adam optimizer (Kingma & Ba, 2014), with a tuned learning rate, a moving-average rate for the mean of β1 = 0.9 and for the variance of β2 = 0.999; epsilon (for numerical stability) is fixed at 1e-8 (these are the DyNet defaults). The learning rate is tuned in the range {0.05, 0.01, 0.005, 0.001, 0.0005, 0.0001}. For the structured prediction experiments, the following input-feature hyperparameters are tuned: word embedding dimension {50, 100, 200, 300}, BiLSTM dimension {50, 150, 300}, number of BiLSTM layers {1, 2}, policy RNN dimension {50, 150, 300}, number of policy layers {1, 2}, roll-out probability β {0.0, 0.5, 1.0}. (This tuning grid is sketched in code after the table.)
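
For reference alongside the Pseudocode row: below is a minimal Python sketch of the single-deviation loop that Algorithm 1 (RESLOPE) names. It is an illustration under stated assumptions, not the authors' implementation from the macarico/reslope repositories; every argument name (env_reset, explore, cb_update, ...) is a hypothetical stand-in supplied by the caller.

import random


def reslope_episode(env_reset, env_step, env_loss, horizon,
                    greedy, explore, predict, cb_update):
    # One episode of single-deviation residual loss prediction (a sketch).
    # Hypothetical interfaces assumed here:
    #   env_reset()        -> initial state
    #   env_step(s, a)     -> next state
    #   env_loss()         -> total episodic loss (the only feedback observed)
    #   greedy(s)          -> action chosen by the current learned policy
    #   explore(s)         -> (action, prob) from a contextual-bandit explorer
    #   predict(s, a)      -> current predicted per-step loss for (s, a)
    #   cb_update(s, a, cost, prob) -> bandit update of the loss predictor
    t_dev = random.randrange(horizon)   # deviate at one uniformly random step
    state = env_reset()
    off_deviation = 0.0                 # sum of predicted losses at other steps
    deviation = None                    # (state, action, prob) at the deviation

    for t in range(horizon):
        if t == t_dev:
            action, prob = explore(state)
            deviation = (state, action, prob)
        else:
            action = greedy(state)
            off_deviation += predict(state, action)
        state = env_step(state, action)

    # The residual is the observed episodic loss minus the predicted losses
    # at the non-deviation steps; that scalar is the bandit cost charged to
    # the deviating action.
    residual = env_loss() - off_deviation
    s, a, p = deviation
    cb_update(s, a, residual, p)

The sketch deliberately leaves the exploration strategy and the underlying regressor abstract; in the paper these roles are filled by a contextual bandit learner.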
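
And for the Experiment Setup row, a small sketch of the reported tuning grid. The values are copied from the quoted text; the dictionary keys and the helper function are illustrative names, not identifiers from the released code.

import itertools

# Adam settings quoted above (DyNet defaults apart from the tuned rate).
ADAM_BETA_1, ADAM_BETA_2, ADAM_EPS = 0.9, 0.999, 1e-8

# Search space reported for the structured prediction experiments.
GRID = {
    "learning_rate":  [0.05, 0.01, 0.005, 0.001, 0.0005, 0.0001],
    "word_embed_dim": [50, 100, 200, 300],
    "bilstm_dim":     [50, 150, 300],
    "bilstm_layers":  [1, 2],
    "policy_rnn_dim": [50, 150, 300],
    "policy_layers":  [1, 2],
    "rollout_beta":   [0.0, 0.5, 1.0],
}

def configurations(grid):
    # Enumerate every combination in the grid (2,592 settings for GRID above).
    keys = sorted(grid)
    for values in itertools.product(*(grid[key] for key in keys)):
        yield dict(zip(keys, values))

Enumerating the full grid makes the scale of the reported sweep concrete; the paper itself does not say whether tuning was exhaustive or sampled.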