reproducibilityindex.ai

Multi-step Greedy Reinforcement Learning Algorithms

Authors: Manan Tomar, Yonathan Efroni, Mohammad Ghavamzadeh

ICML 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	When evaluated on a range of Atari and MuJoCo benchmark tasks, our results indicate that for the right range of κ, our algorithms outperform DQN and TRPO.
Researcher Affiliation	Collaboration	Facebook AI Research, Menlo Park, USA; Technion, Haifa, Israel; Google Research, Mountain View, USA.
Pseudocode	Yes	Algorithm 1: κ-Policy Iteration; Algorithm 2: κ-Value Iteration; Algorithm 3: κ-PI-DQN; Algorithm 4: κ-PI-TRPO
Open Source Code	No	The paper cites external codebases such as OpenAI Baselines but does not link to its own source code.
Open Datasets	Yes	We choose to test our κ-DQN and κ-TRPO algorithms on the Atari and MuJoCo benchmarks, respectively.
Dataset Splits	No	The paper reports the number of training iterations and samples but does not specify train/validation/test splits (percentages or counts) in the conventional sense.
Hardware Specification	No	The paper mentions using 'standard setups' but does not provide specific hardware details (e.g., exact GPU/CPU models or memory amounts) used for running its experiments.
Software Dependencies	No	The paper mentions optimization algorithms like 'Adam optimizer' and components like 'target Q value networks' but does not list specific software libraries or solvers with version numbers (e.g., 'PyTorch 1.9', 'Python 3.8').
Experiment Setup	Yes	Both of these algorithms use standard setups, including the use of the Adam optimizer for performing gradient descent, a discount factor of 0.99 across all tasks, target Q value networks in the case of κ-DQN and an entropy regularizer with a coefficient of 0.01 in the case of κ-TRPO. ... we set the total number of iterations to 2000, with each iteration consisting [of] 1000 samples. ... C_FA is set to 0.05 for all our experiments with other Atari domains. ... we set C_FA = 0.2 in our experiments with other MuJoCo domains.
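The Pseudocode row lists κ-Policy Iteration and κ-Value Iteration. According to the paper's abstract, each iteration solves a surrogate decision problem with a shaped reward and a smaller discount factor. The following is a minimal tabular sketch. It assumes the κ-greedy formulation from the multi-step greedy literature: for a value estimate v, the surrogate reward is r(s, a) + (1 − κ)γ E[v(s')] and the surrogate discount is κγ. It also assumes a known model (P, R) and solves every subproblem exactly. It therefore illustrates the dynamic-programming schemes, not the paper's model-free κ-PI-DQN or κ-PI-TRPO. All function names (`solve_surrogate`, `evaluate`, `kappa_pi`, `kappa_vi`) are hypothetical.

```python
import numpy as np


def solve_surrogate(P, R, gamma, kappa, v, tol=1e-10, max_iter=100_000):
    """Solve the kappa-surrogate MDP by value iteration (hypothetical helper).

    P: transitions, shape (A, S, S); R: rewards, shape (S, A); v: shape (S,).
    Surrogate reward:   R[s, a] + (1 - kappa) * gamma * E[v(s') | s, a]
    Surrogate discount: kappa * gamma
    Returns the greedy policy and the optimal surrogate value.
    """
    # Shaped reward: one-step reward plus the (1 - kappa)-weighted bootstrap on v.
    shaped = R + (1.0 - kappa) * gamma * np.einsum("ast,t->sa", P, v)
    w = np.zeros(R.shape[0])
    for _ in range(max_iter):
        q = shaped + kappa * gamma * np.einsum("ast,t->sa", P, w)
        w_new = q.max(axis=1)
        done = np.max(np.abs(w_new - w)) < tol
        w = w_new
        if done:
            break
    q = shaped + kappa * gamma * np.einsum("ast,t->sa", P, w)
    return q.argmax(axis=1), q.max(axis=1)


def evaluate(P, R, gamma, pi):
    """Exact policy evaluation: solve (I - gamma * P_pi) v = r_pi."""
    S = R.shape[0]
    idx = np.arange(S)
    P_pi = P[pi, idx, :]  # row s is P[pi[s], s, :]
    r_pi = R[idx, pi]
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)


def kappa_pi(P, R, gamma, kappa, n_iter=100):
    """kappa-PI: evaluate pi_k, then act kappa-greedily w.r.t. v^{pi_k}."""
    pi = np.zeros(R.shape[0], dtype=int)
    for _ in range(n_iter):
        new_pi, _ = solve_surrogate(P, R, gamma, kappa, evaluate(P, R, gamma, pi))
        if np.array_equal(new_pi, pi):
            break
        pi = new_pi
    return pi, evaluate(P, R, gamma, pi)


def kappa_vi(P, R, gamma, kappa, n_iter=5_000, tol=1e-9):
    """kappa-VI: v_{k+1} = optimal value of the surrogate MDP built from v_k."""
    v = np.zeros(R.shape[0])
    for _ in range(n_iter):
        _, v_new = solve_surrogate(P, R, gamma, kappa, v)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new
    return v


# Small random MDP for illustration.
rng = np.random.default_rng(0)
S, A, gamma = 5, 3, 0.9
P = rng.random((A, S, S))
P /= P.sum(axis=2, keepdims=True)  # normalize rows into distributions
R = rng.random((S, A))

v_star = kappa_vi(P, R, gamma, kappa=0.0)  # kappa = 0 is plain value iteration
for kappa in (0.0, 0.5, 1.0):
    _, v_pi = kappa_pi(P, R, gamma, kappa)
    v_vi = kappa_vi(P, R, gamma, kappa)
    # Both schemes should recover v* (up to numerical tolerance) for every kappa.
    print(f"kappa={kappa}: PI gap={np.abs(v_pi - v_star).max():.1e}, "
          f"VI gap={np.abs(v_vi - v_star).max():.1e}")
```

With κ = 0 the surrogate problem reduces to the one-step greedy step of ordinary policy or value iteration. With κ = 1 it is the original MDP. Intermediate values trade the cost of each surrogate solve against the number of outer iterations.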
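The Experiment Setup row quotes several numbers. A hypothetical Python container can collect them in one place; `ReportedSetup` and its field names are illustrative and not taken from the paper. The container also makes explicit the implied budget of 2000 × 1000 = 2,000,000 samples. The quote does not say which algorithm or domains the iteration count applies to. The C_FA values cover only the "other" Atari and MuJoCo domains, so any per-domain exceptions are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedSetup:
    """Values quoted in the Experiment Setup row (hypothetical container)."""
    optimizer: str = "Adam"
    discount: float = 0.99          # gamma, used across all tasks
    entropy_coef: float = 0.01      # entropy regularizer for kappa-TRPO
    total_iterations: int = 2000
    samples_per_iteration: int = 1000
    c_fa_atari_other: float = 0.05  # C_FA for "other Atari domains"
    c_fa_mujoco_other: float = 0.2  # C_FA for "other MuJoCo domains"

    @property
    def total_samples(self) -> int:
        # Implied sample budget: iterations times samples per iteration.
        return self.total_iterations * self.samples_per_iteration


setup = ReportedSetup()
print(setup.total_samples)  # 2000000
```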