Understanding and Leveraging Overparameterization in Recursive Value Estimation

Authors: Chenjun Xiao, Bo Dai, Jincheng Mei, Oscar A Ramirez, Ramki Gummadi, Chris Harris, Dale Schuurmans

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirically we find that these regularizers dramatically improve the stability of TD and FVI, while allowing RM to match and even sometimes surpass their generalization performance with assured stability."
Researcher Affiliation | Collaboration | Google; Department of Computing Science, University of Alberta
Pseudocode | No | The paper provides mathematical update equations for algorithms such as RM, TD, and FVI (e.g., Eqs. 5, 6, and 9), but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. (A hedged sketch of the TD and RM objectives is given below the table.)
Open Source Code | No | The paper does not include any explicit statement about releasing source code, provide a link to a code repository, or mention code availability in supplementary materials for the described methodology.
Open Datasets | Yes | "We consider both discrete and continuous control benchmarks in this analysis. For the discrete action environments, we use DQN (Mnih et al., 2015) as the baseline algorithm to add our regularizers. For continuous control environments, we use QT-Opt (Kalashnikov et al., 2018) as the baseline algorithm... We provide extra experiment results on four Mujoco control problems... Half Cheetah, Hopper, Ant, and Walker2d."
Dataset Splits | No | The paper mentions using a 'fixed offline data set' and a 'replay buffer with 10k tuples', but it does not specify explicit training/validation/test splits, percentages, or sample counts for reproducibility. (See the dataset-collection sketch below the table.)
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running its experiments.
Software Dependencies | No | The paper mentions using DQN and QT-Opt as baseline algorithms, but it does not list specific software dependencies (e.g., programming languages, libraries, or solvers) with version numbers required to replicate the experiments.
Experiment Setup | Yes | Appendix B.1, Acrobot: replay buffer with 10k tuples sampled using a random policy across trajectories with a maximum episode length of 64; a DQN whose hidden layers are fully connected with (100, 100) units; batch size 64; learning rate 1e-3; regularized RM with weight 2e-2 on Rφ and 1e-4 on Rw; regularized TD with weight 0 on Rφ and 1e-4 on Rw. (These settings are collected into a config sketch below the table.)
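As noted in the Pseudocode row, the paper states its methods only as update equations (Eqs. 5, 6, 9). Below is a minimal PyTorch sketch of the standard objectives those acronyms refer to: the semi-gradient temporal-difference (TD) loss, which treats the bootstrapped target as a constant, versus the residual-minimization (RM) loss, which passes gradients through both sides of the Bellman residual. The (100, 100) hidden sizes and the 2e-2 / 1e-4 regularizer weights come from the Appendix B.1 quote above; the exact forms of Rφ and Rw are not reproduced here and are stood in by generic L2 penalties on the penultimate features and the last-layer weights. All function and variable names, and the discount factor, are assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QNet(nn.Module):
    """Small fully connected Q-network; (100, 100) hidden units as in Appendix B.1."""
    def __init__(self, obs_dim, n_actions, hidden=(100, 100)):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden[0]), nn.ReLU(),
            nn.Linear(hidden[0], hidden[1]), nn.ReLU(),
        )
        self.head = nn.Linear(hidden[1], n_actions)

    def forward(self, obs):
        phi = self.trunk(obs)            # penultimate-layer features
        return self.head(phi), phi

def td_and_rm_losses(q_net, batch, gamma=0.99, c_phi=2e-2, c_w=1e-4):
    # gamma is a placeholder; the quoted setup does not state a discount factor.
    obs, act, rew, next_obs, done = batch
    q_all, phi = q_net(obs)
    q = q_all.gather(1, act.unsqueeze(1)).squeeze(1)
    next_q, _ = q_net(next_obs)
    target = rew + gamma * (1.0 - done) * next_q.max(dim=1).values

    td_loss = F.mse_loss(q, target.detach())  # semi-gradient TD: no gradient through the target
    rm_loss = F.mse_loss(q, target)           # RM: gradient through both sides of the residual

    # Placeholder regularizers (assumed forms); weights 2e-2 / 1e-4 follow Appendix B.1.
    r_phi = c_phi * phi.pow(2).sum(dim=1).mean()   # penalty on penultimate features
    r_w = c_w * q_net.head.weight.pow(2).sum()     # penalty on last-layer weights
    return td_loss + r_phi + r_w, rm_loss + r_phi + r_w
```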
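The Dataset Splits row quotes a 'fixed offline data set' built from a replay buffer of 10k tuples gathered by a random policy with episodes capped at 64 steps. A minimal sketch of that collection step follows, assuming Gym's Acrobot-v1 and the classic Gym interface (reset() returning an observation, step() returning a 4-tuple); neither assumption is confirmed by the paper.

```python
def collect_offline_dataset(env, n_tuples=10_000, max_episode_length=64):
    """Roll out a uniformly random behavior policy until n_tuples transitions are stored."""
    dataset = []
    while len(dataset) < n_tuples:
        obs = env.reset()
        for _ in range(max_episode_length):
            action = env.action_space.sample()          # random behavior policy
            next_obs, reward, done, _ = env.step(action)
            dataset.append((obs, action, reward, next_obs, float(done)))
            obs = next_obs
            if done or len(dataset) >= n_tuples:
                break
    return dataset

# Example usage (requires the classic `gym` package and API):
# import gym
# dataset = collect_offline_dataset(gym.make("Acrobot-v1"))
```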
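For convenience, the Appendix B.1 Acrobot settings quoted in the Experiment Setup row are collected below into a plain config dict. The key names are illustrative, and anything not stated in the quote (optimizer, discount factor, target-network details) is deliberately omitted rather than guessed.

```python
# Acrobot settings from Appendix B.1 as quoted above; key names are illustrative.
ACROBOT_SETUP = {
    "replay_buffer_tuples": 10_000,        # sampled with a random policy
    "max_episode_length": 64,
    "hidden_units": (100, 100),            # fully connected DQN layers
    "batch_size": 64,
    "learning_rate": 1e-3,
    "regularizer_weights": {
        "regularized_RM": {"R_phi": 2e-2, "R_w": 1e-4},
        "regularized_TD": {"R_phi": 0.0, "R_w": 1e-4},
    },
}
```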