Value Refinement Network (VRN)

Authors: Jan Wöhlke, Felix Schmitt, Herke van Hoof

IJCAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the VRN on simulated robotic (navigation) tasks and demonstrate that it can successfully refine sub-optimal plans to match the performance of more costly planning in the non-simplified problem.
Researcher Affiliation | Collaboration | Jan Wöhlke (1,2), Felix Schmitt (1), Herke van Hoof (3); 1: Bosch Center for Artificial Intelligence, 2: UvA-Bosch Delta Lab, University of Amsterdam, 3: AMLab, University of Amsterdam
Pseudocode | No | The paper does not include a clearly labeled pseudocode or algorithm block.
Open Source Code | No | We plan to release code and supplemental material (appendices) here: https://github.com/boschresearch/Value-Refinement-Network
Open Datasets | Yes | We chose the 3D (x, y, orientation) stick robot maze navigation task introduced and made available by Chen et al. [2019].
Dataset Splits | No | The paper mentions '2000 different training and 1000 different test problems (layouts)' for one task, but does not provide specific counts or percentages for training, validation, and test splits across all experiments, nor does it specify cross-validation or other detailed splitting methodologies.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory.
Software Dependencies | No | The paper mentions using 'double DQN', the 'Adam optimizer', and 'Hindsight Experience Replay (HER)' but does not provide version numbers for these algorithms, for the underlying software libraries (e.g., PyTorch, TensorFlow), or for the simulators (MuJoCo, highway-env).
Experiment Setup | Yes | The VRN parameters ψ are optimized via double DQN [Van Hasselt et al., 2016] using the Adam optimizer [Kingma and Ba, 2014] with a learning rate of 1 × 10⁻⁴, a batch size of 128, and gradient clipping to [−1, 1]. The memory capacity is 160000. The target network update frequency is 1000 (5000 for Reacher). The discount factor γ is 0.99 (0.98 for Reacher). We furthermore use Hindsight Experience Replay (HER) [Andrychowicz et al., 2017] with, depending on the environment, the final or the future strategy.
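
For concreteness, below is a minimal sketch (not the authors' released code) of a double-DQN update wired up with the hyperparameters quoted in the Experiment Setup row, assuming a PyTorch implementation. The `VRN` architecture, the batch format, and the Huber loss are assumptions made for illustration; the HER relabeling step and the replay memory itself are omitted.

```python
import torch
import torch.nn as nn

# Hyperparameters as reported in the paper.
LEARNING_RATE = 1e-4        # "learning rate of 1 × 10⁻⁴"
BATCH_SIZE = 128            # "batch size of 128"
GRAD_CLIP = 1.0             # "gradient clipping to [−1, 1]"
MEMORY_CAPACITY = 160_000   # "memory capacity is 160000"
TARGET_UPDATE_FREQ = 1_000  # 5000 for Reacher
GAMMA = 0.99                # 0.98 for Reacher


class VRN(nn.Module):
    """Hypothetical stand-in; the paper's actual VRN architecture is not reproduced here."""

    def __init__(self, in_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, n_actions)
        )

    def forward(self, x):
        return self.net(x)


online, target = VRN(16, 4), VRN(16, 4)
target.load_state_dict(online.state_dict())
optimizer = torch.optim.Adam(online.parameters(), lr=LEARNING_RATE)


def double_dqn_step(batch, step: int) -> float:
    """One double-DQN update [Van Hasselt et al., 2016]: the online net
    selects the next action, the target net evaluates it."""
    s, a, r, s_next, done = batch  # tensors sampled from the replay memory
    with torch.no_grad():
        next_a = online(s_next).argmax(dim=1, keepdim=True)        # action selection (online net)
        q_next = target(s_next).gather(1, next_a).squeeze(1)       # action evaluation (target net)
        td_target = r + GAMMA * (1.0 - done) * q_next
    q = online(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = nn.functional.smooth_l1_loss(q, td_target)              # loss choice is an assumption
    optimizer.zero_grad()
    loss.backward()
    for p in online.parameters():                                  # clip gradients to [−1, 1]
        if p.grad is not None:
            p.grad.clamp_(-GRAD_CLIP, GRAD_CLIP)
    optimizer.step()
    if step % TARGET_UPDATE_FREQ == 0:                             # periodic hard target update
        target.load_state_dict(online.state_dict())
    return loss.item()
```

In an actual HER setup, transitions sampled from the replay memory would first be relabeled with achieved goals (the final or future strategy, depending on the environment) before being passed to `double_dqn_step`.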