Value Refinement Network (VRN)

Authors: Jan Wöhlke, Felix Schmitt, Herke van Hoof

IJCAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the VRN on simulated robotic (navigation) tasks and demonstrate that it can successfully refine sub-optimal plans to match the performance of more costly planning in the non-simplified problem.
Researcher Affiliation | Collaboration | Jan Wöhlke (1,2), Felix Schmitt (1), Herke van Hoof (3); 1: Bosch Center for Artificial Intelligence, 2: UvA-Bosch Delta Lab, University of Amsterdam, 3: AMLab, University of Amsterdam
Pseudocode | No | The paper does not include a clearly labeled pseudocode or algorithm block.
Open Source Code | No | We plan to release code and supplemental material (appendices) here: https://github.com/boschresearch/Value-Refinement-Network
Open Datasets | Yes | We chose the 3D (x, y, orientation) stick robot maze navigation task introduced and made available by Chen et al. [2019].
Dataset Splits | No | The paper mentions '2000 different training and 1000 different test problems (layouts)' for one task, but does not provide specific counts or percentages for training, validation, and test splits across all experiments, nor does it specify cross-validation or other detailed splitting methodologies.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory.
Software Dependencies | No | The paper mentions using 'double DQN', the 'Adam optimizer', and 'Hindsight Experience Replay (HER)' but does not provide version numbers for these algorithms, for the underlying software libraries (e.g., PyTorch, TensorFlow), or for the simulators (MuJoCo, highway-env).
Experiment Setup | Yes | The VRN parameters ψ are optimized via double DQN [Van Hasselt et al., 2016] using the Adam optimizer [Kingma and Ba, 2014] with a learning rate of 1 × 10⁻⁴, a batch size of 128, and gradient clipping to [−1, 1]. The memory capacity is 160000. The target network update frequency is 1000 (5000 for Reacher). The discount factor γ is 0.99 (0.98 for Reacher). We furthermore use Hindsight Experience Replay (HER) [Andrychowicz et al., 2017] with, depending on the environment, the final or the future strategy.
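
For concreteness, below is a minimal sketch (not the authors' released code) of a double-DQN update wired up with the hyperparameters quoted in the Experiment Setup row, assuming a PyTorch implementation. The `VRN` architecture, the batch format, and the Huber loss are assumptions made for illustration; the HER relabeling step and the replay memory itself are omitted.

```python
import torch
import torch.nn as nn

# Hyperparameters as reported in the paper.
LEARNING_RATE = 1e-4        # "learning rate of 1 × 10⁻⁴"
BATCH_SIZE = 128            # "batch size of 128"
GRAD_CLIP = 1.0             # "gradient clipping to [−1, 1]"
MEMORY_CAPACITY = 160_000   # "memory capacity is 160000"
TARGET_UPDATE_FREQ = 1_000  # 5000 for Reacher
GAMMA = 0.99                # 0.98 for Reacher


class VRN(nn.Module):
    """Hypothetical stand-in; the paper's actual VRN architecture is not reproduced here."""

    def __init__(self, in_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, n_actions)
        )

    def forward(self, x):
        return self.net(x)


online, target = VRN(16, 4), VRN(16, 4)
target.load_state_dict(online.state_dict())
optimizer = torch.optim.Adam(online.parameters(), lr=LEARNING_RATE)


def double_dqn_step(batch, step: int) -> float:
    """One double-DQN update [Van Hasselt et al., 2016]: the online net
    selects the next action, the target net evaluates it."""
    s, a, r, s_next, done = batch  # tensors sampled from the replay memory
    with torch.no_grad():
        next_a = online(s_next).argmax(dim=1, keepdim=True)        # action selection (online net)
        q_next = target(s_next).gather(1, next_a).squeeze(1)       # action evaluation (target net)
        td_target = r + GAMMA * (1.0 - done) * q_next
    q = online(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = nn.functional.smooth_l1_loss(q, td_target)              # loss choice is an assumption
    optimizer.zero_grad()
    loss.backward()
    for p in online.parameters():                                  # clip gradients to [−1, 1]
        if p.grad is not None:
            p.grad.clamp_(-GRAD_CLIP, GRAD_CLIP)
    optimizer.step()
    if step % TARGET_UPDATE_FREQ == 0:                             # periodic hard target update
        target.load_state_dict(online.state_dict())
    return loss.item()
```

In an actual HER setup, transitions sampled from the replay memory would first be relabeled with achieved goals (the final or future strategy, depending on the environment) before being passed to `double_dqn_step`.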