Value Iteration Networks

Authors: Aviv Tamar, Yi Wu, Garrett Thomas, Sergey Levine, Pieter Abbeel

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate VIN based policies on discrete and continuous path-planning domains, and on a natural-language based search task. We demonstrate the effectiveness of VINs within standard RL and IL algorithms in various problems, among which require visual perception, continuous control, and also natural language based decision making in the Web Nav challenge [23]. After training, the policy learns to map an observation to a planning computation relevant for the task, and generate action predictions based on the resulting plan. As we demonstrate, this leads to policies that generalize better to new, unseen, task instances.
Researcher Affiliation | Academia | Dept. of Electrical Engineering and Computer Sciences, UC Berkeley
Pseudocode | No | The paper includes architectural diagrams (Figure 2) but does not provide explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Source code is available at https://github.com/avivt/VIN.
Open Datasets | No | The paper describes experiments on a "synthetic grid-world", "Mars landscape" images, and the "Wikipedia for Schools website". While these domains are mentioned, the paper does not provide explicit links, DOIs, or citations with author/year for public access to the specific datasets used for training or testing.
Dataset Splits | No | The paper mentions a "held-out test-set" but does not explicitly provide details about a validation set split or its size/proportion.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU models, memory) used for running the experiments. It only mentions using Mujoco for physical simulation.
Software Dependencies | No | The paper mentions "Theano [28]" but does not specify a version number for Theano or any other software libraries or dependencies. It also mentions "Mujoco [29]" and "publicly available GPS code [7]" but without specific versions for these tools.
Experiment Setup | No | The paper describes high-level design choices for the VIN (e.g., K recurrence, f_R as a CNN, attention module, pre-training with discounted grid-world transitions) and compares with other network architectures. However, it does not provide specific numerical hyperparameters (e.g., learning rate, batch size, number of epochs, specific optimizer settings) used during training.
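Because the Pseudocode entry reports that the paper has no algorithm block, the "K recurrence" named under Experiment Setup may be easier to follow with a small sketch. The Python below is an illustration only, not the authors' implementation (that is the repository linked under Open Source Code). It assumes a hand-set table of moves, here called MOVES, in place of the learned convolution kernels, takes the reward map as given rather than producing it with f_R, and omits the attention module. The function name vi_module and the default gamma of 0.9 are hypothetical choices for this example.

```python
# Hypothetical hand-set moves standing in for the learned convolution kernels:
# stay, up, down, left, right.
MOVES = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]


def vi_module(reward, K, gamma=0.9):
    """Run K steps of V <- max_a [R + gamma * shift_a(V)] on a 2-D grid."""
    H, W = len(reward), len(reward[0])
    value = [[0.0] * W for _ in range(H)]
    for _ in range(K):
        new_value = [[0.0] * W for _ in range(H)]
        for i in range(H):
            for j in range(W):
                q_values = []
                for di, dj in MOVES:
                    ni, nj = i + di, j + dj
                    # Out-of-grid neighbours contribute 0, like zero padding.
                    inside = 0 <= ni < H and 0 <= nj < W
                    next_v = value[ni][nj] if inside else 0.0
                    q_values.append(reward[i][j] + gamma * next_v)
                # Max over the action channel plays the role of max-pooling.
                new_value[i][j] = max(q_values)
        value = new_value
    return value


# Usage: a 5x5 grid with a single rewarded goal cell in the centre.
reward = [[0.0] * 5 for _ in range(5)]
reward[2][2] = 1.0
V = vi_module(reward, K=4)
```

With K = 4, value spreads at most three cells from the goal, so cells four or more steps away stay at zero. This shows why the number of recurrences K bounds the planning horizon of the module.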