Learning Novel Policies For Tasks
Authors: Yunbo Zhang, Wenhao Yu, Greg Turk
ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate this method on maze navigation tasks, a reaching task for a simulated robot arm, and a locomotion task for a hopper. We also demonstrate the effectiveness of our approach on deceptive tasks in which policy gradient methods often get stuck. |
| Researcher Affiliation | Academia | 1School of Interactive Computing, Georgia Institute of Technology, USA. Correspondence to: Yunbo Zhang <yzhang3027@gatech.edu>, Wenhao Yu <wenhaoyu@gatech.edu>, Greg Turk <turk@cc.gatech.edu>. |
| Pseudocode | Yes | Algorithm 1 Task Novelty Policy Learning and Algorithm 2 Task-Novelty Bisector Gradient (a hedged sketch of the bisector-style update follows the table). |
| Open Source Code | No | The paper mentions building on the 'PPO ... implementation in Open AI Baselines' and implementing environments in 'Open AI Gym', which are third-party libraries. It provides a link to videos ('https://sites.google.com/view/learningnovelpolicy/home') but does not state that the source code for the authors' own method is released or available. |
| Open Datasets | No | The paper utilizes simulated environments ('Open AI Gym' for mazes, 'DART physics engine' for robotics), with training data generated through interaction with these environments. It does not train on a pre-existing, publicly available dataset with concrete access information. |
| Dataset Splits | No | The paper does not specify exact train/validation/test dataset split percentages, absolute sample counts for each split, or reference predefined splits with citations for data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments. |
| Software Dependencies | No | The paper mentions 'Open AI Baselines', 'Open AI Gym', and 'DART physics engine' with associated citations, but it does not specify explicit version numbers for these software dependencies (e.g., Open AI Gym v0.10.5). |
| Experiment Setup | Yes | Each rollout for each of the environments has a horizon of 500 control steps unless it triggers an early termination criterion. For D-Maze, we run five trials with k = 4 policies in sequence in each trial for each method. Each policy is trained over 3M samples, which gives a sample budget of 12M for each trial (outlined in the second sketch after this table). |
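
For context on the "Pseudocode" row, the snippet below is a minimal sketch of how a task-novelty bisector style gradient combination could be computed with NumPy. It is not the authors' released code: the function name `tnb_update_direction`, the dot-product agreement test, the simple averaging, and the magnitude scaling are illustrative assumptions and may differ from Algorithm 2 in the paper.

```python
import numpy as np

def tnb_update_direction(g_task, g_novelty, eps=1e-8):
    """Hypothetical bisector-style combination of two policy gradients.

    g_task:    gradient of the task reward (flattened parameter vector)
    g_novelty: gradient of the novelty reward (flattened parameter vector)

    When the two gradients conflict, stepping along their angular bisector
    avoids sacrificing either objective entirely; the exact weighting and
    corner cases in the paper's Algorithm 2 may differ.
    """
    # Unit vectors of each gradient.
    u_task = g_task / (np.linalg.norm(g_task) + eps)
    u_novelty = g_novelty / (np.linalg.norm(g_novelty) + eps)

    if np.dot(u_task, u_novelty) > 0:
        # Gradients roughly agree: averaging them improves both objectives.
        direction = 0.5 * (g_task + g_novelty)
    else:
        # Gradients conflict: follow the angular bisector of the unit vectors,
        # scaled by the average magnitude to keep the step size comparable.
        bisector = u_task + u_novelty
        bisector /= (np.linalg.norm(bisector) + eps)
        avg_mag = 0.5 * (np.linalg.norm(g_task) + np.linalg.norm(g_novelty))
        direction = avg_mag * bisector

    return direction

# Example: two conflicting 2-D gradients.
g = tnb_update_direction(np.array([1.0, 0.0]), np.array([-0.5, 1.0]))
```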
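
For the "Experiment Setup" row, the outline below restates the reported D-Maze protocol (five trials, k = 4 policies trained in sequence, 3M samples per policy for a 12M-sample trial budget, 500-step rollout horizon) as runnable pseudocode. `train_policy` is a hypothetical placeholder, not the authors' training function.

```python
def train_policy(previous_policies, sample_budget, horizon):
    """Hypothetical stand-in for PPO training of one policy; the actual method
    also uses a novelty reward computed against previous_policies."""
    return {"budget": sample_budget, "horizon": horizon,
            "novelty_refs": len(previous_policies)}

NUM_TRIALS = 5                   # five trials per method
POLICIES_PER_TRIAL = 4           # k = 4 policies trained in sequence per trial
SAMPLES_PER_POLICY = 3_000_000   # 3M samples per policy -> 12M per trial
HORIZON = 500                    # control steps per rollout, barring early termination

for trial in range(NUM_TRIALS):
    learned = []
    for _ in range(POLICIES_PER_TRIAL):
        learned.append(train_policy(learned, SAMPLES_PER_POLICY, HORIZON))
    # Per-trial sample budget implied by the quoted setup.
    assert POLICIES_PER_TRIAL * SAMPLES_PER_POLICY == 12_000_000
```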