Learning Novel Policies For Tasks
Authors: Yunbo Zhang, Wenhao Yu, Greg Turk
ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate this method on maze navigation tasks, a reaching task for a simulated robot arm, and a locomotion task for a hopper. We also demonstrate the effectiveness of our approach on deceptive tasks in which policy gradient methods often get stuck. |
| Researcher Affiliation | Academia | 1School of Interactive Computing, Georgia Institute of Technology, USA. Correspondence to: Yunbo Zhang <yzhang3027@gatech.edu>, Wenhao Yu <wenhaoyu@gatech.edu>, Greg Turk <turk@cc.gatech.edu>. |
| Pseudocode | Yes | Algorithm 1 Task Novelty Policy Learning and Algorithm 2 Task-Novelty Bisector Gradient (a hedged sketch of the bisector-style update follows the table). |
| Open Source Code | No | The paper mentions building on the 'PPO ... implementation in Open AI Baselines' and implementing environments in 'Open AI Gym', which are third-party libraries. It provides a link to videos ('https://sites.google.com/view/learningnovelpolicy/home') but does not state that the source code for the authors' own method is released or available. |
| Open Datasets | No | The paper utilizes simulated environments ('Open AI Gym' for mazes, 'DART physics engine' for robotics), with training data generated through interaction with these environments. It does not train on a pre-existing, publicly available dataset with concrete access information. |
| Dataset Splits | No | The paper does not specify exact train/validation/test dataset split percentages, absolute sample counts for each split, or reference predefined splits with citations for data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments. |
| Software Dependencies | No | The paper mentions 'Open AI Baselines', 'Open AI Gym', and 'DART physics engine' with associated citations, but it does not specify explicit version numbers for these software dependencies (e.g., Open AI Gym v0.10.5). |
| Experiment Setup | Yes | Each rollout for each of the environments has a horizon of 500 control steps unless it triggers an early termination criterion. For D-Maze, we run five trials with k = 4 policies in sequence in each trial for each method. Each policy is trained over 3M samples, which gives a sample budget of 12M for each trial (outlined in the second sketch after this table). |
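
For context on the "Pseudocode" row, the snippet below is a minimal sketch of how a task-novelty bisector style gradient combination could be computed with NumPy. It is not the authors' released code: the function name `tnb_update_direction`, the dot-product agreement test, the simple averaging, and the magnitude scaling are illustrative assumptions and may differ from Algorithm 2 in the paper.

```python
import numpy as np

def tnb_update_direction(g_task, g_novelty, eps=1e-8):
    """Hypothetical bisector-style combination of two policy gradients.

    g_task:    gradient of the task reward (flattened parameter vector)
    g_novelty: gradient of the novelty reward (flattened parameter vector)

    When the two gradients conflict, stepping along their angular bisector
    avoids sacrificing either objective entirely; the exact weighting and
    corner cases in the paper's Algorithm 2 may differ.
    """
    # Unit vectors of each gradient.
    u_task = g_task / (np.linalg.norm(g_task) + eps)
    u_novelty = g_novelty / (np.linalg.norm(g_novelty) + eps)

    if np.dot(u_task, u_novelty) > 0:
        # Gradients roughly agree: averaging them improves both objectives.
        direction = 0.5 * (g_task + g_novelty)
    else:
        # Gradients conflict: follow the angular bisector of the unit vectors,
        # scaled by the average magnitude to keep the step size comparable.
        bisector = u_task + u_novelty
        bisector /= (np.linalg.norm(bisector) + eps)
        avg_mag = 0.5 * (np.linalg.norm(g_task) + np.linalg.norm(g_novelty))
        direction = avg_mag * bisector

    return direction

# Example: two conflicting 2-D gradients.
g = tnb_update_direction(np.array([1.0, 0.0]), np.array([-0.5, 1.0]))
```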
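
For the "Experiment Setup" row, the outline below restates the reported D-Maze protocol (five trials, k = 4 policies trained in sequence, 3M samples per policy for a 12M-sample trial budget, 500-step rollout horizon) as runnable pseudocode. `train_policy` is a hypothetical placeholder, not the authors' training function.

```python
def train_policy(previous_policies, sample_budget, horizon):
    """Hypothetical stand-in for PPO training of one policy; the actual method
    also uses a novelty reward computed against previous_policies."""
    return {"budget": sample_budget, "horizon": horizon,
            "novelty_refs": len(previous_policies)}

NUM_TRIALS = 5                   # five trials per method
POLICIES_PER_TRIAL = 4           # k = 4 policies trained in sequence per trial
SAMPLES_PER_POLICY = 3_000_000   # 3M samples per policy -> 12M per trial
HORIZON = 500                    # control steps per rollout, barring early termination

for trial in range(NUM_TRIALS):
    learned = []
    for _ in range(POLICIES_PER_TRIAL):
        learned.append(train_policy(learned, SAMPLES_PER_POLICY, HORIZON))
    # Per-trial sample budget implied by the quoted setup.
    assert POLICIES_PER_TRIAL * SAMPLES_PER_POLICY == 12_000_000
```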