Learning Multi-Level Hierarchies with Hindsight
Authors: Andrew Levy, George Konidaris, Robert Platt, Kate Saenko
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate experimentally in both grid world and simulated robotics domains that our approach can significantly accelerate learning relative to other non-hierarchical and hierarchical methods. |
| Researcher Affiliation | Academia | Andrew Levy, Department of Computer Science, Brown University, Providence, RI, USA (andrew_levy2@brown.edu); George Konidaris, Department of Computer Science, Brown University, Providence, RI, USA (gdk@cs.brown.edu); Robert Platt, College of Computer and Information Science, Northeastern University, Boston, MA, USA (rplatt@ccs.neu.edu); Kate Saenko, Department of Computer Science, Boston University, Boston, MA, USA (saenko@bu.edu) |
| Pseudocode | Yes | Algorithm 1: Hierarchical Actor-Critic (HAC) ... Algorithm 2: Hierarchical Q-Learning (HierQ) |
| Open Source Code | Yes | For further detail, see the GitHub repository available at https://github.com/andrew-j-levy/Hierarchical-Actor-Critc-HAC-. |
| Open Datasets | Yes | We evaluated our approach on both grid world tasks and more complex simulated robotics environments. The continuous tasks consisted of the following simulated robotics environments developed in MuJoCo (Todorov et al., 2012): (i) inverted pendulum, (ii) UR5 reacher, (iii) ant reacher, and (iv) ant four rooms. |
| Dataset Splits | No | The paper does not provide training/validation/test dataset splits. As a reinforcement-learning study evaluated in grid world and simulated robotics environments, it reports learning curves over environment interaction rather than splits of a fixed dataset. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running experiments. |
| Software Dependencies | No | The paper mentions software like 'Q-learning', 'DDPG', 'HER', and 'MuJoCo' but does not provide specific version numbers for these or other software libraries/dependencies. |
| Experiment Setup | Yes | DDPG Parameters: Bounded Q-Values: We bound the output of each critic function to the range [-H, 0] using a negative sigmoid function... DDPG Target Networks: ...we removed the target networks used in DDPG... Exploration: Each level uses the following exploration strategy... 20% of actions are sampled uniformly at random... 80% of actions are the sum of actions sampled from the level's policy and Gaussian noise. Neural Network Architectures: All actor and critic neural networks had 3 hidden layers, with 64 nodes in each hidden layer. ReLU activation functions were used. HAC Parameters: Maximum horizon of a subgoal, H: (1) for k=3-level agents in MuJoCo tasks, H = 10; (2) for k=2-level agents in MuJoCo tasks, H was generally in the range [20, 30]. Subgoal testing rate λ = 0.3. Goal and subgoal achievement thresholds were hand-crafted. |
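
The Experiment Setup row quotes critics whose outputs are bounded to [-H, 0] with a negative sigmoid and actor/critic networks with 3 hidden layers of 64 ReLU units. The sketch below illustrates one way such a bounded critic could be written; it is not the authors' released implementation, and the class and argument names (`BoundedCritic`, `horizon`) are assumptions made for illustration.

```python
# Minimal sketch (not the authors' code) of a critic matching the quoted setup:
# 3 hidden layers of 64 ReLU units, output squashed into [-H, 0] by a negated sigmoid.
import torch
import torch.nn as nn

class BoundedCritic(nn.Module):
    def __init__(self, state_dim: int, goal_dim: int, action_dim: int, horizon: int):
        super().__init__()
        self.horizon = horizon
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim + action_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, state, goal, action):
        x = torch.cat([state, goal, action], dim=-1)
        # Negated sigmoid keeps Q-values in (-H, 0), the range stated in the paper.
        return -self.horizon * torch.sigmoid(self.net(x))
```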
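Likewise, a minimal sketch of the quoted per-level exploration strategy (20% of actions sampled uniformly at random, 80% formed by adding Gaussian noise to the level's policy output). The helper names `policy`, `action_low`, `action_high`, and `noise_std` are hypothetical and only serve to make the example self-contained.

```python
# Minimal sketch of the exploration strategy quoted in the Experiment Setup row.
import numpy as np

def explore(policy, state, goal, action_low, action_high, noise_std, rng=np.random):
    if rng.random() < 0.2:
        # 20% of actions: sampled uniformly at random from the action space.
        return rng.uniform(action_low, action_high)
    # 80% of actions: policy output plus Gaussian noise, clipped back into bounds.
    action = policy(state, goal) + rng.normal(0.0, noise_std, size=np.shape(action_low))
    return np.clip(action, action_low, action_high)
```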