Sub-Goal Trees: A Framework for Goal-Based Reinforcement Learning
Authors: Tom Jurgenson, Or Avner, Edward Groshev, Aviv Tamar
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we apply our method to neural motion planning, where we demonstrate significant improvements compared to standard RL on navigating a 7-DoF robot arm between obstacles. We compare SGT-PG with a sequential baseline, sequential sub-goals (SeqSG), which predicts the sub-goals sequentially. |
| Researcher Affiliation | Collaboration | ¹EE Department, Technion; ²Osaro Inc. Correspondence to: Tom Jurgenson <tomj@campus.technion.ac.il>, Aviv Tamar <avivt@technion.ac.il>. |
| Pseudocode | Yes | Algorithm 1: Fitted SGTDP; Algorithm 2: SGT-PG (a sketch of the sub-goal-tree recursion underlying both algorithms appears below the table). |
| Open Source Code | Yes | We illustrate the SGT value functions and trajectories on a simple 2D point robot domain, which we solve using Fitted SGTDP (Section 6.1, code: https://github.com/tomjur/SGT batch RL .git). We then consider a more challenging domain with a simulated 7-DoF robotic arm, and demonstrate the effectiveness of SGT-PG (Section 6.2, code: https://github.com/tomjur/SGT-PG.git). |
| Open Datasets | No | To generate data, we sampled states and actions uniformly and independently, resulting in 125K (s, u, c, s′) tuples. For the 7-DoF Franka Panda robotic arm, the paper describes the simulation and generated motion segments but does not mention a publicly available dataset. |
| Dataset Splits | No | The paper mentions generating 125K tuples, choosing 200 random start and goal points for evaluation, and evaluating on 100 random start-goal pairs held out during training, but it does not provide train/validation/test split percentages or counts for a complete dataset. |
| Hardware Specification | No | The paper does not specify any hardware details like GPU/CPU models, memory, or cloud computing resources used for the experiments. |
| Software Dependencies | No | The paper mentions the 'PPO objective (Schulman et al., 2017)' and neural motion planning, which imply software dependencies, but it does not provide version numbers for any libraries or frameworks used (e.g., PyTorch, TensorFlow). |
| Experiment Setup | Yes | As for function approximation, we opted for simplicity and used K-nearest neighbors (KNN) for all our experiments, with K_neighbors = 5. To solve the minimization over states in Fitted SGTDP, we discretized the state space and searched over a 50 × 50 grid of points. All other hyper-parameters were specifically tuned for each model. (A sketch of this fitted-value setup also appears after the table.) |
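
To make the pseudocode row concrete, the following is a minimal sketch of the divide-and-conquer construction that both Fitted SGTDP and SGT-PG build on: predict a sub-goal between the current start and goal, then recurse on each half until the desired tree depth is reached. The function `predict_subgoal` and its depth-based signature are hypothetical stand-ins for the learned sub-goal predictor, not the authors' implementation.

```python
# Hedged sketch of the sub-goal-tree recursion (divide and conquer over
# start/goal pairs). `predict_subgoal` is a hypothetical stand-in for the
# learned sub-goal predictor p(s_m | s, g); it is not the paper's code.
from typing import Callable, List, Sequence

State = Sequence[float]

def build_subgoal_trajectory(
    start: State,
    goal: State,
    depth: int,
    predict_subgoal: Callable[[State, State, int], State],
) -> List[State]:
    """Return a list of states [start, ..., goal] with 2**depth segments."""
    if depth == 0:
        return [start, goal]
    mid = predict_subgoal(start, goal, depth)  # sub-goal between start and goal
    left = build_subgoal_trajectory(start, mid, depth - 1, predict_subgoal)
    right = build_subgoal_trajectory(mid, goal, depth - 1, predict_subgoal)
    return left[:-1] + right  # drop the duplicated midpoint
```

With depth d this yields 2^d segments for a low-level controller to track, whereas the sequential baseline (SeqSG) quoted in the Research Type row predicts the sub-goals one after another.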
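
The experiment-setup row (KNN with K_neighbors = 5, minimization over a 50 × 50 grid) can be read as a fitted value-iteration step of the form V_k(s, g) = min over midpoints m of [V_{k-1}(s, m) + V_{k-1}(m, g)]. Below is a hedged sketch of one such iteration under stated assumptions, not the authors' code: `v_prev` is assumed to be an already-fitted regressor over concatenated (start, goal) features for the previous level, and `sampled_pairs` and `grid_points` are hypothetical names.

```python
# Hedged sketch of one fitted-SGTDP-style iteration on a 2D point domain.
# Assumptions: v_prev is a KNN regressor already fitted on (N, 4) arrays of
# [s_x, s_y, g_x, g_y] features; the minimization over midpoints is brute
# force over a 50x50 grid, as reported in the experiment setup.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def fit_next_value(v_prev: KNeighborsRegressor,
                   sampled_pairs: np.ndarray,  # shape (N, 4): [s_x, s_y, g_x, g_y]
                   grid_points: np.ndarray      # shape (2500, 2): the 50x50 grid
                   ) -> KNeighborsRegressor:
    targets = []
    for s_x, s_y, g_x, g_y in sampled_pairs:
        s = np.array([s_x, s_y])
        g = np.array([g_x, g_y])
        # V_k(s, g) = min_m [ V_{k-1}(s, m) + V_{k-1}(m, g) ], m over the grid
        left = v_prev.predict(np.hstack([np.tile(s, (len(grid_points), 1)), grid_points]))
        right = v_prev.predict(np.hstack([grid_points, np.tile(g, (len(grid_points), 1))]))
        targets.append((left + right).min())
    v_next = KNeighborsRegressor(n_neighbors=5)  # K_neighbors = 5, as reported
    v_next.fit(sampled_pairs, np.array(targets))
    return v_next
```

A 50 × 50 grid gives 2,500 candidate midpoints per pair, so the brute-force minimization stays cheap in the 2D point-robot domain; for the 7-DoF arm the paper instead uses the policy-gradient variant, SGT-PG.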