Sub-Goal Trees: a Framework for Goal-Based Reinforcement Learning

Authors: Tom Jurgenson, Or Avner, Edward Groshev, Aviv Tamar

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we apply our method to neural motion planning, where we demonstrate significant improvements compared to standard RL on navigating a 7-DoF robot arm between obstacles. We compare SGT-PG with a sequential baseline, Sequential sub-goals (SeqSG), which predicts the sub-goals sequentially.
Researcher Affiliation | Collaboration | 1 EE Department, Technion; 2 Osaro Inc. Correspondence to: Tom Jurgenson <tomj@campus.technion.ac.il>, Aviv Tamar <avivt@technion.ac.il>.
Pseudocode | Yes | Algorithm 1 (Fitted SGTDP Algorithm), Algorithm 2 (SGT-PG Algorithm)
Open Source Code | Yes | We illustrate the SGT value functions and trajectories on a simple 2D point robot domain, which we solve using Fitted SGTDP (Section 6.1, code: https://github.com/tomjur/SGT_batch_RL.git). We then consider a more challenging domain with a simulated 7-DoF robotic arm, and demonstrate the effectiveness of SGT-PG (Section 6.2, code: https://github.com/tomjur/SGT-PG.git).
Open Datasets | No | To generate data, we sampled states and actions uniformly and independently, resulting in 125K (s, u, c, s′) tuples. For the 7-DoF Franka Panda robotic arm, the paper describes simulation and generated motion segments but does not mention a publicly available dataset.
Dataset Splits | No | The paper mentions generating 125K tuples, choosing 200 random start and goal points for evaluation, and evaluating on 100 random start-goal pairs held out during training, but does not provide specific train/validation/test split percentages or counts for a complete dataset.
Hardware Specification | No | The paper does not specify any hardware details such as GPU/CPU models, memory, or cloud computing resources used for the experiments.
Software Dependencies | No | The paper mentions the 'PPO objective (Schulman et al., 2017)' and 'Neural Motion Planning', which imply software, but no version numbers are provided for any libraries or frameworks used (e.g., PyTorch, TensorFlow).
Experiment Setup | Yes | As for function approximation, we opted for simplicity, and used K-nearest neighbors (KNN) for all our experiments, with K_neighbors = 5. To solve the minimization over states in Fitted SGTDP, we discretized the state space and searched over a 50×50 grid of points. All other hyper-parameters were specifically tuned for each model.
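The data-generation step quoted in the Open Datasets row (states and actions sampled uniformly and independently, yielding 125K (s, u, c, s′) tuples) can be sketched for the 2D point robot domain. The unit-box workspace, clipped dynamics, and distance-traveled cost below are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# 125K (s, u, c, s') tuples, with states and actions sampled
# uniformly and independently, as described in the paper.
N = 125_000
states = rng.uniform(0.0, 1.0, size=(N, 2))    # s: positions in a unit box
actions = rng.uniform(-0.1, 0.1, size=(N, 2))  # u: small displacement actions

# Assumed dynamics: move by u, clipped to the workspace bounds.
next_states = np.clip(states + actions, 0.0, 1.0)   # s'
# Assumed cost: distance actually traveled.
costs = np.linalg.norm(next_states - states, axis=1)  # c

dataset = list(zip(states, actions, costs, next_states))
```

Any environment with known bounds could be substituted for the unit box; the point is only that the tuples are drawn i.i.d. rather than along rollouts.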
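The quoted experiment setup (KNN function approximation with 5 neighbors, and the minimization over states performed on a discretized 50×50 grid) can be sketched as follows. The Euclidean cost-to-go used to seed the values here is a placeholder; in Fitted SGTDP those values would instead come from the dynamic-programming recursion:

```python
import numpy as np

rng = np.random.default_rng(0)

# Fitted (start, goal) pairs and their values for a 2D point robot.
# Placeholder values: Euclidean distance stands in for the learned
# cost-to-go produced by the Fitted SGTDP recursion.
pairs = rng.uniform(0.0, 1.0, size=(2000, 4))  # rows are [s, g] concatenated
values = np.linalg.norm(pairs[:, :2] - pairs[:, 2:], axis=1)

def knn_value(s, g, k=5):
    """KNN regression with k=5 neighbors, as in the paper's setup:
    the value at (s, g) is the mean value of the k closest fitted pairs."""
    q = np.concatenate([s, g])
    d = np.linalg.norm(pairs - q, axis=1)
    idx = np.argpartition(d, k)[:k]  # indices of the k nearest neighbors
    return values[idx].mean()

# Discretize the state space into a 50x50 grid of candidate mid-states.
axis = np.linspace(0.0, 1.0, 50)
grid = np.stack(np.meshgrid(axis, axis), axis=-1).reshape(-1, 2)

def best_subgoal(s, g):
    """Approximate the minimization over states by exhaustive search
    over the grid: pick m minimizing V(s, m) + V(m, g)."""
    scores = [knn_value(s, m) + knn_value(m, g) for m in grid]
    return grid[int(np.argmin(scores))]

mid = best_subgoal(np.array([0.1, 0.1]), np.array([0.9, 0.9]))
```

With a distance-like value function the chosen sub-goal lies near the straight line between start and goal; with learned values the same search routes the sub-goal around obstacles.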