Sub-Goal Trees: A Framework for Goal-Based Reinforcement Learning
Authors: Tom Jurgenson, Or Avner, Edward Groshev, Aviv Tamar
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we apply our method to neural motion planning, where we demonstrate significant improvements compared to standard RL on navigating a 7-DoF robot arm between obstacles. We compare SGT-PG with a sequential baseline, sequential sub-goals (SeqSG), which predicts the sub-goals sequentially. |
| Researcher Affiliation | Collaboration | ¹EE Department, Technion; ²Osaro Inc. Correspondence to: Tom Jurgenson <tomj@campus.technion.ac.il>, Aviv Tamar <avivt@technion.ac.il>. |
| Pseudocode | Yes | Algorithm 1: Fitted SGTDP; Algorithm 2: SGT-PG (a sketch of the sub-goal-tree recursion underlying both algorithms appears below the table). |
| Open Source Code | Yes | We illustrate the SGT value functions and trajectories on a simple 2D point robot domain, which we solve using Fitted SGTDP (Section 6.1, code: https://github.com/tomjur/SGT batch RL .git). We then consider a more challenging domain with a simulated 7-DoF robotic arm, and demonstrate the effectiveness of SGT-PG (Section 6.2, code: https://github.com/tomjur/SGT-PG.git). |
| Open Datasets | No | To generate data, we sampled states and actions uniformly and independently, resulting in 125K (s, u, c, s′) tuples. For the 7-DoF Franka Panda robotic arm, the paper describes the simulation and generated motion segments but does not mention a publicly available dataset. |
| Dataset Splits | No | The paper mentions generating 125K tuples, choosing 200 random start and goal points for evaluation, and evaluating on 100 random start-goal pairs held out during training, but it does not provide train/validation/test split percentages or counts for a complete dataset. |
| Hardware Specification | No | The paper does not specify any hardware details like GPU/CPU models, memory, or cloud computing resources used for the experiments. |
| Software Dependencies | No | The paper mentions the 'PPO objective (Schulman et al., 2017)' and neural motion planning, which imply software dependencies, but it does not provide version numbers for any libraries or frameworks used (e.g., PyTorch, TensorFlow). |
| Experiment Setup | Yes | As for function approximation, we opted for simplicity and used K-nearest neighbors (KNN) for all our experiments, with K_neighbors = 5. To solve the minimization over states in Fitted SGTDP, we discretized the state space and searched over a 50 × 50 grid of points. All other hyper-parameters were specifically tuned for each model. (A sketch of this fitted-value setup also appears after the table.) |
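
To make the pseudocode row concrete, the following is a minimal sketch of the divide-and-conquer construction that both Fitted SGTDP and SGT-PG build on: predict a sub-goal between the current start and goal, then recurse on each half until the desired tree depth is reached. The function `predict_subgoal` and its depth-based signature are hypothetical stand-ins for the learned sub-goal predictor, not the authors' implementation.

```python
# Hedged sketch of the sub-goal-tree recursion (divide and conquer over
# start/goal pairs). `predict_subgoal` is a hypothetical stand-in for the
# learned sub-goal predictor p(s_m | s, g); it is not the paper's code.
from typing import Callable, List, Sequence

State = Sequence[float]

def build_subgoal_trajectory(
    start: State,
    goal: State,
    depth: int,
    predict_subgoal: Callable[[State, State, int], State],
) -> List[State]:
    """Return a list of states [start, ..., goal] with 2**depth segments."""
    if depth == 0:
        return [start, goal]
    mid = predict_subgoal(start, goal, depth)  # sub-goal between start and goal
    left = build_subgoal_trajectory(start, mid, depth - 1, predict_subgoal)
    right = build_subgoal_trajectory(mid, goal, depth - 1, predict_subgoal)
    return left[:-1] + right  # drop the duplicated midpoint
```

With depth d this yields 2^d segments for a low-level controller to track, whereas the sequential baseline (SeqSG) quoted in the Research Type row predicts the sub-goals one after another.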
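
The experiment-setup row (KNN with K_neighbors = 5, minimization over a 50 × 50 grid) can be read as a fitted value-iteration step of the form V_k(s, g) = min over midpoints m of [V_{k-1}(s, m) + V_{k-1}(m, g)]. Below is a hedged sketch of one such iteration under stated assumptions, not the authors' code: `v_prev` is assumed to be an already-fitted regressor over concatenated (start, goal) features for the previous level, and `sampled_pairs` and `grid_points` are hypothetical names.

```python
# Hedged sketch of one fitted-SGTDP-style iteration on a 2D point domain.
# Assumptions: v_prev is a KNN regressor already fitted on (N, 4) arrays of
# [s_x, s_y, g_x, g_y] features; the minimization over midpoints is brute
# force over a 50x50 grid, as reported in the experiment setup.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def fit_next_value(v_prev: KNeighborsRegressor,
                   sampled_pairs: np.ndarray,  # shape (N, 4): [s_x, s_y, g_x, g_y]
                   grid_points: np.ndarray      # shape (2500, 2): the 50x50 grid
                   ) -> KNeighborsRegressor:
    targets = []
    for s_x, s_y, g_x, g_y in sampled_pairs:
        s = np.array([s_x, s_y])
        g = np.array([g_x, g_y])
        # V_k(s, g) = min_m [ V_{k-1}(s, m) + V_{k-1}(m, g) ], m over the grid
        left = v_prev.predict(np.hstack([np.tile(s, (len(grid_points), 1)), grid_points]))
        right = v_prev.predict(np.hstack([grid_points, np.tile(g, (len(grid_points), 1))]))
        targets.append((left + right).min())
    v_next = KNeighborsRegressor(n_neighbors=5)  # K_neighbors = 5, as reported
    v_next.fit(sampled_pairs, np.array(targets))
    return v_next
```

A 50 × 50 grid gives 2,500 candidate midpoints per pair, so the brute-force minimization stays cheap in the 2D point-robot domain; for the 7-DoF arm the paper instead uses the policy-gradient variant, SGT-PG.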