Breadth-First Exploration on Adaptive Grid for Reinforcement Learning

Authors: Youngsik Yoon, Gangbok Lee, Sungsoo Ahn, Jungseul Ok

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conducted a theoretical analysis and demonstrated the effectiveness of our approach through empirical evidence, showing that only BEAG succeeds in complex environments under the proposed fixed-goal setting.
Researcher Affiliation | Academia | Department of CSE, POSTECH, Pohang, Republic of Korea; Graduate School of AI, POSTECH, Pohang, Republic of Korea. Correspondence to: Jungseul Ok <jungseul@postech.ac.kr>
Pseudocode | Yes | For other details, we provide the pseudo-codes of BEAG in Appendix A and our implementation in the supplementary material. Appendix A contains Algorithm 1 (Overview of BEAG), Algorithm 2 (Training Policy), Algorithm 3 (Remove Subgoal), Algorithm 4 (Grid Graph Construction), Algorithm 5 (Find Path), and Algorithm 6 (Adaptive Grid Refinement). An illustrative sketch of the grid-graph and path-finding idea follows the table.
Open Source Code | Yes | https://github.com/ml-postech/BEAG
Open Datasets | No | The paper states that experiments were conducted on MuJoCo environments (AntMaze and Reacher). While these are well-known environments, the paper does not provide a specific URL, DOI, repository, or formal citation for a publicly available *dataset* derived from or for these environments that was used for training. The mazes are custom configurations within the MuJoCo engine, not pre-collected datasets with public access information.
Dataset Splits | No | The paper does not explicitly specify dataset splits for training, validation, and testing (e.g., percentages or sample counts). It discusses training and evaluation periods, and a 'fixed goal setting' or 'random goal setting' for evaluation, but not in terms of traditional data splits. A sketch contrasting the two evaluation settings follows the table.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU models, memory, or other computing infrastructure used for running the experiments. It only mentions 'MuJoCo environments', which refers to software simulation environments.
Software Dependencies | No | The paper mentions using 'MuJoCo environments' and implicitly deep learning frameworks (given the 'actor lr' and 'critic lr' parameters), but it does not specify any software components with corresponding version numbers, as required for a reproducible description of ancillary software.
Experiment Setup | Yes | Tables 1, 2, and 3 explicitly provide detailed hyperparameters for DHRL, BEAG, PIG, HIGL, and HIRO, including values for 'initial episodes without graph planning', 'hidden layer', 'actor lr', 'critic lr', 'batch size', 'target update freq', 'γ', and others. This constitutes specific experimental setup details. An illustrative configuration skeleton follows the table.
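
The Pseudocode row above lists the algorithms that make up BEAG. As a rough illustration of the core idea only, the following is a minimal sketch of grid-graph construction and breadth-first path finding (cf. Algorithms 4 and 5 in Appendix A); it is not the paper's code, and names such as `lo`, `hi`, `spacing`, and `blocked` are our own assumptions. The released implementation at the GitHub link above is authoritative.

```python
from collections import deque
import itertools

def build_grid_graph(lo, hi, spacing):
    """Illustrative uniform grid over a 2D goal space (not the paper's API).

    Returns the set of grid indices and a map from index to subgoal
    coordinates; adjacency is implicit 4-neighbour connectivity.
    """
    nx = int((hi[0] - lo[0]) / spacing) + 1
    ny = int((hi[1] - lo[1]) / spacing) + 1
    nodes = set(itertools.product(range(nx), range(ny)))
    coords = {(i, j): (lo[0] + i * spacing, lo[1] + j * spacing)
              for (i, j) in nodes}
    return nodes, coords

def find_path(nodes, start, goal, blocked):
    """Breadth-first search over grid indices, skipping subgoals marked as
    unattainable (`blocked`); returns None when no path exists, which in
    BEAG would be a cue for grid refinement (cf. Algorithm 6)."""
    frontier, parent = deque([start]), {start: None}
    while frontier:
        u = frontier.popleft()
        if u == goal:
            path = [u]
            while parent[u] is not None:
                u = parent[u]
                path.append(u)
            return path[::-1]
        i, j = u
        for v in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if v in nodes and v not in blocked and v not in parent:
                parent[v] = u
                frontier.append(v)
    return None
```

A planner would map the returned index path back to subgoal coordinates via `coords` and hand the subgoals to the low-level goal-conditioned policy one at a time.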
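
The Dataset Splits row notes that evaluation is organized around goal settings rather than data splits. Below is a hedged sketch of the distinction, assuming a Gym-style `env` and an `agent` with an `act(obs, goal)` method; both interfaces are hypothetical, not taken from the paper.

```python
def evaluate(env, agent, n_episodes, fixed_goal=None):
    """Success rate under a single held-out goal (the paper's fixed-goal
    setting) or a goal resampled every episode (random-goal setting)."""
    successes = 0
    for _ in range(n_episodes):
        goal = fixed_goal if fixed_goal is not None else env.sample_goal()
        obs, done, info = env.reset(), False, {}
        while not done:
            obs, reward, done, info = env.step(agent.act(obs, goal))
        successes += int(info.get("success", False))
    return successes / n_episodes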
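
The Experiment Setup row refers to the hyperparameter tables in the paper. Purely as a skeleton of the reported fields, the dictionary below mirrors those table headers; every value is a placeholder (`None`), not a number taken from the paper — consult Tables 1–3 for the actual settings.

```python
# Hypothetical configuration skeleton; field names follow the paper's
# tables, values are deliberately left unset.
beag_config = {
    "initial_episodes_without_graph_planning": None,  # placeholder
    "hidden_layer": None,        # placeholder (hidden layer sizes)
    "actor_lr": None,            # placeholder (actor learning rate)
    "critic_lr": None,           # placeholder (critic learning rate)
    "batch_size": None,          # placeholder
    "target_update_freq": None,  # placeholder
    "gamma": None,               # placeholder (discount factor γ)
}
```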