Demonstration-free Autonomous Reinforcement Learning via Implicit and Bidirectional Curriculum

Authors: Jigang Kim, Daesol Cho, H. Jin Kim

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility assessment. Each entry below gives a variable, the assessed result, and the supporting excerpt from the paper (LLM response):
Research Type: Experimental
"Evaluations in established ARL benchmarks and in RL environments modified for the ARL setting show that our method outperforms existing methods. Further analyses and ablation studies reveal that the proposed implicit curriculum (auxiliary agent) and explicit curriculum (bidirectional goal curriculum) are well-formed and necessary to successfully learn in the demonstration-free, non-episodic setting." "5. Experiment: We include six sparse reward environments to evaluate our method."
Researcher Affiliation: Academia
"1 Seoul National University, 2 Artificial Intelligence Institute of Seoul National University (AIIS), 3 Automation and Systems Research Institute (ASRI)."
Pseudocode: Yes
"Algorithm 1 IBC"
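The Algorithm 1 pseudocode itself is in the paper. As a rough structural illustration only, the sketch below shows what a demonstration-free, non-episodic training loop with a forward agent, an auxiliary agent (implicit curriculum), and a periodically refreshed goal curriculum (explicit curriculum) could look like; the interfaces (`rollout`, `update`, `sample_goal`) are hypothetical and are not taken from the IBC codebase.

```python
# Speculative structural sketch of a non-episodic ARL training loop.
# This is NOT the authors' Algorithm 1; all names and interfaces here
# are assumptions made for illustration.
def train_arl_like(env, forward_agent, auxiliary_agent, curriculum,
                   num_episodes, curriculum_update_freq=20):
    obs = env.reset()  # a single initial reset; training is non-episodic
    for episode in range(num_episodes):
        # The auxiliary agent drives the state back toward the initial-state
        # region, acting as an implicit curriculum in place of manual resets.
        obs = auxiliary_agent.rollout(env, obs)
        # The forward agent then pursues a goal drawn from the curriculum.
        goal = curriculum.sample_goal()
        obs = forward_agent.rollout(env, obs, goal)
        forward_agent.update()
        auxiliary_agent.update()
        # Periodically re-select curriculum goals (the explicit,
        # bidirectional goal curriculum).
        if episode % curriculum_update_freq == 0:
            curriculum.update(forward_agent, auxiliary_agent)
    return forward_agent
```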
Open Source Code: Yes
"The code implementation of IBC and the instructions for reproducing the main result are available at https://github.com/snu-larr/ibc_official."
Open Datasets: Yes
"Two environments, Tabletop Manipulation and Sawyer Door, are from the established ARL benchmark EARL (Sharma et al., 2021b), and the remaining four environments, the Fetch environments (Plappert et al., 2018) and Point-U-Maze, are modified versions of existing MuJoCo-based OpenAI Gym environments (Todorov et al., 2012; Brockman et al., 2016) for the ARL setting."
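Since the paper builds on standard OpenAI Gym tasks, a minimal sketch of loading one unmodified base environment is shown below. The exact ARL-modified environment IDs used by the IBC codebase are not given here, so the stock Gym robotics task `FetchPush-v1` stands in as an assumption.

```python
# Minimal sketch: random rollout in a stock OpenAI Gym Fetch task.
# Assumes the classic gym (<0.26) API and a working MuJoCo install;
# the ARL-modified variants used in the paper are registered separately
# in the authors' codebase.
import gym

env = gym.make("FetchPush-v1")  # stand-in for the paper's modified variants
obs = env.reset()               # dict: "observation", "achieved_goal", "desired_goal"
for _ in range(50):
    action = env.action_space.sample()
    # Sparse reward: -1 until the goal is achieved, then 0.
    obs, reward, done, info = env.step(action)
env.close()
```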
Dataset Splits: No
The paper refers to an 'evaluation setting' and 'multiple evaluation episodes' but does not specify explicit train/validation/test dataset splits with percentages, counts, or a detailed splitting methodology.
Hardware Specification: No
The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies: No
The paper mentions software components like SAC and MuJoCo/OpenAI Gym but does not provide specific version numbers for these or any other ancillary software dependencies required for replication.
Experiment Setup: Yes
"Table 2. Hyperparameters for IBC"
critic hidden dimension: 512
critic hidden depth: 3
critic target τ: 0.01
critic target update frequency: 2
actor hidden dimension: 512
actor hidden depth: 3
actor update frequency: 2
RL batch size: 512
discount factor γ: 0.99
learning rate: 1e-4
RL optimizer: ADAM
init temperature α_init of SAC: 0.5
replay buffer B capacity (# of transitions): 1e6
curriculum buffer Bc capacity (# of trajectories): 1000
# of curriculum candidates K (# of trajectories): 50
curriculum update frequency (once every N episodes): 20
c in curriculum update: 3
Lipschitz constant L: 5
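For convenience, the Table 2 values are gathered below into a plain Python dictionary, as one might pass to a SAC-based trainer. The key names and dictionary layout are assumptions; only the values come from the paper.

```python
# Table 2 hyperparameters for IBC as a config dict (key names are assumed,
# values are quoted from the paper).
ibc_hyperparams = {
    "critic_hidden_dim": 512,
    "critic_hidden_depth": 3,
    "critic_target_tau": 0.01,
    "critic_target_update_freq": 2,
    "actor_hidden_dim": 512,
    "actor_hidden_depth": 3,
    "actor_update_freq": 2,
    "rl_batch_size": 512,
    "discount_gamma": 0.99,
    "learning_rate": 1e-4,
    "rl_optimizer": "adam",
    "sac_init_temperature": 0.5,
    "replay_buffer_capacity": int(1e6),   # transitions
    "curriculum_buffer_capacity": 1000,   # trajectories
    "num_curriculum_candidates_K": 50,    # trajectories
    "curriculum_update_freq": 20,         # once every N episodes
    "c_in_curriculum_update": 3,
    "lipschitz_constant_L": 5,
}
```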