Reverse Forward Curriculum Learning for Extreme Sample and Demo Efficiency

Authors: Stone Tao, Arth Shukla, Tse-kai Chan, Hao Su

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We rigorously evaluate RFCL against several state-of-the-art baselines across 21 fully-observable manipulation tasks from 3 benchmarks: Adroit, ManiSkill2, and Meta-World (Rajeswaran et al., 2018; Gu et al., 2023; Yu et al., 2019)."
Researcher Affiliation | Academia | "Stone Tao, Arth Shukla, Tse-kai Chan, Hao Su. University of California, San Diego. {stao, arshukla, tsc003, haosu}@ucsd.edu"
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | "Website with code and visualizations are here: https://reverseforward-cl.github.io/"; "All code is open sourced on GitHub and we are excited with how far the community can push when leveraging more properties of simulation."; "All experiments on the RFCL method can be reproduced with given runnable scripts + docker images uploaded on https://github.com/stonet2000/rfcl"
Open Datasets | Yes | "We rigorously evaluate RFCL against several state-of-the-art baselines across 21 fully-observable manipulation tasks from 3 benchmarks: Adroit, ManiSkill2, and Meta-World (Rajeswaran et al., 2018; Gu et al., 2023; Yu et al., 2019)."
Dataset Splits | No | The paper does not explicitly describe train/validation/test splits or a cross-validation setup of the kind used for supervised learning datasets; it refers only to training interaction steps and evaluation episodes.
Hardware Specification | Yes | "Experiments all ran on a RTX 2080 GPU."
Software Dependencies | No | The paper mentions software such as Soft Actor-Critic but does not specify versions for programming languages (e.g., Python), libraries (e.g., PyTorch), or other components used in the experiments.
Experiment Setup | Yes | "Table 5: RFCL sample-efficient variation of hyperparameters. These are the ones used to generate all figures and results. Highlighted in blue indicates hyperparameters introduced by this paper, which are for the automatic construction of reverse and forward curriculums. The non-highlighted hyperparameters are standard ones used in Soft-Actor-Critic with a Q-ensemble or Prioritized Level Replay (PLR)."
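The hyperparameter table above references Soft Actor-Critic with a Q-ensemble. As a rough illustration only (not the paper's implementation; function name, subset size, and all arguments are assumptions), a common way such ensembles are used is to form a conservative TD target from the minimum over a random subset of ensemble Q-estimates, REDQ-style:

```python
import random

def ensemble_q_target(q_values, reward, gamma, subset_size=2, rng=None):
    """Illustrative sketch: conservative TD target via a random subset
    of next-state Q-estimates from an ensemble of critics.

    q_values    -- next-state Q-estimates, one per ensemble member
    subset_size -- how many members to sample for the min (assumption)
    """
    rng = rng or random.Random(0)
    # Take the min over a random subset of critics to curb
    # overestimation bias in the bootstrapped target.
    subset = rng.sample(q_values, subset_size)
    return reward + gamma * min(subset)
```

The target is bounded between the most and least pessimistic single-critic targets; taking the min over only a subset (rather than all members) keeps the estimate conservative without being overly pessimistic.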