Hard Tasks First: Multi-Task Reinforcement Learning Through Task Scheduling

Authors: Myungsik Cho, Jongeui Park, Suyoung Lee, Youngchul Sung

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "The efficacy of SMT's scheduling method is validated by significantly improving performance on challenging Meta-World benchmarks. ... We validate our method using the Meta-World benchmark (Yu et al., 2019), which comprises 50 robotic arm manipulation tasks. Our results demonstrate significant improvements over the baseline algorithm. ... 5. Experiments ... We report the results on MT10 and MT50 in Table 1 and 2, respectively. ... 5.3. Ablation Studies"
Researcher Affiliation | Academia | "School of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Korea. Correspondence to: Youngchul Sung <ycsung@kaist.ac.kr>."
Pseudocode | Yes | "Algorithm 1 Scheduled Multi-Task Training (SMT)" (a hedged sketch of this scheduling loop follows the table)
Open Source Code | No | The paper contains no explicit statement about releasing source code for the described method, and it provides no link to a code repository.
Open Datasets | Yes | "We validate our method using the Meta-World benchmark (Yu et al., 2019), which comprises 50 robotic arm manipulation tasks. Our experiments use two modes, MT10 and MT50, with 10 and 50 manipulation tasks, respectively, from the benchmark, as shown in Figure 7."
Dataset Splits | No | The paper describes training and evaluation of task performance (e.g., 'If the mean return of the n most recent training trajectories for a task T_i exceeds a certain predefined threshold M, we consider it to be solved'), but it does not specify explicit dataset splits (percentages or counts) for training, validation, and testing in the traditional sense of partitioning a fixed dataset.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies | No | The paper mentions software components such as SAC, the Adam optimizer, and ReLU activations, but it does not give version numbers for these libraries, programming languages, or other dependencies (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup | Yes | Appendix C, Table 6 lists the hyperparameters of SMT used for Meta-World MT10 and MT50, with the notation of the manuscript: training steps, discount factor, minibatch size per task, optimizer (Adam, learning rate 0.0003), network architecture (MLP with ReLU activation, number of hidden layers, hidden units), replay buffer size per task, target network update period τ, Stage 1 budget B1, Stage 2 budget B2, reset interval T_reset (time steps), and scheduling interval (time steps).
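For a concrete picture of what the scheduling loop of Algorithm 1 might look like, below is a minimal Python sketch assembled only from the details quoted in the table (the solved criterion over the n most recent returns with threshold M, the two stage budgets B1 and B2, and a scheduling interval). It is an illustration, not the authors' code: the function names (`run_stage`, `collect_return`), the uniform hard-tasks-first sampling rule, and all numeric values are assumptions, since Appendix C's actual values are not reproduced in this report; SAC updates and the periodic resets governed by T_reset are omitted.

```python
from collections import deque
import random

def make_task_state(n):
    # Rolling window over the returns of the n most recent trajectories.
    return {"recent_returns": deque(maxlen=n)}

def is_solved(state, M):
    # Criterion quoted above: a task T_i counts as solved once the mean
    # return of its n most recent training trajectories exceeds M.
    r = state["recent_returns"]
    return len(r) == r.maxlen and sum(r) / len(r) > M

def run_stage(task_ids, states, budget, schedule_interval, collect_return, M):
    # Spend `budget` trajectories on this stage, re-deciding the scheduled
    # task set every `schedule_interval` trajectories so that still-unsolved
    # ("hard") tasks are trained first.
    spent = 0
    while spent < budget:
        unsolved = [t for t in task_ids if not is_solved(states[t], M)]
        scheduled = unsolved or list(task_ids)  # fall back to all tasks
        for _ in range(min(schedule_interval, budget - spent)):
            t = random.choice(scheduled)  # assumed: uniform over scheduled set
            # Stand-in for collecting one SAC training trajectory on task t.
            states[t]["recent_returns"].append(collect_return(t))
            spent += 1

# Toy usage with a fake environment: higher task indices yield higher
# returns, so the low-index ("hard") tasks keep getting scheduled.
def fake_collect_return(t):
    return random.gauss(60.0 * (t + 1), 10.0)

tasks = list(range(10))  # e.g. Meta-World MT10
states = {t: make_task_state(n=10) for t in tasks}
for stage_budget in (5_000, 5_000):  # placeholders for B1 and B2
    run_stage(tasks, states, stage_budget, schedule_interval=500,
              collect_return=fake_collect_return, M=400.0)
print({t: is_solved(states[t], M=400.0) for t in tasks})
```

In the actual method the stage budgets, scheduling interval, and threshold come from Appendix C, Table 6; the uniform sampling over unsolved tasks above is just one plausible way to realize the paper's hard-tasks-first idea.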