Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Goal-Oriented Skill Abstraction for Offline Multi-Task Reinforcement Learning
Authors: Jinmin He, Kai Li, Yifan Zang, Haobo Fu, Qiang Fu, Junliang Xing, Jian Cheng
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on diverse robotic manipulation tasks within the Meta World benchmark demonstrate the effectiveness and versatility of GO-Skill. |
| Researcher Affiliation | Collaboration | (1) C2DL, Institute of Automation, Chinese Academy of Sciences; (2) School of Artificial Intelligence, University of Chinese Academy of Sciences; (3) Tencent AI Lab; (4) Tsinghua University; (5) AiRiA. Correspondence to: Kai Li <EMAIL>, Junliang Xing <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: GO-Skill Extraction and Enhancement; Algorithm 2: GO-Skill Policy Learning |
| Open Source Code | No | The paper states: "We implement all experiments using the Prompt-DT (Xu et al., 2022) codebase[2], and access the Meta World environment to this framework," with footnote 2 pointing to https://github.com/mxu34/prompt-dt. This is a third-party codebase used as the underlying framework and baseline, not the authors' own implementation of GO-Skill. The paper gives no explicit statement or link indicating that the GO-Skill source code is released. |
| Open Datasets | Yes | Our experiments are evaluated on the Meta World benchmark (Yu et al., 2020b), an MTRL benchmark consisting of 50 robotic manipulation tasks... the dataset are sourced from SAC-Replay (Haarnoja et al., 2018) ranging from random to expert experiences |
| Dataset Splits | Yes | We define two distinct dataset settings: (1) Near-Optimal, which includes the complete experience (100M transitions) from random to expert-level performance in SAC-Replay, and (2) Sub-Optimal, which contains the first 50% (50M transitions) of the near-optimal dataset for each task... We experiment with three different setups: (1) MT50, which includes the full set of 50 tasks, (2) MT30, a subset containing 30 operational tasks from MT50, and (3) ML45, where 45 tasks are used for pre-training and the remaining 5 tasks are used for fine-tuning evaluation. |
| Hardware Specification | Yes | We use NVIDIA Geforce RTX 3090 GPU for training and AMD EPYC 7742 64-Core Processor for evaluation with the environments. |
| Software Dependencies | Yes | We implement all experiments using the Prompt-DT (Xu et al., 2022) codebase[2], and access the Meta World environment to this framework... For the offline dataset, we follow the approach outlined by He et al. (2023), using the Soft Actor-Critic (SAC) algorithm (Haarnoja et al., 2018)... our experiments are conducted on the stable version, Meta World-V2[1]. |
| Experiment Setup | Yes | We present the common hyper-parameters in Table 4 and the additional hyper-parameters for GO-Skill in Table 5. Notably, the total number of iterations for the baselines is 1e5. GO-Skill first performs skill extraction using 3e4 iterations, followed by parallel iterations of skill enhancement and policy learning for 7e4. |
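The Dataset Splits row defines the Sub-Optimal setting as the first 50% of each task's Near-Optimal SAC-Replay data. The sketch below illustrates that slicing rule on toy data. It is not the authors' code: the function name `build_dataset_settings` and the dict-of-lists layout are hypothetical, and it assumes each task's transitions are stored in collection order, so the earliest (closer to random-policy) experience comes first.

```python
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")  # placeholder for whatever a transition record looks like


def build_dataset_settings(
    replay: Dict[str, Sequence[T]], sub_optimal_fraction: float = 0.5
) -> Tuple[Dict[str, List[T]], Dict[str, List[T]]]:
    """Return (near_optimal, sub_optimal) per-task datasets.

    Hypothetical helper. Assumes each task's transitions are in collection
    order, so a prefix of the replay holds the earlier, lower-quality data.
    """
    # Near-Optimal: the complete replay for every task.
    near_optimal = {task: list(data) for task, data in replay.items()}
    # Sub-Optimal: only the first `sub_optimal_fraction` of each task's replay.
    sub_optimal = {
        task: list(data[: int(len(data) * sub_optimal_fraction)])
        for task, data in replay.items()
    }
    return near_optimal, sub_optimal


if __name__ == "__main__":
    toy = {"task-a": list(range(10)), "task-b": list(range(7))}
    near, sub = build_dataset_settings(toy)
    print({t: len(d) for t, d in sub.items()})  # {'task-a': 5, 'task-b': 3}
```

Taking a prefix rather than a random 50% sample is what keeps the Sub-Optimal setting biased toward early, lower-return experience, which is the point of the comparison.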
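The Experiment Setup row implies that GO-Skill uses the same 1e5-iteration budget as the baselines: 3e4 iterations of skill extraction, then 7e4 iterations in which skill enhancement and policy learning both run. The sketch below makes that bookkeeping explicit. The callback names (`extract_step`, `enhance_step`, `policy_step`) are hypothetical placeholders, and interleaving the two updates sequentially inside each iteration is an assumption about what "parallel iterations" means, not a confirmed detail of the paper.

```python
from typing import Callable

BASELINE_ITERS = 100_000  # 1e5 iterations reported for the baselines


def run_go_skill_schedule(
    extract_step: Callable[[int], None],
    enhance_step: Callable[[int], None],
    policy_step: Callable[[int], None],
    extraction_iters: int = 30_000,
    joint_iters: int = 70_000,
) -> int:
    """Run a two-phase schedule and return the total iteration count.

    Hypothetical sketch: each callback stands in for one gradient update.
    """
    it = 0
    # Phase 1: skill extraction only (3e4 iterations).
    for _ in range(extraction_iters):
        extract_step(it)
        it += 1
    # Phase 2: skill enhancement and policy learning in every iteration
    # (7e4 iterations); assumed interleaved within each iteration.
    for _ in range(joint_iters):
        enhance_step(it)
        policy_step(it)
        it += 1
    return it


if __name__ == "__main__":
    ext, enh, pol = [], [], []
    total = run_go_skill_schedule(ext.append, enh.append, pol.append)
    print(total, total == BASELINE_ITERS)  # 100000 True
```

Counting iterations this way shows that the comparison with the baselines is budget-matched in iterations, although Phase 2 performs two updates per iteration.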