Efficient Multi-task Reinforcement Learning with Cross-Task Policy Guidance
Authors: Jinmin He, Kai Li, Yifan Zang, Haobo Fu, Qiang Fu, Junliang Xing, Jian Cheng
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluations demonstrate that incorporating CTPG with these approaches significantly enhances performance in manipulation and locomotion benchmarks. |
| Researcher Affiliation | Collaboration | ¹Institute of Automation, Chinese Academy of Sciences; ²School of Artificial Intelligence, University of Chinese Academy of Sciences; ³AiRiA; ⁴Tsinghua University; ⁵Tencent AI Lab |
| Pseudocode | Yes | Appendix A, Pseudo Code: Algorithm 1, Control Policy's Training Step; Algorithm 2, Guide Policy's Training Step; Algorithm 3, Cross-Task Policy Guidance. |
| Open Source Code | Yes | The full code is provided in the supplemental material. |
| Open Datasets | Yes | We conduct experiments on Meta-World manipulation and HalfCheetah locomotion MTRL benchmarks... Meta-World benchmark [31]... HalfCheetah Task Group [11] |
| Dataset Splits | No | The paper describes training with a fixed number of environment samples and evaluating the final policy, but it does not explicitly specify train/validation/test splits (as percentages or sample counts) for the data used during its own training process. |
| Hardware Specification | Yes | We use AMD EPYC 7742 64-Core Processor with NVIDIA GeForce RTX 3090 GPU for training. |
| Software Dependencies | No | The paper states: 'We implement all experiments using the MTRL codebase [21]', but does not specify exact version numbers for programming languages (e.g., Python) or libraries (e.g., PyTorch, TensorFlow) beyond citing the codebase. |
| Experiment Setup | Yes | F.3 Hyper-Parameters of All Methods. Table 3: General hyper-parameters of all methods... Table 10: Additional hyper-parameters of guide policy in CTPG. |
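
The pseudocode row above names three routines: a control policy's training step, a guide policy's training step, and an overall cross-task policy guidance loop. As a rough illustration of the guidance idea only, the sketch below shows a per-task guide policy periodically choosing which task's control policy acts as the behavior policy during data collection. Every name here (`ToyEnv`, `ControlPolicy`, `GuidePolicy`, `collect_episode`, `horizon_h`) and the random placeholder policies are illustrative assumptions, not the authors' implementation or the MTRL codebase API.

```python
import random

class ToyEnv:
    """Hypothetical stand-in environment; not one of the paper's benchmarks."""
    def __init__(self, episode_len=20):
        self.episode_len = episode_len
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        obs = float(self.t)
        reward = -abs(action)            # placeholder reward
        done = self.t >= self.episode_len
        return obs, reward, done

class ControlPolicy:
    """Per-task control policy; act() stands in for a learned actor (e.g., SAC)."""
    def __init__(self, task_id):
        self.task_id = task_id

    def act(self, obs):
        return random.uniform(-1.0, 1.0)  # placeholder action

class GuidePolicy:
    """Per-task guide policy: periodically selects which task's control
    policy should act in this task's environment."""
    def __init__(self, num_tasks):
        self.num_tasks = num_tasks

    def select(self, obs):
        return random.randrange(self.num_tasks)  # placeholder selection

def collect_episode(env, task_id, control_policies, guide_policy, horizon_h=5):
    """Roll out one episode on task `task_id`, letting the guide policy
    re-pick the behavior policy every `horizon_h` steps; the transitions
    would feed the task's replay buffer for off-policy control updates."""
    obs = env.reset()
    trajectory = []
    behavior = control_policies[task_id]
    done, t = False, 0
    while not done:
        if t % horizon_h == 0:            # guide decision point
            behavior = control_policies[guide_policy.select(obs)]
        action = behavior.act(obs)
        next_obs, reward, done = env.step(action)
        trajectory.append((obs, action, reward, next_obs, done))
        obs, t = next_obs, t + 1
    return trajectory

if __name__ == "__main__":
    num_tasks = 3
    policies = [ControlPolicy(i) for i in range(num_tasks)]
    guide = GuidePolicy(num_tasks)
    traj = collect_episode(ToyEnv(), task_id=0,
                           control_policies=policies, guide_policy=guide)
    print(f"collected {len(traj)} transitions for task 0")
```

Under this framing, the guide policy's training step (Algorithm 2) would learn the selection rule, while the control policy's training step (Algorithm 1) is a standard off-policy update on the collected transitions; both learning rules are beyond this sketch and are specified in the paper's Appendix A.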