Efficient Multi-task Reinforcement Learning with Cross-Task Policy Guidance

Authors: Jinmin He, Kai Li, Yifan Zang, Haobo Fu, Qiang Fu, Junliang Xing, Jian Cheng

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evaluations demonstrate that incorporating CTPG with these approaches significantly enhances performance in manipulation and locomotion benchmarks.
Researcher Affiliation | Collaboration | 1 Institute of Automation, Chinese Academy of Sciences; 2 School of Artificial Intelligence, University of Chinese Academy of Sciences; 3 AiRiA; 4 Tsinghua University; 5 Tencent AI Lab
Pseudocode | Yes | Appendix A (Pseudo Code): Algorithm 1, Control Policy's Training Step; Algorithm 2, Guide Policy's Training Step; Algorithm 3, Cross-Task Policy Guidance. (An illustrative sketch of this guidance loop appears after this table.)
Open Source Code | Yes | The full code is provided in the supplemental material.
Open Datasets | Yes | We conduct experiments on the Meta-World manipulation and HalfCheetah locomotion MTRL benchmarks... Meta-World benchmark [31]... HalfCheetah Task Group [11]
Dataset Splits | No | The paper describes training with a fixed number of environment samples and evaluating the final policy, but it does not explicitly specify train/validation/test splits (as percentages or sample counts) for the data used during training.
Hardware Specification | Yes | We use an AMD EPYC 7742 64-Core Processor with an NVIDIA GeForce RTX 3090 GPU for training.
Software Dependencies | No | The paper states: 'We implement all experiments using the MTRL codebase [21]', but does not specify exact version numbers for programming languages (e.g., Python) or specific libraries (e.g., PyTorch, TensorFlow) beyond citing the codebase.
Experiment Setup | Yes | F.3 Hyper-Parameters of All Methods. Table 3: General hyper-parameters of all methods... Table 10: Additional hyper-parameters of guide policy in CTPG.
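
The pseudocode row above names three algorithms: the control policy's training step, the guide policy's training step, and the overall cross-task policy guidance loop. As a rough illustration of the guidance idea only, the Python sketch below shows a hypothetical data-collection loop in which a per-task guide policy periodically selects which task's control policy acts as the behavior policy. All names (GuidePolicy, ControlPolicy, collect_rollout), the dummy environment dynamics, and the switching horizon are assumptions made for this sketch, not the authors' implementation or the released code.

```python
# Minimal, illustrative sketch of a CTPG-style data-collection loop.
# Assumptions (not from the paper): dummy scalar observations/actions,
# uniform guide-policy selection, and a fixed switching horizon H.
import random


class ControlPolicy:
    """Placeholder low-level policy for one task (e.g., an actor network)."""

    def __init__(self, task_id):
        self.task_id = task_id

    def act(self, obs):
        # A real implementation would evaluate a neural network;
        # here we return a dummy scalar action for illustration.
        return random.uniform(-1.0, 1.0)


class GuidePolicy:
    """Placeholder high-level policy that picks a behavior-policy index."""

    def __init__(self, num_tasks):
        self.num_tasks = num_tasks

    def select(self, obs, task_id):
        # A learned guide policy would choose based on value estimates;
        # here we sample uniformly over all tasks' control policies.
        return random.randrange(self.num_tasks)


def collect_rollout(task_id, control_policies, guide_policy,
                    horizon=5, episode_len=20):
    """Collect one episode for `task_id`, re-selecting the behavior
    policy every `horizon` steps via the guide policy."""
    obs = 0.0  # dummy observation
    trajectory = []
    for t in range(episode_len):
        if t % horizon == 0:
            behavior_idx = guide_policy.select(obs, task_id)
        action = control_policies[behavior_idx].act(obs)
        # Dummy environment transition and reward, for illustration only.
        obs, reward = obs + action, -abs(obs)
        trajectory.append((obs, action, reward, behavior_idx))
    return trajectory


if __name__ == "__main__":
    num_tasks = 4
    policies = [ControlPolicy(i) for i in range(num_tasks)]
    guide = GuidePolicy(num_tasks)
    traj = collect_rollout(task_id=0, control_policies=policies,
                           guide_policy=guide)
    print(f"collected {len(traj)} transitions; behavior policies used:",
          sorted({idx for *_, idx in traj}))
```

In an actual training pipeline, the collected trajectories would be stored in each task's replay buffer and used to update both the control policies and the guide policies; this sketch only shows the selection-and-rollout step suggested by the algorithm names.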