CUP: Critic-Guided Policy Reuse

Authors: Jin Zhang, Siyuan Li, Chongjie Zhang

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results demonstrate that CUP achieves efficient transfer and significantly outperforms baseline algorithms. We evaluate CUP on Meta-World (Yu et al., 2020), a popular reinforcement learning benchmark composed of multiple robot arm manipulation tasks.
Researcher Affiliation | Academia | Jin Zhang¹, Siyuan Li², Chongjie Zhang¹; ¹Institute for Interdisciplinary Information Sciences, Tsinghua University, China; ²School of Computer Science and Technology, Harbin Institute of Technology, China
Pseudocode | Yes | Algorithm 1: CUP
Open Source Code | Yes | Code is available at https://github.com/NagisaZj/CUP.
Open Datasets | Yes | We evaluate on Meta-World (Yu et al., 2020), a popular reinforcement learning benchmark composed of multiple robot arm manipulation tasks.
Dataset Splits | No | The paper mentions evaluating on Meta-World and averaging results over six random seeds, but it does not specify explicit train/validation/test splits with percentages or sample counts.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using SAC as the underlying algorithm but does not specify versions for the libraries, frameworks, or programming languages used.
Experiment Setup | Yes | All results are averaged over six random seeds. CUP introduces only two additional hyper-parameters to the underlying SAC algorithm, and we further test CUP's sensitivity to these additional hyper-parameters. Additional implementation details are deferred to Appendix D.1.
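As a rough illustration of the reported setup (not code from the paper or its repository), the sketch below shows how results for one Meta-World task might be averaged over the six random seeds, with the two CUP-specific hyper-parameters kept in a single config on top of the SAC backbone. The function name `train_cup_sac`, the hyper-parameter names `guidance_weight` and `weight_decay_rate`, and all values are assumptions for illustration only.

```python
import statistics

# Hypothetical reproduction harness. The paper reports averaging results over
# six random seeds and adding only two CUP-specific hyper-parameters to SAC;
# the names and values below are placeholders, not taken from the paper.
CUP_HPARAMS = {
    "guidance_weight": 0.1,     # assumed name and value
    "weight_decay_rate": 0.99,  # assumed name and value
}
SEEDS = [0, 1, 2, 3, 4, 5]      # six random seeds, as reported


def train_cup_sac(task: str, seed: int, hparams: dict) -> float:
    """Placeholder for the actual CUP (SAC backbone) training entry point.

    Should train on the given Meta-World task with the given seed and
    return a scalar evaluation metric (e.g., final success rate).
    """
    return 0.0


def evaluate(task: str) -> tuple[float, float]:
    """Run all seeds for one task and report mean and standard deviation."""
    scores = [train_cup_sac(task, seed, CUP_HPARAMS) for seed in SEEDS]
    return statistics.mean(scores), statistics.pstdev(scores)


if __name__ == "__main__":
    mean, std = evaluate("push-v2")  # example Meta-World task name
    print(f"push-v2: {mean:.2f} +/- {std:.2f} over {len(SEEDS)} seeds")
```

Replacing `train_cup_sac` with the real training call from the released code would turn this skeleton into an actual multi-seed evaluation loop.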