Seizing Serendipity: Exploiting the Value of Past Success in Off-Policy Actor-Critic

Authors: Tianying Ji, Yu Luo, Fuchun Sun, Xianyuan Zhan, Jianwei Zhang, Huazhe Xu

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our experimental evaluation aims to investigate the following questions: 1) How effective is the proposed BEE operator in model-based and model-free paradigms? 2) How effectively does BAC perform in failure-prone scenarios that highlight the ability to seize serendipity and fleeting successes, particularly in various real-world tasks?" (Section 5, Experiments)
Researcher Affiliation | Collaboration | 1. Department of Computer Science and Technology, Tsinghua University; 2. Institute for AI Industry Research, Tsinghua University; 3. Shanghai Artificial Intelligence Laboratory; 4. Department of Informatics, University of Hamburg; 5. Institute for Interdisciplinary Information Sciences, Tsinghua University; 6. Shanghai Qi Zhi Institute
Pseudocode | Yes | Algorithm 1: BEE Actor-Critic (BAC)
Open Source Code | No | The paper states: "Benchmark results and videos are available at https://jity16.github.io/BEE/", which is a project website, not an explicit statement that the code is open-sourced. There is no other explicit mention of released code or a link to a repository for the proposed method.
Open Datasets | Yes | The experiments (Section 5) use standard public benchmarks: MuJoCo (Todorov et al., 2012), DMControl (Tunyasuvunakool et al., 2020), Meta-World (Yu et al., 2019), ManiSkill2 (Gu et al., 2023), Adroit (Rajeswaran et al., 2017), MyoSuite (Vittorio et al., 2022), and ROBEL (Ahn et al., 2020).
Dataset Splits | No | The paper uses standard benchmark environments such as MuJoCo and DMControl but does not explicitly detail training/validation/test dataset splits (e.g., percentages or sample counts).
Hardware Specification | Yes | "Table 3. Computing infrastructure and the computational time for MuJoCo benchmark tasks." CPU: AMD EPYC 7763 64-Core Processor (256 threads); GPU: NVIDIA GeForce RTX 3090 ×4.
Software Dependencies | No | The paper mentions software such as MuJoCo and DMControl, and notes that the SAC implementation is based on 'pytorch-soft-actor-critic', but does not provide version numbers for these or other key software dependencies.
Experiment Setup | Yes | "Table 1. Hyperparameter settings for BAC in benchmark tasks." Q-value network: MLP with hidden size 512; discount factor: 0.99; soft update factor: 0.005; learning rate: 0.0003; batch size: 512; policy updates per step: 1; value updates per step: 1.
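For reproduction purposes, the hyperparameters reported in Table 1 can be collected into a single configuration object. The values below come from the table; the key names are illustrative labels of my own, not identifiers from the authors' code.

```python
# BAC hyperparameters as reported in Table 1 of the paper.
# Key names are illustrative; values are taken from the table.
BAC_CONFIG = {
    "q_hidden_size": 512,          # MLP hidden size of the Q-value network
    "discount": 0.99,              # discount factor (gamma)
    "tau": 0.005,                  # soft (Polyak) target-update factor
    "learning_rate": 3e-4,         # 0.0003
    "batch_size": 512,
    "policy_updates_per_step": 1,
    "value_updates_per_step": 1,
}
```

Keeping these in one dict makes it easy to log the exact configuration alongside results when re-running the benchmarks.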
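The pseudocode row above refers to Algorithm 1 (BEE Actor-Critic). As context for that row: the BEE operator blends an exploitation backup, which values the best behavior already present in the replay buffer, with a standard exploration (Bellman) backup. The sketch below is a minimal illustration of that convex combination only; the function name, argument names, and the way the two next-state values are obtained are my assumptions, not the paper's API, and the paper's Algorithm 1 is the authoritative definition.

```python
def bee_target(r, gamma, q_explore_next, q_exploit_next, lam, done):
    """Illustrative BEE-style Bellman target (assumed interface).

    q_explore_next: next-state value under the current policy
                    (standard exploration backup).
    q_exploit_next: next-state value of the best action observed
                    in the replay buffer (exploitation backup).
    lam: mixing coefficient between the two backups.
    """
    explore = r + gamma * (1.0 - done) * q_explore_next
    exploit = r + gamma * (1.0 - done) * q_exploit_next
    # Convex combination of exploitation and exploration backups.
    return lam * exploit + (1.0 - lam) * explore
```

With `lam = 0` this reduces to the ordinary Bellman backup; increasing `lam` weights the target toward past successful behavior in the buffer.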