Dynamic Bottleneck for Robust Self-Supervised Exploration

Authors: Chenjia Bai, Lingxiao Wang, Lei Han, Animesh Garg, Jianye Hao, Peng Liu, Zhaoran Wang

NeurIPS 2021

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We evaluate the proposed method on the Atari suite with dynamics-irrelevant noise. Our experiments show that exploration with the DB bonus outperforms several state-of-the-art exploration methods in noisy environments." The paper evaluates SSE-DB on Atari games and conducts comparative experiments against baseline methods. |
| Researcher Affiliation | Collaboration | Chenjia Bai (Harbin Institute of Technology, bai_chenjia@stu.hit.edu.cn); Lingxiao Wang (Northwestern University, lingxiaowang2022@u.northwestern.edu); Lei Han (Tencent Robotics X, lxhan@tencent.com); Animesh Garg (University of Toronto, Vector Institute, NVIDIA, garg@cs.toronto.edu); Jianye Hao (Tianjin University, jianye.hao@tju.edu.cn); Peng Liu (Harbin Institute of Technology, pengliu@hit.edu.cn); Zhaoran Wang (Northwestern University, zhaoranwang@gmail.com) |
| Pseudocode | Yes | "We refer to Appendix B for the pseudocode of training the DB model." (Algorithm 1: SSE-DB) |
| Open Source Code | Yes | "The codes are available at https://github.com/Baichenjia/DB." |
| Open Datasets | Yes | "We evaluate all methods on Atari games with high-dimensional observations. The selected 18 games are frequently used in previous approaches for efficient exploration." |
| Dataset Splits | No | The paper evaluates on Atari games but does not specify explicit training/validation/test splits; it reports only results from training without extrinsic rewards, evaluated in the same environments. |
| Hardware Specification | No | The paper does not describe the hardware used for its experiments. It mentions "computation resources" in the acknowledgements, but gives no specific models or specifications. |
| Software Dependencies | No | The paper does not list software dependencies with version numbers. |
| Experiment Setup | No | The main text describes the overall approach and model architecture but omits specific hyperparameters (e.g., learning rate, batch size, number of epochs) and detailed system-level training settings. It refers to Appendix D for implementation details, which were not available in the main paper for analysis. |