Dynamic Bottleneck for Robust Self-Supervised Exploration
Authors: Chenjia Bai, Lingxiao Wang, Lei Han, Animesh Garg, Jianye Hao, Peng Liu, Zhaoran Wang
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the proposed method on the Atari suite with dynamics-irrelevant noises. Our experiments show that exploration with DB bonus outperforms several state-of-the-art exploration methods in noisy environments. We evaluate SSE-DB on Atari games. We conduct experiments to compare the following methods. |
| Researcher Affiliation | Collaboration | Chenjia Bai (Harbin Institute of Technology, bai_chenjia@stu.hit.edu.cn); Lingxiao Wang (Northwestern University, lingxiaowang2022@u.northwestern.edu); Lei Han (Tencent Robotics X, lxhan@tencent.com); Animesh Garg (University of Toronto, Vector Institute, NVIDIA, garg@cs.toronto.edu); Jianye Hao (Tianjin University, jianye.hao@tju.edu.cn); Peng Liu (Harbin Institute of Technology, pengliu@hit.edu.cn); Zhaoran Wang (Northwestern University, zhaoranwang@gmail.com) |
| Pseudocode | Yes | We refer to Appendix B for the pseudocode of training DB model. Algorithm 1 SSE-DB |
| Open Source Code | Yes | The codes are available at https://github.com/Baichenjia/DB. |
| Open Datasets | Yes | We evaluate all methods on Atari games with high-dimensional observations. The selected 18 games are frequently used in previous approaches for efficient exploration. |
| Dataset Splits | No | The paper uses Atari games for evaluation but does not explicitly specify training/validation/test splits; results are reported from training runs without extrinsic rewards rather than from separate held-out evaluation sets. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for its experiments. It mentions 'computation resources' in the acknowledgements, but no specific models or specifications. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | No | The paper discusses the overall approach and model architecture but does not provide specific hyperparameters (e.g., learning rate, batch size, number of epochs) or detailed system-level training settings in the main text. It refers to Appendix D for implementation details, which are not included in the main paper. |