A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning
Authors: Mingde Zhao, Zhen Liu, Sitao Luan, Shuyuan Zhang, Doina Precup, Yoshua Bengio
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present our experimental settings and ablation studies of our CP agent against baselines to investigate the OOD generalization capabilities enabled by the C1-inspired bottleneck mechanism. |
| Researcher Affiliation | Collaboration | 1 McGill University; 2 Université de Montréal; 3 DeepMind; 4 Mila; 5 CIFAR AI Chair |
| Pseudocode | Yes | We present the pseudocode of the Q-value based prioritized tree-search MPC in Appendix. |
| Open Source Code | Yes | Check project page https://github.com/PwnerHarry/CP. |
| Open Datasets | Yes | We use environments based on the MiniGrid-BabyAI framework [11, 10, 21], which can be customized for generating OOD generalization tests with varying difficulties. |
| Dataset Splits | No | The paper states that for each episode (training or OOD), a new environment is randomly generated, and distinguishes between 'training environments' and 'OOD evaluation tasks'. It does not explicitly mention or detail a separate validation split or fixed data splits in the traditional sense. |
| Hardware Specification | No | The paper states, 'We acknowledge the computational power provided by Compute Canada,' but does not provide specific hardware details such as GPU or CPU models, or memory specifications used for the experiments. |
| Software Dependencies | No | The paper refers to frameworks and algorithms such as 'MiniGrid-BabyAI framework', 'Deep Sets', 'Double-DQN (DDQN)', and 'Adam', but does not specify version numbers for them or list any other versioned software dependencies. |
| Experiment Setup | No | The paper mentions that more details on agent settings and hyperparameters are available in the Appendix ('For more details, please check the Appendix' in Section 5.2), but does not provide specific experimental setup details such as hyperparameter values in the main text. |