A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning

Authors: Mingde Zhao, Zhen Liu, Sitao Luan, Shuyuan Zhang, Doina Precup, Yoshua Bengio

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present our experimental settings and ablation studies of our CP agent against baselines to investigate the OOD generalization capabilities enabled by the C1-inspired bottleneck mechanism.
Researcher Affiliation | Collaboration | 1McGill University; 2Université de Montréal; 3DeepMind; 4Mila; 5CIFAR AI Chair
Pseudocode | Yes | We present the pseudocode of the Q-value based prioritized tree-search MPC in Appendix.
Open Source Code | Yes | Check project page https://github.com/PwnerHarry/CP.
Open Datasets | Yes | We use environments based on the MiniGrid-BabyAI framework [11, 10, 21], which can be customized for generating OOD generalization tests with varying difficulties.
Dataset Splits | No | The paper states that for each episode (training or OOD), a new environment is randomly generated, and distinguishes between 'training environments' and 'OOD evaluation tasks'. It does not explicitly mention or detail a separate validation split or fixed data splits in the traditional sense.
Hardware Specification | No | The paper states, 'We acknowledge the computational power provided by Compute Canada,' but does not provide specific hardware details such as GPU or CPU models, or memory specifications used for the experiments.
Software Dependencies | No | The paper refers to frameworks and algorithms such as 'MiniGrid-BabyAI framework', 'Deep Sets', 'Double-DQN (DDQN)', and 'Adam', but does not specify their version numbers or any other software dependencies with specific versions.
Experiment Setup | No | The paper mentions that more details on agent settings and hyperparameters are available in the Appendix ('For more details, please check the Appendix' in Section 5.2), but does not provide specific experimental setup details such as hyperparameter values in the main text.