Consciousness-Inspired Spatio-Temporal Abstractions for Better Generalization in Reinforcement Learning
Authors: Harry Zhao, Safa Alver, Harm van Seijen, Romain Laroche, Doina Precup, Yoshua Bengio
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Generalization-focused experiments validate Skipper's significant advantage in zero-shot generalization, compared to some existing state-of-the-art hierarchical planning methods. ... We show through detailed controlled experiments that the proposed framework, named Skipper, in most cases performs significantly better in terms of zero-shot generalization, compared to the baselines and to some state-of-the-art Hierarchical Planning (HP) methods... |
| Researcher Affiliation | Collaboration | 1McGill University, 2Université de Montréal, 3Mila, 4Sony AI, 5Google DeepMind |
| Pseudocode | Yes | Algorithm 1: Skipper with Random Checkpoints (implementation choice in purple)... Algorithm 2: Checkpoint Pruning with k-medoids... Algorithm 3: Delusion Suppression |
| Open Source Code | Yes | Source code of experiments available at https://github.com/mila-iqia/Skipper. ... The results presented in the experiments are fully-reproducible with the open-sourced repository https://github.com/mila-iqia/Skipper. |
| Open Datasets | Yes | Thus, we base our experimental setting on the MiniGrid-BabyAI framework (Chevalier-Boisvert et al., 2018b;a; Hui et al., 2020). ... Maxime Chevalier-Boisvert, Lucas Willems, and Suman Pal. Minimalistic gridworld environment for OpenAI Gym. GitHub repository, 2018b. https://github.com/maximecb/gym-minigrid. |
| Dataset Splits | No | The paper states, 'Across all experiments, we sample training tasks from an environment distribution of difficulty 0.4... The evaluation tasks are sampled from a gradient of OOD difficulties 0.25, 0.35, 0.45 and 0.55', detailing training and evaluation sets. However, it does not explicitly mention a distinct validation data split for hyperparameter tuning or model selection. |
| Hardware Specification | No | The paper mentions 'computational resources allocated to us from Mila and McGill University' but does not specify any particular hardware components like CPU or GPU models, or memory details. |
| Software Dependencies | No | The paper refers to various algorithms and models such as 'Adam', 'C51 distributional TD learning', and 'VQ-VAE', citing their original papers. It also mentions basing experiments on publicly available code (e.g., for Director), but it does not specify concrete version numbers for software libraries like PyTorch or TensorFlow. |
| Experiment Setup | Yes | All the trainable parameters are optimized with Adam at a rate of 2.5 × 10⁻⁴ (Kingma & Ba, 2014), with gradient clipping by value (maximum absolute value 1.0). ... The internal γ for intrinsic reward of π is 0.95, while the task γ is 0.99. ... The estimators, which operate on the partial states, are 3-layered MLPs with 256 hidden units. ... The whole checkpoint generator is trained end-to-end with a standard VAE loss. That is, the sum of a KL-divergence for the agent's location, and the entropy of partial descriptions, weighted by 2.5 × 10⁻⁴... |
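
The Pseudocode row above names "Checkpoint Pruning with k-medoids" (Algorithm 2 in the paper). The snippet below is a minimal, hypothetical sketch of that idea only: it is not the authors' implementation, and plain Euclidean distance between checkpoint embeddings stands in for whatever distance estimate Skipper actually uses.

```python
# Hypothetical sketch of k-medoids-style checkpoint pruning (not the authors' code):
# reduce N candidate checkpoints to k representatives that cover the candidate set.
import numpy as np

def kmedoids_prune(embeddings: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Return indices of k medoid checkpoints chosen from `embeddings` (N x D)."""
    rng = np.random.default_rng(seed)
    n = embeddings.shape[0]
    # Pairwise distances between candidate checkpoints (Euclidean as a placeholder).
    dists = np.linalg.norm(embeddings[:, None, :] - embeddings[None, :, :], axis=-1)
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        # Assign every candidate to its nearest medoid.
        assign = np.argmin(dists[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.where(assign == j)[0]
            if len(members) == 0:
                continue
            # The new medoid is the member minimizing total distance to its cluster.
            within = dists[np.ix_(members, members)].sum(axis=1)
            new_medoids[j] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids

# Usage: keep 8 representative checkpoints out of 64 candidates.
candidates = np.random.default_rng(1).normal(size=(64, 16))
print(kmedoids_prune(candidates, k=8))
```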
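
The Dataset Splits row reports training tasks drawn at difficulty 0.4 and zero-shot evaluation at a gradient of OOD difficulties; one compact way to record that setup is a small config dictionary. The structure below is purely illustrative; only the difficulty values come from the paper.

```python
# Illustrative record of the reported train/eval task distributions (structure assumed).
task_distributions = {
    "train": {"difficulty": 0.4},                         # training tasks
    "eval": {"difficulties": [0.25, 0.35, 0.45, 0.55]},   # zero-shot OOD evaluation
}
```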
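
For the Experiment Setup row, the reported optimization choices (Adam at 2.5 × 10⁻⁴, gradient clipping by value at 1.0, 3-layer MLP estimators with 256 hidden units) translate into a short PyTorch sketch. The input/output dimensions and the placeholder loss are assumptions for illustration, not values taken from the repository.

```python
# Minimal sketch of the reported optimization settings; module shapes and the
# dummy regression loss are placeholders, not the authors' code.
import torch
import torch.nn as nn

estimator = nn.Sequential(            # 3-layer MLP with 256 hidden units (input dim assumed)
    nn.Linear(128, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 1),
)
optimizer = torch.optim.Adam(estimator.parameters(), lr=2.5e-4)

def training_step(batch_inputs: torch.Tensor, batch_targets: torch.Tensor) -> float:
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(estimator(batch_inputs), batch_targets)
    loss.backward()
    # Clip gradients by value (maximum absolute value 1.0), as reported in the paper.
    torch.nn.utils.clip_grad_value_(estimator.parameters(), clip_value=1.0)
    optimizer.step()
    return loss.item()
```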