Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

A Distributional Analogue to the Successor Representation

Authors: Harley Wiltzer, Jesse Farebrother, Arthur Gretton, Yunhao Tang, André Barreto, Will Dabney, Marc G. Bellemare, Mark Rowland

ICML 2024 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type: Experimental
LLM Response: "We evaluate our implementation of the distributional SM on two domains, namely a stochastic 'Windy Gridworld' environment, where a pointmass navigates a 2D continuous grid subject to a random wind force that pushes it towards the corners, and the Pendulum environment (Atkeson & Schaal, 1997). As a baseline, we compare our method to an ensemble of γ-models (Janner et al., 2020), which is almost equivalent to a δ-model, the difference being that the individual γ-models of the ensemble are trained independently rather than coupled through the model MMD loss."
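The quoted 'Windy Gridworld' dynamics can be sketched roughly as follows. This is a hypothetical illustration, not the paper's exact specification: the grid bounds, wind scale, and corner-attraction form are all assumptions.

```python
import numpy as np

def windy_step(pos, action, rng, wind_scale=0.05):
    """One transition of a hypothetical 'Windy Gridworld' on [0, 1]^2.

    A random wind force pushes the pointmass towards the nearest corner,
    as described in the quoted passage; magnitudes are illustrative.
    """
    nearest_corner = np.where(pos < 0.5, 0.0, 1.0)
    wind = wind_scale * rng.random(2) * np.sign(nearest_corner - pos)
    return np.clip(pos + action + wind, 0.0, 1.0)

rng = np.random.default_rng(0)
pos = windy_step(np.array([0.2, 0.8]), np.array([0.1, -0.1]), rng)
```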
Researcher Affiliation: Collaboration
LLM Response: "1 McGill University, 2 Mila Québec AI Institute, 3 Google DeepMind, 4 Gatsby Unit, University College London, 5 CIFAR AI Chair."
Pseudocode: Yes
LLM Response: "A. Algorithm. In this section, we restate the core δ-model update derived in Section 4, including the n-step bootstrapping and adversarial kernel modifications described in Section 5. Source code is provided at https://github.com/jessefarebro/distributional-sr. Algorithm 1: δ-model update."
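The quoted δ-model update couples an ensemble of models through an MMD loss. The paper's Algorithm 1 gives the exact update; the core quantity it relies on, a squared MMD between two sample sets, can be sketched as below. The RBF kernel, bandwidth, and sample shapes are illustrative assumptions, not the paper's adversarial kernel.

```python
import numpy as np

def rbf_gram(x, y, bandwidth=1.0):
    # Gram matrix k(x_i, y_j) = exp(-||x_i - y_j||^2 / (2 * bandwidth^2)).
    sq_dists = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def mmd2(x, y, bandwidth=1.0):
    # Biased estimate of the squared MMD between sample sets x and y.
    return (rbf_gram(x, x, bandwidth).mean()
            + rbf_gram(y, y, bandwidth).mean()
            - 2.0 * rbf_gram(x, y, bandwidth).mean())

rng = np.random.default_rng(0)
model_samples = rng.normal(size=(64, 2))         # samples from the model
target_samples = rng.normal(size=(64, 2)) + 1.0  # bootstrapped target samples
loss = mmd2(model_samples, target_samples)
```

Minimizing such a loss drives the model's samples towards the target distribution, which is the sense in which the ensemble members are "coupled" rather than trained independently.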
Open Source Code: Yes
LLM Response: "Source code is provided at https://github.com/jessefarebro/distributional-sr."
Open Datasets: Yes
LLM Response: "We evaluate our implementation of the distributional SM on two domains, namely a stochastic 'Windy Gridworld' environment, where a pointmass navigates a 2D continuous grid subject to a random wind force that pushes it towards the corners, and the Pendulum environment (Atkeson & Schaal, 1997)."
Dataset Splits: No
LLM Response: "The paper mentions evaluating on 'held-out reward functions' and uses the 'Pendulum' and 'Windy Gridworld' environments, but it does not provide explicit details about train/validation/test dataset splits (e.g., percentages, sample counts, or specific splitting methodologies)."
Hardware Specification: No
LLM Response: "The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models."
Software Dependencies: No
LLM Response: "The paper mentions software such as Jax (Bradbury et al., 2018), Flax (Heek et al., 2023), Optax (Babuschkin et al., 2020), einops (Rogozhnikov, 2022), and Seaborn (Waskom, 2021). However, it does not provide specific version numbers for these software dependencies, only references to their general publications or projects."
Experiment Setup: Yes
LLM Response: "D.2. Hyperparameters. Unless otherwise specified, the default hyperparameters used for our implementation of the δ-model are outlined in Table 1."