A Distributional Analogue to the Successor Representation

Authors: Harley Wiltzer, Jesse Farebrother, Arthur Gretton, Yunhao Tang, Andre Barreto, Will Dabney, Marc G Bellemare, Mark Rowland

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our implementation of the distributional SM on two domains, namely a stochastic "Windy Gridworld" environment, where a pointmass navigates a 2D continuous grid subject to a random wind force that pushes it towards the corners, and the Pendulum environment (Atkeson & Schaal, 1997). As a baseline, we compare our method to an ensemble of γ-models (Janner et al., 2020), which is almost equivalent to a δ-model, with the difference being that the individual γ-models of the ensemble are trained independently rather than coupled through the model MMD loss.
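The quoted passage describes the Windy Gridworld domain: a pointmass on a continuous 2D grid, perturbed by random wind toward the corners. A minimal sketch of such an environment is below; the class name, wind model (exponential magnitudes), and parameters `wind_scale` and `step_size` are illustrative assumptions, not the paper's exact dynamics.

```python
import numpy as np

class WindyGridworld:
    """Toy continuous 2D gridworld on [0, 1]^2 (illustrative, not the paper's spec).

    A random wind force pushes the pointmass toward the nearest corner of the
    unit square; the agent counteracts it with small 2D displacement actions.
    """

    def __init__(self, wind_scale=0.05, step_size=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        self.wind_scale = wind_scale  # assumed mean wind magnitude
        self.step_size = step_size    # assumed action scaling
        self.pos = np.full(2, 0.5)

    def reset(self):
        # Start at the center of the grid.
        self.pos = np.full(2, 0.5)
        return self.pos.copy()

    def step(self, action):
        # Wind: random magnitude, directed toward the nearest corner.
        nearest_corner = np.round(self.pos)
        direction = np.sign(nearest_corner - self.pos + 1e-8)
        wind = self.rng.exponential(self.wind_scale) * direction
        # Apply the agent's displacement plus wind, clipped to the grid.
        self.pos = np.clip(
            self.pos + self.step_size * np.asarray(action, dtype=float) + wind,
            0.0, 1.0,
        )
        return self.pos.copy()
```

A rollout would call `env.reset()` once and then `env.step(a)` with 2D actions; the stochastic wind makes the successor state genuinely random, which is the property the distributional SM is designed to capture.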
Researcher Affiliation | Collaboration | 1McGill University 2Mila – Québec AI Institute 3Google DeepMind 4Gatsby Unit, University College London 5CIFAR AI Chair.
Pseudocode | Yes | A. Algorithm. In this section, we restate the core δ-model update derived in Section 4, including the n-step bootstrapping and adversarial kernel modifications described in Section 5. Source code is provided at https://github.com/jessefarebro/distributional-sr. Algorithm 1: δ-model update.
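The δ-model update couples the ensemble members through an MMD loss between sets of model samples. As a point of reference, here is a NumPy sketch of the standard (biased) squared-MMD estimator with a Gaussian RBF kernel; the kernel choice and bandwidth `sigma` are assumptions for illustration and do not reproduce the paper's adversarial kernel.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # RBF kernel matrix: k(x_i, y_j) = exp(-||x_i - y_j||^2 / (2 sigma^2)).
    sq_dists = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def squared_mmd(samples_p, samples_q, sigma=1.0):
    # Biased V-statistic estimator of MMD^2 between the empirical
    # distributions of samples_p (n, d) and samples_q (m, d).
    k_pp = gaussian_kernel(samples_p, samples_p, sigma)
    k_qq = gaussian_kernel(samples_q, samples_q, sigma)
    k_pq = gaussian_kernel(samples_p, samples_q, sigma)
    return k_pp.mean() + k_qq.mean() - 2.0 * k_pq.mean()
```

In a δ-model-style update one would compare samples drawn from the model against bootstrapped target samples under such a loss; the estimator is zero when the two sample sets coincide and strictly positive when the underlying distributions differ (for a characteristic kernel).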
Open Source Code | Yes | Source code is provided at https://github.com/jessefarebro/distributional-sr.
Open Datasets | Yes | We evaluate our implementation of the distributional SM on two domains, namely a stochastic "Windy Gridworld" environment, where a pointmass navigates a 2D continuous grid subject to a random wind force that pushes it towards the corners, and the Pendulum environment (Atkeson & Schaal, 1997).
Dataset Splits | No | The paper mentions evaluating on 'held-out reward functions' and uses 'Pendulum' and 'Windy Gridworld' environments, but it does not provide explicit details about train/validation/test dataset splits (e.g., percentages, sample counts, or specific splitting methodologies).
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU or CPU models.
Software Dependencies | No | The paper mentions software like JAX (Bradbury et al., 2018), Flax (Heek et al., 2023), Optax (Babuschkin et al., 2020), einops (Rogozhnikov, 2022), and Seaborn (Waskom, 2021). However, it does not provide specific version numbers for these software dependencies, only references to their general publications or projects.
Experiment Setup | Yes | D.2. Hyperparameters. Unless otherwise specified, the default hyperparameters used for our implementation of the δ-model are outlined in Table 1.