A Distributional Analogue to the Successor Representation
Authors: Harley Wiltzer, Jesse Farebrother, Arthur Gretton, Yunhao Tang, Andre Barreto, Will Dabney, Marc G. Bellemare, Mark Rowland
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our implementation of the distributional SM on two domains, namely a stochastic "Windy Gridworld" environment, where a pointmass navigates a 2D continuous grid subject to a random wind force that pushes it towards the corners, and the Pendulum environment (Atkeson & Schaal, 1997). As a baseline, we compare our method to an ensemble of γ-models (Janner et al., 2020), which is almost equivalent to a δ-model, with the difference being that the individual γ-models of the ensemble are trained independently rather than coupled through the model MMD loss. |
| Researcher Affiliation | Collaboration | 1McGill University 2Mila - Québec AI Institute 3Google DeepMind 4Gatsby Unit, University College London 5CIFAR AI Chair. |
| Pseudocode | Yes | A. Algorithm. In this section, we restate the core δ-model update derived in Section 4, including the n-step bootstrapping and adversarial kernel modifications described in Section 5. Source code is provided at https://github.com/jessefarebro/distributional-sr. Algorithm 1: δ-model update. |
| Open Source Code | Yes | Source code is provided at https://github.com/jessefarebro/distributional-sr. |
| Open Datasets | Yes | We evaluate our implementation of the distributional SM on two domains, namely a stochastic "Windy Gridworld" environment, where a pointmass navigates a 2D continuous grid subject to a random wind force that pushes it towards the corners, and the Pendulum environment (Atkeson & Schaal, 1997). |
| Dataset Splits | No | The paper mentions evaluating on 'held-out reward functions' and uses 'Pendulum' and 'Windy Gridworld' environments, but it does not provide explicit details about train/validation/test dataset splits (e.g., percentages, sample counts, or specific splitting methodologies). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions software like JAX (Bradbury et al., 2018), Flax (Heek et al., 2023), Optax (Babuschkin et al., 2020), einops (Rogozhnikov, 2022), and Seaborn (Waskom, 2021). However, it does not provide specific version numbers for these software dependencies, only references to their general publications or projects. |
| Experiment Setup | Yes | D.2. Hyperparameters. Unless otherwise specified the default hyperparameters used for our implementation of δ-model are outlined in Table 1. |
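The "model MMD loss" mentioned in the Research Type row is what couples the ensemble members of a δ-model, in contrast to the independently trained γ-model baseline. The paper's exact loss is not reproduced here; the following is a minimal NumPy sketch of the standard biased squared-MMD estimator with an RBF kernel, which illustrates the kind of sample-based discrepancy such a loss is built on. The function names and the bandwidth `sigma` are illustrative choices, not taken from the paper.

```python
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel matrix between rows of x and rows of y."""
    # Pairwise squared Euclidean distances, shape (len(x), len(y)).
    d2 = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def squared_mmd(x, y, sigma=1.0):
    """Biased estimator of the squared MMD between sample sets x and y."""
    k_xx = rbf_kernel(x, x, sigma)
    k_yy = rbf_kernel(y, y, sigma)
    k_xy = rbf_kernel(x, y, sigma)
    # MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)], estimated by means.
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()
```

Identical sample sets give an estimate of zero, while samples from well-separated distributions give a clearly positive value; minimizing such a quantity between model samples and target samples is the general idea behind MMD-based generative training.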