Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

A Distributional Analogue to the Successor Representation

Authors: Harley Wiltzer, Jesse Farebrother, Arthur Gretton, Yunhao Tang, André Barreto, Will Dabney, Marc G. Bellemare, Mark Rowland

ICML 2024 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type: Experimental
LLM Response: "We evaluate our implementation of the distributional SM on two domains, namely a stochastic 'Windy Gridworld' environment, where a pointmass navigates a 2D continuous grid subject to a random wind force that pushes it towards the corners, and the Pendulum environment (Atkeson & Schaal, 1997). As a baseline, we compare our method to an ensemble of γ-models (Janner et al., 2020), which is almost equivalent to a δ-model, the difference being that the individual γ-models of the ensemble are trained independently rather than coupled through the model MMD loss."
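The quoted 'Windy Gridworld' dynamics can be sketched roughly as follows. This is a hypothetical illustration, not the paper's exact specification: the grid bounds, wind scale, and corner-attraction form are all assumptions.

```python
import numpy as np

def windy_step(pos, action, rng, wind_scale=0.05):
    """One transition of a hypothetical 'Windy Gridworld' on [0, 1]^2.

    A random wind force pushes the pointmass towards the nearest corner,
    as described in the quoted passage; magnitudes are illustrative.
    """
    nearest_corner = np.where(pos < 0.5, 0.0, 1.0)
    wind = wind_scale * rng.random(2) * np.sign(nearest_corner - pos)
    return np.clip(pos + action + wind, 0.0, 1.0)

rng = np.random.default_rng(0)
pos = windy_step(np.array([0.2, 0.8]), np.array([0.1, -0.1]), rng)
```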
Researcher Affiliation: Collaboration
LLM Response: "1 McGill University, 2 Mila Québec AI Institute, 3 Google DeepMind, 4 Gatsby Unit, University College London, 5 CIFAR AI Chair."
Pseudocode: Yes
LLM Response: "A. Algorithm. In this section, we restate the core δ-model update derived in Section 4, including the n-step bootstrapping and adversarial kernel modifications described in Section 5. Source code is provided at https://github.com/jessefarebro/distributional-sr. Algorithm 1: δ-model update."
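The quoted δ-model update couples an ensemble of models through an MMD loss. The paper's Algorithm 1 gives the exact update; the core quantity it relies on, a squared MMD between two sample sets, can be sketched as below. The RBF kernel, bandwidth, and sample shapes are illustrative assumptions, not the paper's adversarial kernel.

```python
import numpy as np

def rbf_gram(x, y, bandwidth=1.0):
    # Gram matrix k(x_i, y_j) = exp(-||x_i - y_j||^2 / (2 * bandwidth^2)).
    sq_dists = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def mmd2(x, y, bandwidth=1.0):
    # Biased estimate of the squared MMD between sample sets x and y.
    return (rbf_gram(x, x, bandwidth).mean()
            + rbf_gram(y, y, bandwidth).mean()
            - 2.0 * rbf_gram(x, y, bandwidth).mean())

rng = np.random.default_rng(0)
model_samples = rng.normal(size=(64, 2))         # samples from the model
target_samples = rng.normal(size=(64, 2)) + 1.0  # bootstrapped target samples
loss = mmd2(model_samples, target_samples)
```

Minimizing such a loss drives the model's samples towards the target distribution, which is the sense in which the ensemble members are "coupled" rather than trained independently.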
Open Source Code: Yes
LLM Response: "Source code is provided at https://github.com/jessefarebro/distributional-sr."
Open Datasets: Yes
LLM Response: "We evaluate our implementation of the distributional SM on two domains, namely a stochastic 'Windy Gridworld' environment, where a pointmass navigates a 2D continuous grid subject to a random wind force that pushes it towards the corners, and the Pendulum environment (Atkeson & Schaal, 1997)."
Dataset Splits: No
LLM Response: "The paper mentions evaluating on 'held-out reward functions' and uses the 'Pendulum' and 'Windy Gridworld' environments, but it does not provide explicit details about train/validation/test dataset splits (e.g., percentages, sample counts, or specific splitting methodologies)."
Hardware Specification: No
LLM Response: "The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models."
Software Dependencies: No
LLM Response: "The paper mentions software such as Jax (Bradbury et al., 2018), Flax (Heek et al., 2023), Optax (Babuschkin et al., 2020), einops (Rogozhnikov, 2022), and Seaborn (Waskom, 2021). However, it does not provide specific version numbers for these software dependencies, only references to their general publications or projects."
Experiment Setup: Yes
LLM Response: "D.2. Hyperparameters. Unless otherwise specified, the default hyperparameters used for our implementation of the δ-model are outlined in Table 1."