A neurally plausible model learns successor representations in partially observable environments

Authors: Eszter Vértes, Maneesh Sahani

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that distributional successor features can support reinforcement learning in noisy environments in which direct learning of successful policies is infeasible. Figure 1 shows a state-space model corresponding to a random walk policy in the latent space with noisy observations, learned using DDCs (Algorithm 1). Figure 2 shows the value functions computed using the successor features learned in three different settings: assuming direct access to latent states, treating observations as though they were noise-free state measurements, and using latent state estimates inferred from observations. To demonstrate that this is not simply due to using the suboptimal random walk policy, but persists through learning, we have learned successor features while adjusting the policy to a given reward function (see figure 3).
Researcher Affiliation | Academia | Eszter Vértes, Maneesh Sahani, Gatsby Computational Neuroscience Unit, University College London, London W1T 4JG, UK. {eszter,maneesh}@gatsby.ucl.ac.uk
Pseudocode | Yes | Algorithm 1: Wake-sleep algorithm in the DDC state-space model
Open Source Code | No | The paper does not provide any explicit statement about open-sourcing code or links to a code repository.
Open Datasets | No | The paper describes a 'noisy 2D environment' and simulating data during the 'sleep phase', but it does not use a publicly available dataset and provides no access information for any data used.
Dataset Splits | No | The paper does not specify any dataset splits for training, validation, or testing.
Hardware Specification | No | The paper does not mention any specific hardware used for running experiments (e.g., GPU/CPU models, memory, or cloud resources).
Software Dependencies | No | The paper does not provide specific software versions or library dependencies (e.g., 'Python 3.x', 'PyTorch 1.x').
Experiment Setup | No | The paper describes the general approach to learning (e.g., 'learned by generalized policy iteration', 'alternating between taking actions following a greedy policy'), but it does not specify concrete hyperparameter values (e.g., learning rate, batch size, number of epochs) or system-level training settings.
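For context on the quantities referenced in the "Research Type" row, the sketch below illustrates the generic successor-representation idea the paper builds on: learning discounted future state occupancies by temporal-difference updates under a fixed policy, then reading out a value function as their product with a reward vector. This is a minimal tabular sketch under assumed directly observable states; it is not the paper's DDC-based wake-sleep algorithm (which works with latent states inferred from noisy observations), and all names and parameter values (ring size, discount, learning rate) are hypothetical.

```python
import numpy as np

# Minimal tabular successor-representation sketch (an illustrative assumption-laden
# example, not the paper's DDC-based algorithm). States are assumed to be directly
# observable, whereas the paper infers latent states from noisy observations.

n_states = 20      # hypothetical 1-D ring of states
gamma = 0.95       # discount factor (assumed)
alpha = 0.1        # TD learning rate (assumed)

# M[s, s'] estimates the expected discounted future occupancy of s' starting
# from s under the fixed behaviour policy (here, an unbiased random walk).
M = np.zeros((n_states, n_states))

def sr_td_update(s, s_next):
    """TD(0) update of the successor representation for one transition s -> s_next."""
    target = np.eye(n_states)[s] + gamma * M[s_next]
    M[s] += alpha * (target - M[s])

def value_from_sr(reward):
    """Policy value function from the SR: V = M @ r, reusable for any reward vector."""
    return M @ reward

# Run a random walk on the ring and learn the SR from observed transitions.
rng = np.random.default_rng(0)
s = 0
for _ in range(50_000):
    s_next = (s + rng.choice([-1, 1])) % n_states
    sr_td_update(s, s_next)
    s = s_next

# Evaluate the learned SR against a single rewarded state.
r = np.zeros(n_states)
r[n_states // 2] = 1.0
print(value_from_sr(r).round(2))
```

The factorisation is visible in value_from_sr: once M has been learned, the value function for any new reward vector is a single matrix-vector product, which is the general property that successor-feature methods, including the distributional variant in this paper, exploit when the reward changes but the policy and dynamics do not.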