A neurally plausible model learns successor representations in partially observable environments

Authors: Eszter Vértes, Maneesh Sahani

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that distributional successor features can support reinforcement learning in noisy environments in which direct learning of successful policies is infeasible. Figure 1 shows a state-space model corresponding to a random walk policy in the latent space with noisy observations, learned using DDCs (Algorithm 1). Figure 2 shows the value functions computed using the successor features learned in three different settings: assuming direct access to latent states, treating observations as though they were noise-free state measurements, and using latent state estimates inferred from observations. To demonstrate that this is not simply due to using the suboptimal random walk policy, but persists through learning, we have learned successor features while adjusting the policy to a given reward function (see figure 3).
Researcher Affiliation | Academia | Eszter Vértes, Maneesh Sahani, Gatsby Computational Neuroscience Unit, University College London, London W1T 4JG, UK. {eszter,maneesh}@gatsby.ucl.ac.uk
Pseudocode | Yes | Algorithm 1: Wake-sleep algorithm in the DDC state-space model
Open Source Code | No | The paper does not provide any explicit statement about open-sourcing code or links to a code repository.
Open Datasets | No | The paper describes a 'noisy 2D environment' and simulating data during the 'sleep phase', but it does not use a publicly available dataset and provides no access information for any data used.
Dataset Splits | No | The paper does not specify any dataset splits for training, validation, or testing.
Hardware Specification | No | The paper does not mention any specific hardware used for running experiments (e.g., GPU/CPU models, memory, or cloud resources).
Software Dependencies | No | The paper does not provide specific software versions or library dependencies (e.g., 'Python 3.x', 'PyTorch 1.x').
Experiment Setup | No | The paper describes the general approach to learning (e.g., 'learned by generalized policy iteration', 'alternating between taking actions following a greedy policy'), but it does not specify concrete hyperparameter values (e.g., learning rate, batch size, number of epochs) or system-level training settings.
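For context on the quantities referenced in the "Research Type" row, the sketch below illustrates the generic successor-representation idea the paper builds on: learning discounted future state occupancies by temporal-difference updates under a fixed policy, then reading out a value function as their product with a reward vector. This is a minimal tabular sketch under assumed directly observable states; it is not the paper's DDC-based wake-sleep algorithm (which works with latent states inferred from noisy observations), and all names and parameter values (ring size, discount, learning rate) are hypothetical.

```python
import numpy as np

# Minimal tabular successor-representation sketch (an illustrative assumption-laden
# example, not the paper's DDC-based algorithm). States are assumed to be directly
# observable, whereas the paper infers latent states from noisy observations.

n_states = 20      # hypothetical 1-D ring of states
gamma = 0.95       # discount factor (assumed)
alpha = 0.1        # TD learning rate (assumed)

# M[s, s'] estimates the expected discounted future occupancy of s' starting
# from s under the fixed behaviour policy (here, an unbiased random walk).
M = np.zeros((n_states, n_states))

def sr_td_update(s, s_next):
    """TD(0) update of the successor representation for one transition s -> s_next."""
    target = np.eye(n_states)[s] + gamma * M[s_next]
    M[s] += alpha * (target - M[s])

def value_from_sr(reward):
    """Policy value function from the SR: V = M @ r, reusable for any reward vector."""
    return M @ reward

# Run a random walk on the ring and learn the SR from observed transitions.
rng = np.random.default_rng(0)
s = 0
for _ in range(50_000):
    s_next = (s + rng.choice([-1, 1])) % n_states
    sr_td_update(s, s_next)
    s = s_next

# Evaluate the learned SR against a single rewarded state.
r = np.zeros(n_states)
r[n_states // 2] = 1.0
print(value_from_sr(r).round(2))
```

The factorisation is visible in value_from_sr: once M has been learned, the value function for any new reward vector is a single matrix-vector product, which is the general property that successor-feature methods, including the distributional variant in this paper, exploit when the reward changes but the policy and dynamics do not.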