Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Composing Task Knowledge With Modular Successor Feature Approximators
Authors: Wilka Torrico Carvalho, Angelos Filos, Richard Lewis, Honglak Lee, Satinder Singh
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that MSFA is able to better generalize compared to baseline architectures for learning SFs and modular architectures for learning state representations. We study an agent's ability to combine task knowledge in three environments. |
| Researcher Affiliation | Collaboration | Wilka Carvalho (1), Angelos Filos (2), Richard L. Lewis (1), Honglak Lee (1,3), Satinder Singh (1). (1) University of Michigan, (2) University of Oxford, (3) LG AI Research |
| Pseudocode | No | The paper describes the architecture and learning algorithm using text and mathematical equations, but it does not provide any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | All agents are built with JAX (Bradbury et al., 2018) using the open-source ACME codebase (Hoffman et al., 2020) for reinforcement learning. This refers to third-party tools, not their specific code. No explicit statement about releasing their own code is found. |
| Open Datasets | Yes | We implement a simplified version of the object navigation task of Borsa et al. (2019) in the BabyAI environment (Chevalier-Boisvert et al., 2019). We leverage the Fruitbot environment within ProcGen (Cobbe et al., 2020). |
| Dataset Splits | No | During training, the agent experiences $n_{\text{train}}$ tasks $\mathcal{M}_{\text{train}} = \{M_i\}_{i=1}^{n_{\text{train}}}$, sampled from a training distribution $p_{\text{train}}(M)$. During testing, the agent is evaluated on $n_{\text{test}}$ tasks, $\{M_i\}_{i=1}^{n_{\text{test}}}$, sampled from a testing distribution $p_{\text{test}}(M)$. No specific numerical or methodological details on train/validation/test splits are provided. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | All agents are built with JAX (Bradbury et al., 2018) using the open-source ACME codebase (Hoffman et al., 2020) for reinforcement learning. While JAX and ACME are mentioned, no specific version numbers are provided for these or other software dependencies. |
| Experiment Setup | Yes | We train UVFA and UVFA+FARM with n-step Q-learning (Watkins & Dayan, 1992). When learning cumulants, USFA and MSFA have the exact same losses and learning algorithm. They both learn Q-values and SFs with n-step Q-learning. We use n = 5. |
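
For readers unfamiliar with the n-step Q-learning target mentioned in the Experiment Setup row, the following is a minimal sketch of how such a target is commonly computed. It is not the authors' code: the function name `n_step_target`, the discount value, and the example rewards and Q-values are all hypothetical, and episode termination is not handled. Only n = 5 comes from the quoted text.

```python
def n_step_target(rewards, bootstrap_q_values, discount):
    """Standard n-step Q-learning target (illustrative, not the paper's code).

    G = sum_{k=0}^{n-1} discount^k * r_k + discount^n * max_a Q(s_n, a)

    rewards: the n rewards observed after the current state-action pair.
    bootstrap_q_values: Q(s_n, a) for every action a in the state reached
        after n steps.
    discount: discount factor (its value here is an assumption; the quoted
        text does not state one).
    """
    n = len(rewards)
    # Discounted sum of the n observed rewards.
    n_step_return = sum((discount ** k) * r for k, r in enumerate(rewards))
    # Bootstrap from the greedy Q-value at the state reached after n steps.
    return n_step_return + (discount ** n) * max(bootstrap_q_values)


# Hypothetical example with n = 5, matching the value quoted above.
rewards = [0.0, 0.0, 1.0, 0.0, 0.0]
q_next = [0.5, 2.0, 1.0]
target = n_step_target(rewards, q_next, discount=0.5)
```

With n = 5 the agent sums five observed rewards before bootstrapping, which propagates reward information faster than the one-step target at the cost of higher variance.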