reproducibilityindex.ai

Composing Task Knowledge With Modular Successor Feature Approximators

Authors: Wilka Torrico Carvalho, Angelos Filos, Richard Lewis, Honglak Lee, Satinder Singh

ICLR 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We show that MSFA is able to better generalize compared to baseline architectures for learning SFs and modular architectures for learning state representations. We study an agent's ability to combine task knowledge in three environments.
Researcher Affiliation	Collaboration	Wilka Carvalho (1), Angelos Filos (2), Richard L. Lewis (1), Honglak Lee (1, 3), Satinder Singh (1); (1) University of Michigan, (2) University of Oxford, (3) LG AI Research
Pseudocode	No	The paper describes the architecture and learning algorithm using text and mathematical equations, but it does not provide any explicitly labeled pseudocode or algorithm blocks.
Open Source Code	No	All agents are built with JAX (Bradbury et al., 2018) using the open-source ACME codebase (Hoffman et al., 2020) for reinforcement learning. This refers to third-party tools, not the authors' own code. No explicit statement about releasing their own code is found.
Open Datasets	Yes	We implement a simplified version of the object navigation task of (Borsa et al., 2019) in the (Chevalier-Boisvert et al., 2019) BabyAI environment. We leverage the Fruitbot environment within Proc Gen (Cobbe et al., 2020).
Dataset Splits	No	During training, the agent experiences $n_{\text{train}}$ tasks $M_{\text{train}} = \{M_i\}_{i=1}^{n_{\text{train}}}$, sampled from a training distribution $p_{\text{train}}(M)$. During testing, the agent is evaluated on $n_{\text{test}}$ tasks, $\{M_i\}_{i=1}^{n_{\text{test}}}$, sampled from a testing distribution $p_{\text{test}}(M)$. No specific numerical or methodological details on train/validation/test splits are provided.
Hardware Specification	No	The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments.
Software Dependencies	No	All agents are built with JAX (Bradbury et al., 2018) using the open-source ACME codebase (Hoffman et al., 2020) for reinforcement learning. While JAX and ACME are mentioned, no specific version numbers are provided for these or other software dependencies.
Experiment Setup	Yes	We train UVFA and UVFA+FARM with n-step Q-learning (Watkins & Dayan, 1992). When learning cumulants, USFA and MSFA have the exact same losses and learning algorithm. They both learn Q-values and SFs with n-step Q-learning. We use n = 5.
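The Experiment Setup row states that Q-values and SFs are both learned with n-step Q-learning using n = 5. A minimal plain-Python sketch of the two n-step targets follows. It assumes the standard successor-feature setup, where the reward is r = phi . w, Q(s, a) = psi(s, a) . w, and the bootstrap action is greedy with respect to psi . w. The function names, the discount gamma = 0.9, and the toy cumulants are hypothetical. Only n = 5 comes from the paper. Terminal states and off-policy corrections are ignored.

```python
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def n_step_q_target(rewards: Sequence[float], gamma: float,
                    next_q: Sequence[float]) -> float:
    """Hypothetical helper: scalar n-step Q-learning target
    sum_{k=0}^{n-1} gamma^k * r_{t+k+1} + gamma^n * max_a Q(s_{t+n}, a).
    Assumes the episode does not terminate inside the n-step window."""
    n = len(rewards)
    discounted = sum((gamma ** k) * r for k, r in enumerate(rewards))
    return discounted + (gamma ** n) * max(next_q)


def n_step_sf_target(cumulants: Sequence[Sequence[float]], gamma: float,
                     next_sfs: Sequence[Sequence[float]],
                     w: Sequence[float]) -> Vector:
    """Hypothetical helper: vector n-step target for psi(s_t, a_t)
    sum_{k=0}^{n-1} gamma^k * phi_{t+k+1} + gamma^n * psi(s_{t+n}, a*),
    with a* = argmax_a psi(s_{t+n}, a) . w (greedy w.r.t. task vector w)."""
    n = len(cumulants)
    dim = len(w)
    # Greedy bootstrap action under the (assumed) task preference vector w.
    a_star = max(range(len(next_sfs)), key=lambda a: dot(next_sfs[a], w))
    target = [0.0] * dim
    # Discounted sum of the n observed cumulant vectors.
    for k, phi in enumerate(cumulants):
        for i in range(dim):
            target[i] += (gamma ** k) * phi[i]
    # Bootstrap from the SFs of the greedy action n steps ahead.
    for i in range(dim):
        target[i] += (gamma ** n) * next_sfs[a_star][i]
    return target


if __name__ == "__main__":
    N_STEPS = 5  # n = 5, as reported in the paper
    GAMMA = 0.9  # illustrative discount, not taken from the paper
    phis = [[1.0, 0.0]] * N_STEPS            # toy cumulants phi_{t+1..t+n}
    next_sfs = [[10.0, 0.0], [0.0, 5.0]]     # toy psi(s_{t+n}, a) for 2 actions
    w = [1.0, 0.0]                           # toy task vector
    sf_target = n_step_sf_target(phis, GAMMA, next_sfs, w)
    rewards = [dot(phi, w) for phi in phis]  # r = phi . w (assumed)
    q_target = n_step_q_target(rewards, GAMMA, [dot(p, w) for p in next_sfs])
    print(sf_target, q_target, dot(sf_target, w))
```

Under the assumption r = phi . w, projecting the SF target onto w recovers the scalar Q target. This illustrates how a single n-step rule can serve both Q-values and SFs.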