Composing Task Knowledge With Modular Successor Feature Approximators

Authors: Wilka Torrico Carvalho, Angelos Filos, Richard Lewis, Honglak Lee, Satinder Singh

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We show that MSFA is able to better generalize compared to baseline architectures for learning SFs and modular architectures for learning state representations. We study an agent's ability to combine task knowledge in three environments."
Researcher Affiliation | Collaboration | Wilka Carvalho (1), Angelos Filos (2), Richard L. Lewis (1), Honglak Lee (1,3), Satinder Singh (1); affiliations: (1) University of Michigan, (2) University of Oxford, (3) LG AI Research
Pseudocode | No | The paper describes the architecture and learning algorithm using text and mathematical equations, but it does not provide any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | "All agents are built with JAX (Bradbury et al., 2018) using the open-source ACME codebase (Hoffman et al., 2020) for reinforcement learning." This refers to third-party tools, not their specific code; no explicit statement about releasing their own code is found.
Open Datasets | Yes | "We implement a simplified version of the object navigation task of (Borsa et al., 2019) in the (Chevalier-Boisvert et al., 2019) BabyAI environment." "We leverage the Fruitbot environment within ProcGen (Cobbe et al., 2020)."
Dataset Splits | No | "During training, the agent experiences $n_{train}$ tasks $\mathcal{M}_{train} = \{M_i\}_{i=1}^{n_{train}}$, sampled from a training distribution $p_{train}(M)$. During testing, the agent is evaluated on $n_{test}$ tasks, $\{M_i\}_{i=1}^{n_{test}}$, sampled from a testing distribution $p_{test}(M)$." No specific numerical or methodological details on train/validation/test splits are provided.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments.
Software Dependencies | No | "All agents are built with JAX (Bradbury et al., 2018) using the open-source ACME codebase (Hoffman et al., 2020) for reinforcement learning." While JAX and ACME are mentioned, no specific version numbers are provided for these or other software dependencies.
Experiment Setup | Yes | "We train UVFA and UVFA+FARM with n-step Q-learning (Watkins & Dayan, 1992)." "When learning cumulants, USFA and MSFA have the exact same losses and learning algorithm. They both learn Q-values and SFs with n-step Q-learning. We use n = 5."
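
For readers unfamiliar with the setup quoted in the last row, the following is a minimal sketch, not the authors' code, of how an n-step target for successor features (SFs) and the corresponding Q-values can be formed under the description above (n-step Q-learning with n = 5, Q-values recovered from SFs and a task vector). The array names, shapes, and the greedy bootstrap choice are illustrative assumptions, written with jax.numpy since the paper reports using JAX.

```python
# Minimal sketch of an n-step successor-feature (SF) target, assuming per-step
# cumulants `phi` of shape [n, d], bootstrap SFs `sf_next` of shape
# [num_actions, d], and a task vector `w` of shape [d]. All names are
# illustrative; this is not the authors' implementation.
import jax.numpy as jnp

def n_step_sf_target(phi, sf_next, w, gamma=0.99):
    """Targets for one window of n transitions (the paper uses n = 5)."""
    n = phi.shape[0]
    discounts = gamma ** jnp.arange(n)                # [1, gamma, ..., gamma^(n-1)]
    # Greedy bootstrap action chosen from task-conditioned Q-values Q = psi . w.
    q_next = sf_next @ w                              # [num_actions]
    a_star = jnp.argmax(q_next)
    # n-step SF target: discounted sum of cumulants plus the bootstrapped SF.
    sf_target = (discounts[:, None] * phi).sum(axis=0) + gamma**n * sf_next[a_star]
    # The matching n-step Q-learning target is the SF target dotted with w.
    q_target = sf_target @ w
    return sf_target, q_target
```

Under these assumptions, a UVFA-style baseline would regress only the Q-value target, while SF-based agents such as USFA and MSFA would regress both targets with the same n-step rule, consistent with the quoted statement that they share losses when learning cumulants.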