Composing Task Knowledge With Modular Successor Feature Approximators
Authors: Wilka Torrico Carvalho, Angelos Filos, Richard Lewis, Honglak Lee, Satinder Singh
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that MSFA is able to better generalize compared to baseline architectures for learning SFs and modular architectures for learning state representations. We study an agent's ability to combine task knowledge in three environments. |
| Researcher Affiliation | Collaboration | Wilka Carvalho1, Angelos Filos2, Richard L. Lewis1, Honglak Lee1,3, Satinder Singh1 (1 University of Michigan, 2 University of Oxford, 3 LG AI Research) |
| Pseudocode | No | The paper describes the architecture and learning algorithm using text and mathematical equations, but it does not provide any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | All agents are built with JAX (Bradbury et al., 2018) using the open-source ACME codebase (Hoffman et al., 2020) for reinforcement learning. This refers to third-party tools, not their specific code. No explicit statement about releasing their own code is found. |
| Open Datasets | Yes | We implement a simplified version of the object navigation task of (Borsa et al., 2019) in the BabyAI environment (Chevalier-Boisvert et al., 2019). We leverage the Fruitbot environment within ProcGen (Cobbe et al., 2020). |
| Dataset Splits | No | During training, the agent experiences $n_{\text{train}}$ tasks $\mathcal{M}_{\text{train}} = \{M_i\}_{i=1}^{n_{\text{train}}}$, sampled from a training distribution $p_{\text{train}}(M)$. During testing, the agent is evaluated on $n_{\text{test}}$ tasks $\{M_i\}_{i=1}^{n_{\text{test}}}$, sampled from a testing distribution $p_{\text{test}}(M)$. No specific numerical or methodological details on train/validation/test splits are provided. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | All agents are built with JAX (Bradbury et al., 2018) using the open-source ACME codebase (Hoffman et al., 2020) for reinforcement learning. While JAX and ACME are mentioned, no specific version numbers are provided for these or other software dependencies. |
| Experiment Setup | Yes | We train UVFA and UVFA+FARM with n-step Q-learning (Watkins & Dayan, 1992). When learning cumulants, USFA and MSFA have the exact same losses and learning algorithm. They both learn Q-values and SFs with n-step Q-learning. We use n = 5. |
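
To make the quoted experiment setup concrete, below is a minimal sketch in JAX (the framework the paper reports using) of an n-step Q-learning target and a successor-feature TD loss. This is not the authors' code, which is not released; the function names, the discount value of 0.99, and the one-step SF target are illustrative assumptions. Only n = 5 and the standard successor-feature decomposition Q(s, a, w) = ψ(s, a)ᵀw reflect the paper's stated setup.

```python
import jax
import jax.numpy as jnp


def n_step_return(rewards, bootstrap_value, gamma=0.99, n=5):
    """n-step return: sum_{k=0}^{n-1} gamma^k * r_{t+k} + gamma^n * bootstrap_value.

    The paper uses n = 5; gamma = 0.99 is a placeholder assumption.
    """
    discounts = gamma ** jnp.arange(n)  # [1, gamma, ..., gamma^(n-1)]
    return jnp.sum(discounts * rewards[:n]) + (gamma ** n) * bootstrap_value


def sf_td_loss(psi_tm1, phi_t, psi_t, gamma=0.99):
    """Squared TD error for successor features psi, using a one-step target
    phi_t + gamma * psi_t (a simplification of the paper's n-step variant)."""
    target = jax.lax.stop_gradient(phi_t + gamma * psi_t)
    return jnp.mean((psi_tm1 - target) ** 2)


def q_from_sf(psi, w):
    """Standard successor-feature decomposition: Q(s, a, w) = psi(s, a)^T w."""
    return jnp.dot(psi, w)
```

In this sketch, `q_from_sf` would be evaluated on the SFs produced by each module and combined with the task vector w to form Q-values, while `n_step_return` and `sf_td_loss` stand in for the Q-learning and SF-learning losses the paper describes sharing between USFA and MSFA.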