$\pi$2vec: Policy Representation with Successor Features
Authors: Gianluca Scarpellini, Ksenia Konyushkova, Claudio Fantacci, Tom Le Paine, Yutian Chen, Misha Denil
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that π2vec achieves solid results in different tasks and across different settings. To summarize, our main contributions are the following: we evaluate our proposal through extensive experiments predicting return values of held-out policies in 3 simulated and 2 real environments; our approach outperforms the baseline and achieves solid results even in challenging real robotic settings and out-of-distribution scenarios; and we investigate various feature encoders, ranging from semantic to geometrical visual foundation models, to show the strengths and weaknesses of various representations for the task at hand. (A sketch of the successor-feature idea behind π2vec appears below the table.) |
| Researcher Affiliation | Collaboration | Gianluca Scarpellini*† (Istituto Italiano di Tecnologia), Ksenia Konyushkova (Google DeepMind), Claudio Fantacci (Google DeepMind), Tom Le Paine (Google DeepMind), Yutian Chen (Google DeepMind), Misha Denil (Google DeepMind). *: Corresponding author, gianluca.scarpellini@iit.it. †: Work done during an internship at Google DeepMind. |
| Pseudocode | No | The paper describes the method in prose and does not include a clearly labeled pseudocode or algorithm block. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | The Metaworld (Yu et al., 2020) and Kitchen (Gupta et al., 2019) domains are widely known in the literature. |
| Dataset Splits | Yes | For evaluation, we adopt 3-fold cross-validation in all experiments. (A minimal sketch of this protocol appears below the table.) |
| Hardware Specification | No | The paper mentions running experiments on 'real robots' and 'simulated environments' but does not provide specific hardware details such as GPU/CPU models or other computing specifications used for training or inference. |
| Software Dependencies | No | The paper mentions using 'Adam optimizer (Kingma & Ba, 2015)' but does not provide specific software dependencies with version numbers for replication. |
| Experiment Setup | Yes | We train the network for 50,000 steps for Metaworld and Kitchen and 100,000 steps for RGB Stacking, Insert Gear (Sim), and Insert Gear (Real). We use the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 3e-5 and a discount factor of γ = 0.99. (A hedged training-setup sketch follows the table.) |
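For readers unfamiliar with the method, the sketch below illustrates the general successor-feature idea behind π2vec: a policy is summarized by the expected discounted sum of state features φ(s) it visits, ψ(π) = E[Σ_t γ^t φ(s_t)]. This is a minimal Monte Carlo illustration, not the paper's implementation (the paper trains a network on data instead of rolling out; see the Adam/γ settings in the table). The `env` interface, `policy`, `phi`, and all names here are assumptions.

```python
import numpy as np

def successor_features(env, policy, phi, gamma=0.99, n_episodes=10, max_steps=200):
    """Monte Carlo estimate of a policy's successor features:
    psi(pi) = E[ sum_t gamma^t * phi(s_t) ].
    Assumes env.reset() -> state and env.step(action) -> (state, reward, done).
    """
    estimates = []
    for _ in range(n_episodes):
        state = env.reset()
        discounted_sum = np.zeros_like(phi(state))
        discount = 1.0
        for _ in range(max_steps):
            discounted_sum += discount * phi(state)   # accumulate discounted features
            state, _, done = env.step(policy(state))
            discount *= gamma
            if done:
                break
        estimates.append(discounted_sum)
    # Average over episodes: a fixed-size vector summarizing the policy's behaviour.
    return np.mean(estimates, axis=0)
```

Downstream, such vectors let a simple regressor, fitted on policies with known returns, predict the return of a held-out policy.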
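The reported 3-fold cross-validation plausibly splits over policies: a regressor from policy embeddings to returns is fitted on two folds and scored on the held-out fold. A minimal sketch, assuming `X` holds policy embeddings and `y` their ground-truth returns; the Ridge regressor and all names are illustrative, not from the paper.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import Ridge

def cross_validated_return_prediction(X, y, n_splits=3, seed=0):
    """3-fold cross-validation over policies.

    X: (n_policies, d) array of policy embeddings.
    y: (n_policies,) array of ground-truth returns.
    Returns the mean absolute error averaged over folds.
    """
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    errors = []
    for train_idx, test_idx in kf.split(X):
        model = Ridge().fit(X[train_idx], y[train_idx])   # fit on two folds
        pred = model.predict(X[test_idx])                 # score on the third
        errors.append(np.mean(np.abs(pred - y[test_idx])))
    return float(np.mean(errors))
```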
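The stated optimization settings translate directly into a training-loop configuration. A hedged PyTorch sketch follows: only the step counts, the Adam optimizer, the 3e-5 learning rate, and γ = 0.99 come from the paper; the model, data pipeline, and the schematic TD-style loss on successor features are placeholders.

```python
import torch

# Reported in the paper; everything else below is a placeholder.
TRAIN_STEPS = {"metaworld": 50_000, "kitchen": 50_000,
               "rgb_stacking": 100_000, "insert_gear_sim": 100_000,
               "insert_gear_real": 100_000}
LEARNING_RATE = 3e-5
GAMMA = 0.99

def train(model: torch.nn.Module, data_iter, domain: str):
    """Minimal training loop with the reported optimizer settings.

    `data_iter` yields (phi_t, psi_next) tensor batches: current-state
    features and a bootstrapped successor-feature target. The loss is a
    schematic TD regression, psi(s) ~ phi(s) + gamma * psi(s'), not the
    paper's exact objective.
    """
    optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
    for _ in range(TRAIN_STEPS[domain]):
        phi_t, psi_next = next(data_iter)
        target = phi_t + GAMMA * psi_next                 # successor-feature TD target
        loss = torch.nn.functional.mse_loss(model(phi_t), target.detach())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```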