$\pi$2vec: Policy Representation with Successor Features
Authors: Gianluca Scarpellini, Ksenia Konyushkova, Claudio Fantacci, Tom Le Paine, Yutian Chen, Misha Denil
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that π2vec achieves solid results in different tasks and across different settings. To summarize, our main contributions are the following: we evaluate our proposal through extensive experiments predicting return values of held-out policies in 3 simulated and 2 real environments; our approach outperforms the baseline and achieves solid results even in challenging real robotic settings and out-of-distribution scenarios; and we investigate various feature encoders, ranging from semantic to geometrical visual foundation models, to show the strengths and weaknesses of various representations for the task at hand. (A sketch of the successor-feature idea behind π2vec appears below the table.) |
| Researcher Affiliation | Collaboration | Gianluca Scarpellini*† (Istituto Italiano di Tecnologia), Ksenia Konyushkova (Google DeepMind), Claudio Fantacci (Google DeepMind), Tom Le Paine (Google DeepMind), Yutian Chen (Google DeepMind), Misha Denil (Google DeepMind). *: Corresponding author, gianluca.scarpellini@iit.it. †: Work done during an internship at Google DeepMind. |
| Pseudocode | No | The paper describes the method in prose and does not include a clearly labeled pseudocode or algorithm block. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | The Metaworld (Yu et al., 2020) and Kitchen (Gupta et al., 2019) domains are widely known in the literature. |
| Dataset Splits | Yes | For evaluation, we adopt 3-fold cross-validation in all experiments. (A minimal sketch of this protocol appears below the table.) |
| Hardware Specification | No | The paper mentions running experiments on 'real robots' and 'simulated environments' but does not provide specific hardware details such as GPU/CPU models or other computing specifications used for training or inference. |
| Software Dependencies | No | The paper mentions using 'Adam optimizer (Kingma & Ba, 2015)' but does not provide specific software dependencies with version numbers for replication. |
| Experiment Setup | Yes | We train the network for 50,000 steps for Metaworld and Kitchen and 100,000 steps for RGB Stacking, Insert Gear (Sim), and Insert Gear (Real). We use the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 3e-5 and a discount factor of γ = 0.99. (A hedged training-setup sketch follows the table.) |
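For readers unfamiliar with the method, the sketch below illustrates the general successor-feature idea behind π2vec: a policy is summarized by the expected discounted sum of state features φ(s) it visits, ψ(π) = E[Σ_t γ^t φ(s_t)]. This is a minimal Monte Carlo illustration, not the paper's implementation (the paper trains a network on data instead of rolling out; see the Adam/γ settings in the table). The `env` interface, `policy`, `phi`, and all names here are assumptions.

```python
import numpy as np

def successor_features(env, policy, phi, gamma=0.99, n_episodes=10, max_steps=200):
    """Monte Carlo estimate of a policy's successor features:
    psi(pi) = E[ sum_t gamma^t * phi(s_t) ].
    Assumes env.reset() -> state and env.step(action) -> (state, reward, done).
    """
    estimates = []
    for _ in range(n_episodes):
        state = env.reset()
        discounted_sum = np.zeros_like(phi(state))
        discount = 1.0
        for _ in range(max_steps):
            discounted_sum += discount * phi(state)   # accumulate discounted features
            state, _, done = env.step(policy(state))
            discount *= gamma
            if done:
                break
        estimates.append(discounted_sum)
    # Average over episodes: a fixed-size vector summarizing the policy's behaviour.
    return np.mean(estimates, axis=0)
```

Downstream, such vectors let a simple regressor, fitted on policies with known returns, predict the return of a held-out policy.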
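The reported 3-fold cross-validation plausibly splits over policies: a regressor from policy embeddings to returns is fitted on two folds and scored on the held-out fold. A minimal sketch, assuming `X` holds policy embeddings and `y` their ground-truth returns; the Ridge regressor and all names are illustrative, not from the paper.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import Ridge

def cross_validated_return_prediction(X, y, n_splits=3, seed=0):
    """3-fold cross-validation over policies.

    X: (n_policies, d) array of policy embeddings.
    y: (n_policies,) array of ground-truth returns.
    Returns the mean absolute error averaged over folds.
    """
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    errors = []
    for train_idx, test_idx in kf.split(X):
        model = Ridge().fit(X[train_idx], y[train_idx])   # fit on two folds
        pred = model.predict(X[test_idx])                 # score on the third
        errors.append(np.mean(np.abs(pred - y[test_idx])))
    return float(np.mean(errors))
```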
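The stated optimization settings translate directly into a training-loop configuration. A hedged PyTorch sketch follows: only the step counts, the Adam optimizer, the 3e-5 learning rate, and γ = 0.99 come from the paper; the model, data pipeline, and the schematic TD-style loss on successor features are placeholders.

```python
import torch

# Reported in the paper; everything else below is a placeholder.
TRAIN_STEPS = {"metaworld": 50_000, "kitchen": 50_000,
               "rgb_stacking": 100_000, "insert_gear_sim": 100_000,
               "insert_gear_real": 100_000}
LEARNING_RATE = 3e-5
GAMMA = 0.99

def train(model: torch.nn.Module, data_iter, domain: str):
    """Minimal training loop with the reported optimizer settings.

    `data_iter` yields (phi_t, psi_next) tensor batches: current-state
    features and a bootstrapped successor-feature target. The loss is a
    schematic TD regression, psi(s) ~ phi(s) + gamma * psi(s'), not the
    paper's exact objective.
    """
    optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
    for _ in range(TRAIN_STEPS[domain]):
        phi_t, psi_next = next(data_iter)
        target = phi_t + GAMMA * psi_next                 # successor-feature TD target
        loss = torch.nn.functional.mse_loss(model(phi_t), target.detach())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```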