Discovering Object-Centric Generalized Value Functions From Pixels

Authors: Somjit Nath, Gopeshh Subbaraj, Khimya Khetarpal, Samira Ebrahimi Kahou

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We compare our approach with state-of-the-art techniques alongside other ablations and show competitive performance in both stationary and non-stationary settings. Finally, we also investigate the discovered general value functions and, through qualitative analysis, show that the learned representations are not only interpretable but also centered around objects that are invariant to changes across tasks, facilitating fast adaptation."
Researcher Affiliation | Collaboration | 1École de technologie supérieure, 2Mila-Québec AI Institute, 3Université de Montréal, 4McGill University, 5CIFAR AI Chair; now at DeepMind. Correspondence to: Somjit Nath <somjit.nath.1@ens.etsmtl.ca>.
Pseudocode | Yes | "Algorithm 1 Object-Centric GVFs (OC-GVFs)"
Open Source Code | Yes | "We include all the relevant hyperparameters and implementation details in Appendix A.4 and open-source the code." (https://github.com/Somjit77/oc_gvfs)
Open Datasets | Yes | Collect-objects Environment: a customized version of the four-room gridworld environment similar to the one used in Veeriah et al. MiniGrid Dynamic Obstacles: for the experiments on non-stationarity, the authors used the MiniGrid Dynamic Obstacles environment (Chevalier-Boisvert et al., 2018). CoinRun & StarPilot: part of the procedurally generated ProcGen environments (Cobbe et al., 2019).
Dataset Splits | No | No explicit data splits are described; Table 1 ("Hyper-Parameters of all experiments") lists only training settings, e.g. for Collect Objects: train episodes: 5000, batch size: 32, target period: 100, replay capacity: 100000, hidden arch: [64, 32], epsilon begin: 1.0, epsilon end: 0.01, epsilon steps: 0.8, discount factor: 0.99, learning rate: 0.0001, eval episodes: 100, evaluate every: 50, num gvfs: 5, unroll steps: 10.
Hardware Specification | Yes | "All our experiments were run on a single V100 GPU."
Software Dependencies | No | The paper does not explicitly list software dependencies with specific version numbers.
Experiment Setup | Yes | Table 1 ("Hyper-Parameters of all experiments") details the setup per environment, including encoder/decoder and Slot Attention parameters. For Collect Objects: train episodes: 5000, batch size: 32, target period: 100, replay capacity: 100000, hidden arch: [64, 32], epsilon begin: 1.0, epsilon end: 0.01, epsilon steps: 0.8, discount factor: 0.99, learning rate: 0.0001, eval episodes: 100, evaluate every: 50, num gvfs: 5, unroll steps: 10.
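The exploration settings above (epsilon begin: 1.0, epsilon end: 0.01, epsilon steps: 0.8) are consistent with a linear epsilon-greedy annealing schedule. A minimal sketch of such a schedule is below, assuming `epsilon steps: 0.8` means epsilon decays over the first 80% of training and is then held constant; the table does not spell out the units, so that interpretation and the function name are assumptions, not the authors' code.

```python
def linear_epsilon(step: int, total_steps: int,
                   begin: float = 1.0, end: float = 0.01,
                   decay_fraction: float = 0.8) -> float:
    """Linearly anneal epsilon from `begin` to `end` over the first
    `decay_fraction` of training, then hold it at `end`.

    NOTE: this is an illustrative reading of Table 1's epsilon
    parameters, not the paper's implementation.
    """
    decay_steps = decay_fraction * total_steps
    if step >= decay_steps:
        return end
    # Interpolate between begin and end proportionally to progress.
    return begin + (end - begin) * (step / decay_steps)
```

With 1000 total steps this yields epsilon = 1.0 at step 0, roughly 0.505 at step 400, and the floor of 0.01 from step 800 onward.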