Discovering Object-Centric Generalized Value Functions From Pixels

Authors: Somjit Nath, Gopeshh Subbaraj, Khimya Khetarpal, Samira Ebrahimi Kahou

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We compare our approach with state-of-the-art techniques alongside other ablations and show competitive performance in both stationary and non-stationary settings. Finally, we also investigate the discovered general value functions and, through qualitative analysis, show that the learned representations are not only interpretable but also centered around objects that are invariant to changes across tasks, facilitating fast adaptation."
Researcher Affiliation | Collaboration | 1École de technologie supérieure, 2Mila-Québec AI Institute, 3Université de Montréal, 4McGill University, 5CIFAR AI Chair; now at DeepMind. Correspondence to: Somjit Nath <somjit.nath.1@ens.etsmtl.ca>.
Pseudocode | Yes | "Algorithm 1 Object-Centric GVFs (OC-GVFs)"
Open Source Code | Yes | "We include all the relevant hyperparameters and implementation details in Appendix A.4 and open-source the code." (https://github.com/Somjit77/oc_gvfs)
Open Datasets | Yes | Collect-objects Environment: a customized version of the four-room gridworld environment similar to the one used in Veeriah et al. MiniGrid Dynamic Obstacles: for the experiments on non-stationarity, the authors used the MiniGrid Dynamic Obstacles environment (Chevalier-Boisvert et al., 2018). CoinRun & StarPilot: part of the procedurally generated ProcGen environments (Cobbe et al., 2019).
Dataset Splits | No | No explicit data splits are described; Table 1 ("Hyper-Parameters of all experiments") lists only training settings, e.g. for Collect Objects: train episodes: 5000, batch size: 32, target period: 100, replay capacity: 100000, hidden arch: [64, 32], epsilon begin: 1.0, epsilon end: 0.01, epsilon steps: 0.8, discount factor: 0.99, learning rate: 0.0001, eval episodes: 100, evaluate every: 50, num gvfs: 5, unroll steps: 10.
Hardware Specification | Yes | "All our experiments were run on a single V100 GPU."
Software Dependencies | No | The paper does not explicitly list software dependencies with specific version numbers.
Experiment Setup | Yes | Table 1 ("Hyper-Parameters of all experiments") details the setup per environment, including encoder/decoder and Slot Attention parameters. For Collect Objects: train episodes: 5000, batch size: 32, target period: 100, replay capacity: 100000, hidden arch: [64, 32], epsilon begin: 1.0, epsilon end: 0.01, epsilon steps: 0.8, discount factor: 0.99, learning rate: 0.0001, eval episodes: 100, evaluate every: 50, num gvfs: 5, unroll steps: 10.
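The exploration settings above (epsilon begin: 1.0, epsilon end: 0.01, epsilon steps: 0.8) are consistent with a linear epsilon-greedy annealing schedule. A minimal sketch of such a schedule is below, assuming `epsilon steps: 0.8` means epsilon decays over the first 80% of training and is then held constant; the table does not spell out the units, so that interpretation and the function name are assumptions, not the authors' code.

```python
def linear_epsilon(step: int, total_steps: int,
                   begin: float = 1.0, end: float = 0.01,
                   decay_fraction: float = 0.8) -> float:
    """Linearly anneal epsilon from `begin` to `end` over the first
    `decay_fraction` of training, then hold it at `end`.

    NOTE: this is an illustrative reading of Table 1's epsilon
    parameters, not the paper's implementation.
    """
    decay_steps = decay_fraction * total_steps
    if step >= decay_steps:
        return end
    # Interpolate between begin and end proportionally to progress.
    return begin + (end - begin) * (step / decay_steps)
```

With 1000 total steps this yields epsilon = 1.0 at step 0, roughly 0.505 at step 400, and the floor of 0.01 from step 800 onward.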