Discovering Object-Centric Generalized Value Functions From Pixels
Authors: Somjit Nath, Gopeshh Subbaraj, Khimya Khetarpal, Samira Ebrahimi Kahou
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare our approach with state-of-the-art techniques alongside other ablations and show competitive performance in both stationary and non-stationary settings. Finally, we also investigate the discovered general value functions and, through qualitative analysis, show that the learned representations are not only interpretable but also centered around objects that are invariant to changes across tasks, facilitating fast adaptation. |
| Researcher Affiliation | Collaboration | 1 École de technologie supérieure 2 Mila-Quebec AI Institute 3 Université de Montréal 4 McGill University 5 CIFAR AI Chair now at DeepMind. Correspondence to: Somjit Nath <somjit.nath.1@ens.etsmtl.ca>. |
| Pseudocode | Yes | Algorithm 1 Object-Centric GVFs (OC-GVFs) |
| Open Source Code | Yes | We include all the relevant hyperparameters and implementation details in Appendix A.4 and open-source the code (footnote 3: https://github.com/Somjit77/oc_gvfs). |
| Open Datasets | Yes | Collect-objects Environment: is a customized version of the four-room gridworld environment similar to the one used in Veeriah et al. MiniGrid-Dynamic Obstacles: For the experiments on non-stationarity, we used the MiniGrid Dynamic Obstacles (Chevalier-Boisvert et al., 2018). CoinRun & StarPilot: are a part of procedurally generated environments called Procgen (Cobbe et al., 2019). |
| Dataset Splits | No | Table 1. Hyper-Parameters of all experiments (columns: Environment; Algorithm Parameters; Encoders and Decoders; Slot Attention Parameters). Collect Objects, Algorithm Parameters: train episodes: 5000, batch size: 32, target period: 100, replay capacity: 100000, hidden arch: [64, 32], epsilon begin: 1.0, epsilon end: 0.01, epsilon steps: 0.8, discount factor: 0.99, learning rate: 0.0001, eval episodes: 100, evaluate every: 50, num gvfs: 5, unroll steps: 10 |
| Hardware Specification | Yes | All our experiments were run on a single V100 GPU. |
| Software Dependencies | No | The paper does not explicitly list software dependencies with specific version numbers. |
| Experiment Setup | Yes | Table 1. Hyper-Parameters of all experiments (columns: Environment; Algorithm Parameters; Encoders and Decoders; Slot Attention Parameters). Collect Objects, Algorithm Parameters: train episodes: 5000, batch size: 32, target period: 100, replay capacity: 100000, hidden arch: [64, 32], epsilon begin: 1.0, epsilon end: 0.01, epsilon steps: 0.8, discount factor: 0.99, learning rate: 0.0001, eval episodes: 100, evaluate every: 50, num gvfs: 5, unroll steps: 10 |
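
The Collect Objects hyperparameters quoted from Table 1 can be collected into a single configuration object. The sketch below is illustrative only and is not taken from the authors' repository: the class name `CollectObjectsConfig`, the snake_case field names, and the helper `epsilon_at` are hypothetical. In particular, reading `epsilon steps : 0.8` as "the fraction of training episodes over which epsilon decays linearly" is an assumption; the quoted table does not say how that value is used.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CollectObjectsConfig:
    """Values as quoted from Table 1 (Collect Objects); field names are hypothetical."""
    train_episodes: int = 5000
    batch_size: int = 32
    target_period: int = 100
    replay_capacity: int = 100_000
    hidden_arch: Tuple[int, ...] = (64, 32)
    epsilon_begin: float = 1.0
    epsilon_end: float = 0.01
    epsilon_steps: float = 0.8  # meaning not stated in the quoted table
    discount_factor: float = 0.99
    learning_rate: float = 1e-4
    eval_episodes: int = 100
    evaluate_every: int = 50
    num_gvfs: int = 5
    unroll_steps: int = 10


def epsilon_at(cfg: CollectObjectsConfig, episode: int) -> float:
    """Linearly decayed exploration rate at a given training episode.

    ASSUMPTION: epsilon_steps is the fraction of train_episodes over which
    epsilon moves from epsilon_begin to epsilon_end, then stays constant.
    """
    decay_episodes = cfg.epsilon_steps * cfg.train_episodes
    frac = min(max(episode / decay_episodes, 0.0), 1.0)
    return cfg.epsilon_begin + frac * (cfg.epsilon_end - cfg.epsilon_begin)


if __name__ == "__main__":
    cfg = CollectObjectsConfig()
    for ep in (0, 2000, 4000, 5000):
        print(ep, round(epsilon_at(cfg, ep), 4))
```

Freezing the dataclass keeps a run's settings immutable once constructed, which makes it easier to log the exact values alongside results. The authoritative settings are those in the paper's Appendix A.4 and the linked repository.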