Structuring Representations Using Group Invariants
Authors: Mehran Shakerinava, Arnab Kumar Mondal, Siamak Ravanbakhsh
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted many experiments to qualitatively study the representation learned by SymReg and its ability to produce a disentangled representation, and quantitatively compare it against simple baselines in representation learning and downstream RL tasks. |
| Researcher Affiliation | Academia | McGill University and Mila, Montréal, Canada {mehran.shakerinava, arnab.mondal, siamak.ravanbakhsh}@mila.quebec |
| Pseudocode | Yes | We provide the algorithm in Appendix C. |
| Open Source Code | No | The paper does not provide an explicit statement or link indicating the release of its source code. |
| Open Datasets | Yes | We select the Atari games Pong and Space Invaders as our environments for the world modeling experiments. These environments were previously used by Kipf et al. [31] to evaluate the Contrastive Structured World Model (C-SWM). Next, we consider three Mujoco environments: Inverted Pendulum, Reacher, and Swimmer from OpenAI Gym [5] and learn directly from the image observations. |
| Dataset Splits | No | The paper mentions batch sizes (e.g., '64 randomly sampled observations') and the use of environments for evaluation, but it does not specify explicit training/validation/test dataset splits by percentage or sample count. |
| Hardware Specification | No | The paper acknowledges computational resources were provided by 'Mila and Compute Canada' and 'NVIDIA in the form of computational resources', but no specific hardware details such as GPU/CPU models or memory specifications are given. |
| Software Dependencies | No | The paper mentions software components and algorithms such as the PPO algorithm [49], VAE [30], C-SWM [31], SimCLR [9], MiniWorld [12], the Mujoco environments, and OpenAI Gym [5], but it does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | For details on architecture and training, see Appendix G. We use a mini-batch that consists of 64 randomly sampled observations from the environment and their transformations via three randomly sampled actions (4 × 64 samples in total). Each mini-batch consists of 64 random observations and the result of applying all four actions in those states (4 × 64 samples in total). We use a random policy to collect trajectories for the pre-training and use the Proximal Policy Optimization (PPO) [49] algorithm for the downstream RL task. (Illustrative sketches of this batching scheme and of the random-policy pre-training plus downstream PPO setup follow the table.) |
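
The "Experiment Setup" row describes mini-batches of 64 base observations paired with the observations reached by applying a few randomly sampled actions to each (4 × 64 samples per batch). The sketch below only illustrates that batching scheme and is not the authors' code: the environment (`CartPole-v1` as a stand-in for the paper's pixel-based Atari and MuJoCo tasks), the pre-0.26 Gym `reset`/`step` API, and the use of `copy.deepcopy` to branch several actions off the same state are all assumptions made for brevity; a faithful implementation would use engine-level state save/restore and image observations.

```python
# Illustrative sketch only -- not the authors' implementation.
# Builds one "4 x 64" mini-batch: 64 base observations plus, for each, the
# observations reached by 3 independently sampled actions from that state.
import copy

import gym
import numpy as np

BATCH_SIZE = 64
ACTIONS_PER_OBS = 3  # three randomly sampled actions per base observation


def random_state_observation(env, max_random_steps=20):
    """Reach a roughly random state via a short random rollout after reset."""
    obs = env.reset()
    for _ in range(np.random.randint(0, max_random_steps)):
        obs, _, done, _ = env.step(env.action_space.sample())
        if done:
            obs = env.reset()
    return obs


def sample_minibatch(env):
    base_obs, transformed_obs, actions = [], [], []
    for _ in range(BATCH_SIZE):
        obs = random_state_observation(env)
        successors, acts = [], []
        for _ in range(ACTIONS_PER_OBS):
            # Branch each action off a copy of the current state. deepcopy works
            # for simple classic-control envs; Atari/MuJoCo would instead need
            # emulator/physics state save-restore (assumption for brevity).
            branch = copy.deepcopy(env)
            a = branch.action_space.sample()
            next_obs, _, _, _ = branch.step(a)
            successors.append(next_obs)
            acts.append(a)
        base_obs.append(obs)
        transformed_obs.append(np.stack(successors))
        actions.append(acts)
    return np.stack(base_obs), np.stack(transformed_obs), np.array(actions)


if __name__ == "__main__":
    env = gym.make("CartPole-v1")  # stand-in for the paper's pixel environments
    obs, transformed, acts = sample_minibatch(env)
    print(obs.shape, transformed.shape, acts.shape)  # (64, 4) (64, 3, 4) (64, 3)
```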
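
The same row states that trajectories for pre-training are collected with a random policy and that the downstream RL task uses PPO [49]. The paper does not say which PPO implementation it relies on; the sketch below assumes stable-baselines3, the pre-0.26 Gym API, and one of the MuJoCo environments named in the paper, purely to illustrate how the two stages fit together.

```python
# Illustrative sketch only: random-policy data collection for pre-training,
# followed by PPO on the downstream task. The library choice (stable-baselines3),
# environment ID, and hyperparameters are assumptions, not taken from the paper.
import gym
from stable_baselines3 import PPO


def collect_random_trajectories(env, num_steps=10_000):
    """Roll out a uniform-random policy and store (obs, action, next_obs) triples."""
    transitions = []
    obs = env.reset()
    for _ in range(num_steps):
        action = env.action_space.sample()
        next_obs, _, done, _ = env.step(action)
        transitions.append((obs, action, next_obs))
        obs = env.reset() if done else next_obs
    return transitions


# Requires a working MuJoCo installation; the environment ID is an assumption.
env = gym.make("InvertedPendulum-v2")
pretraining_data = collect_random_trajectories(env)

# ... pre-train the representation on `pretraining_data` (not shown here) ...

# Downstream RL with PPO; in the paper the policy would act on the learned
# representation rather than on raw observations.
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)
```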