Structuring Representations Using Group Invariants

Authors: Mehran Shakerinava, Arnab Kumar Mondal, Siamak Ravanbakhsh

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conducted many experiments to qualitatively study the representation learned by SymReg and its ability to produce a disentangled representation, and quantitatively compare it against simple baselines in representation learning and downstream RL tasks.
Researcher Affiliation | Academia | McGill University and Mila, Montréal, Canada. {mehran.shakerinava, arnab.mondal, siamak.ravanbakhsh}@mila.quebec
Pseudocode | Yes | We provide the algorithm in Appendix C.
Open Source Code | No | The paper does not provide an explicit statement or link indicating the release of its source code.
Open Datasets | Yes | We select the Atari games Pong and Space Invaders as our environments for the world modeling experiments. These environments were previously used by Kipf et al. [31] to evaluate the Contrastive Structured World Model (C-SWM). Next, we consider three MuJoCo environments: Inverted Pendulum, Reacher, and Swimmer from OpenAI Gym [5] and learn directly from the image observations.
Dataset Splits | No | The paper mentions batch sizes (e.g., '64 randomly sampled observations') and the use of environments for evaluation, but it does not specify explicit training/validation/test dataset splits by percentage or sample count.
Hardware Specification | No | The paper acknowledges that computational resources were provided by Mila, Compute Canada, and NVIDIA, but it gives no specific hardware details such as GPU/CPU models or memory specifications.
Software Dependencies | No | The paper mentions software components and algorithms such as PPO [49], the VAE [30], C-SWM [31], SimCLR [9], MiniWorld [12], the MuJoCo environments, and OpenAI Gym [5], but it does not provide specific version numbers for any of them.
Experiment Setup | Yes | For details on architecture and training, see Appendix G. We use a mini-batch that consists of 64 randomly sampled observations from the environment and their transformations via three randomly sampled actions (4 × 64 samples in total). Each mini-batch consists of 64 random observations and the result of applying all four actions in those states (4 × 64 samples in total). We use a random policy to collect trajectories for pre-training and the Proximal Policy Optimization (PPO) [49] algorithm for the downstream RL task. (A minimal data-collection sketch follows this table.)
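
The Experiment Setup row describes mini-batches of 64 observations paired with action-transformed observations (4 × 64 samples in total), collected under a random policy. Since the paper's code is not released, the following is only a minimal sketch of that collection scheme, assuming the classic Gym step API (env.step returns four values); the environment ID, function names, and the rollout-based batching are illustrative assumptions, not the authors' implementation.

    import numpy as np
    import gym  # assumption: classic Gym API where env.step returns 4 values


    def collect_minibatch(env, batch_size=64, num_actions=3):
        """Build one mini-batch: `batch_size` anchor observations, each paired with
        the observations produced by `num_actions` randomly sampled actions, giving
        (1 + num_actions) * batch_size = 4 * 64 samples in total.

        Simplification (assumption): anchors come from a single random-policy
        rollout and actions are applied sequentially, whereas the paper samples
        observations randomly from collected trajectories."""
        anchors, action_seqs, transformed = [], [], []
        obs = env.reset()
        for _ in range(batch_size):
            anchors.append(obs)
            acts, nxts = [], []
            for _ in range(num_actions):
                a = env.action_space.sample()  # random policy for pre-training data
                obs, _, done, _ = env.step(a)
                acts.append(a)
                nxts.append(obs)
                if done:
                    obs = env.reset()
            action_seqs.append(acts)
            transformed.append(nxts)
        return np.asarray(anchors), np.asarray(action_seqs), np.asarray(transformed)


    # Example usage with one of the environments named in the Open Datasets row.
    # The exact environment ID and image preprocessing used in the paper may differ.
    if __name__ == "__main__":
        env = gym.make("PongNoFrameskip-v4")
        obs_batch, act_batch, next_obs_batch = collect_minibatch(env)
        print(obs_batch.shape, act_batch.shape, next_obs_batch.shape)
        env.close()

The returned arrays hold the anchor observations, the sampled actions, and the action-transformed observations; a pre-training objective such as the one in the paper's Appendix C would then be computed over these pairs, and the pre-trained representation would feed a downstream PPO agent.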