EqR: Equivariant Representations for Data-Efficient Reinforcement Learning
Authors: Arnab Kumar Mondal, Vineet Jain, Kaleem Siddiqi, Siamak Ravanbakhsh
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the advantages of our method, which we call Equivariant representations for RL (EqR), for Atari games in a data-efficient setting limited to 100K steps of interaction with the environment. We evaluate our approach on the 26 games in the Atari 100K benchmark (Kaiser et al., 2019). We compute the average episodic return (the game score) at the end of training and normalize it with respect to human scores, as is standard practice. We report the Interquartile Mean (IQM), which is the mean across the middle 50% of the runs, as well as the Optimality Gap. Figure 4 shows performance profiles for our model, EqR with L_R + L_GET, along with other comparable methods. Figure 5(a) provides results for different methods on all 26 games. (A minimal sketch of the human-normalized score and IQM computation appears after this table.) |
| Researcher Affiliation | Academia | Arnab Kumar Mondal 1 2 3 Vineet Jain 1 2 Kaleem Siddiqi 1 2 3 Siamak Ravanbakhsh 1 2 1School of Computer Science, McGill University, Montréal, Canada 2Mila Quebec Artificial Intelligence Institute, Montréal, Canada 3Centre for Intelligent Machines, McGill University, Montréal, Canada. |
| Pseudocode | Yes | Algorithm 1 Equivariant Representations for RL |
| Open Source Code | Yes | Our implementation is available at https://github.com/arnab39/Symmetry-RL. |
| Open Datasets | Yes | We use the sample-efficient Atari suite introduced by Kaiser et al. (2019), which consists of 26 games with only 100,000 environment steps of training data available. |
| Dataset Splits | No | The paper states it uses 100,000 environment steps for training data but does not specify explicit train/validation/test splits of a static dataset by percentage or count. In RL, evaluation is typically done on the environment itself, rather than a pre-split dataset. |
| Hardware Specification | No | The paper mentions "Computational resources were provided by Mila and Compute Canada." This is too general and does not specify any particular GPU/CPU models or other hardware components used for the experiments. |
| Software Dependencies | No | We build our implementation on top of SPR's (Schwarzer et al., 2021), which is based on rlpyt (Stooke & Abbeel, 2019) and PyTorch (Paszke et al., 2019). While software components are named, specific version numbers for rlpyt and PyTorch are not explicitly provided. |
| Experiment Setup | Yes | Appendix B.6 provides "Hyperparameters for EqR (including variations) on Atari" in Table 5. This table lists specific parameter settings such as "Learning rate 0.0001", "Minibatch size 32", "Training steps 100K", and various optimizer and RL-specific settings. (A hedged config sketch collecting these quoted values appears after this table.) |
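
The evaluation protocol quoted in the Research Type row (human-normalized scores summarized by the Interquartile Mean, i.e. the mean over the middle 50% of runs) can be reproduced with a few lines of Python. This is a minimal sketch, not the paper's code; the per-game random/human reference scores and the run scores below are placeholder values.

```python
import numpy as np
from scipy import stats

def human_normalized_score(score, random_score, human_score):
    """Normalize a raw game score against random and human reference scores."""
    return (score - random_score) / (human_score - random_score)

def interquartile_mean(values):
    """Mean over the middle 50% of values (trim the top and bottom 25%)."""
    return stats.trim_mean(np.asarray(values, dtype=float), proportiontocut=0.25)

# Placeholder run scores for illustration only (not results from the paper).
run_scores = [120.0, 340.0, 95.0, 410.0, 280.0, 150.0, 500.0, 60.0]
normalized = [human_normalized_score(s, random_score=50.0, human_score=1000.0)
              for s in run_scores]
print(interquartile_mean(normalized))
```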
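
The hyperparameter settings quoted in the Experiment Setup row (from Table 5, Appendix B.6) can be collected into a plain configuration dict. The key names below are illustrative assumptions; only the values come from the quoted text.

```python
# Illustrative config mirroring the hyperparameter values quoted from
# Table 5 (Appendix B.6); key names are assumptions, values are from the quote.
eqr_atari_hparams = {
    "learning_rate": 1e-4,      # "Learning rate 0.0001"
    "minibatch_size": 32,       # "Minibatch size 32"
    "training_steps": 100_000,  # "Training steps 100K"
}
```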