EqR: Equivariant Representations for Data-Efficient Reinforcement Learning
Authors: Arnab Kumar Mondal, Vineet Jain, Kaleem Siddiqi, Siamak Ravanbakhsh
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the advantages of our method, which we call Equivariant representations for RL (EqR), for Atari games in a data-efficient setting limited to 100K steps of interaction with the environment. We evaluate our approach on the 26 games in the Atari 100K benchmark (Kaiser et al., 2019). We compute the average episodic return (the game score) at the end of training and normalize it with respect to human scores, as is standard practice. We report the Interquartile Mean (IQM), which is the mean across the middle 50% of the runs, as well as the Optimality Gap. Figure 4 shows performance profiles for our model, EqR with L_R + L_GET, along with other comparable methods. Figure 5(a) provides results for different methods on all 26 games. (A minimal sketch of the human-normalized score and IQM computation appears after this table.) |
| Researcher Affiliation | Academia | Arnab Kumar Mondal 1 2 3 Vineet Jain 1 2 Kaleem Siddiqi 1 2 3 Siamak Ravanbakhsh 1 2 1School of Computer Science, McGill University, Montréal, Canada 2Mila Quebec Artificial Intelligence Institute, Montréal, Canada 3Centre for Intelligent Machines, McGill University, Montréal, Canada. |
| Pseudocode | Yes | Algorithm 1 Equivariant Representations for RL |
| Open Source Code | Yes | Our implementation is available at https://github.com/arnab39/Symmetry-RL. |
| Open Datasets | Yes | We use the sample-efficient Atari suite introduced by Kaiser et al. (2019), which consists of 26 games with only 100,000 environment steps of training data available. |
| Dataset Splits | No | The paper states it uses 100,000 environment steps for training data but does not specify explicit train/validation/test splits of a static dataset by percentage or count. In RL, evaluation is typically done on the environment itself, rather than a pre-split dataset. |
| Hardware Specification | No | The paper mentions "Computational resources were provided by Mila and Compute Canada." This is too general and does not specify any particular GPU/CPU models or other hardware components used for the experiments. |
| Software Dependencies | No | We build our implementation on top of SPR's (Schwarzer et al., 2021), which is based on rlpyt (Stooke & Abbeel, 2019) and PyTorch (Paszke et al., 2019). While software components are named, specific version numbers for rlpyt and PyTorch are not explicitly provided. |
| Experiment Setup | Yes | Appendix B.6 provides "Hyperparameters for EqR (including variations) on Atari" in Table 5. This table lists specific parameter settings such as "Learning rate 0.0001", "Minibatch size 32", "Training steps 100K", and various optimizer and RL-specific settings. (A hedged config sketch collecting these quoted values appears after this table.) |
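
The evaluation protocol quoted in the Research Type row (human-normalized scores summarized by the Interquartile Mean, i.e. the mean over the middle 50% of runs) can be reproduced with a few lines of Python. This is a minimal sketch, not the paper's code; the per-game random/human reference scores and the run scores below are placeholder values.

```python
import numpy as np
from scipy import stats

def human_normalized_score(score, random_score, human_score):
    """Normalize a raw game score against random and human reference scores."""
    return (score - random_score) / (human_score - random_score)

def interquartile_mean(values):
    """Mean over the middle 50% of values (trim the top and bottom 25%)."""
    return stats.trim_mean(np.asarray(values, dtype=float), proportiontocut=0.25)

# Placeholder run scores for illustration only (not results from the paper).
run_scores = [120.0, 340.0, 95.0, 410.0, 280.0, 150.0, 500.0, 60.0]
normalized = [human_normalized_score(s, random_score=50.0, human_score=1000.0)
              for s in run_scores]
print(interquartile_mean(normalized))
```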
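
The hyperparameter settings quoted in the Experiment Setup row (from Table 5, Appendix B.6) can be collected into a plain configuration dict. The key names below are illustrative assumptions; only the values come from the quoted text.

```python
# Illustrative config mirroring the hyperparameter values quoted from
# Table 5 (Appendix B.6); key names are assumptions, values are from the quote.
eqr_atari_hparams = {
    "learning_rate": 1e-4,      # "Learning rate 0.0001"
    "minibatch_size": 32,       # "Minibatch size 32"
    "training_steps": 100_000,  # "Training steps 100K"
}
```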