MDP Homomorphic Networks: Group Symmetries in Reinforcement Learning

Authors: Elise van der Pol, Daniel Worrall, Herke van Hoof, Frans Oliehoek, Max Welling

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that such networks converge faster than unstructured baselines on CartPole, a grid world and Pong. ... We evaluated three flavors of MDP homomorphic network (an MLP, a CNN, and an equivariant feature extractor) on three RL tasks that exhibit group symmetry: CartPole, a grid world, and Pong. ... We show training curves for CartPole in Figures 4a-4b, Pong in Figure 4c and for the grid world in Figure 6.
Researcher Affiliation | Collaboration | Elise van der Pol, UvA-Bosch Deltalab, University of Amsterdam (e.e.vanderpol@uva.nl); Daniel E. Worrall, Philips Lab, University of Amsterdam (d.e.worrall@uva.nl); Herke van Hoof, UvA-Bosch Deltalab, University of Amsterdam (h.c.vanhoof@uva.nl); Frans A. Oliehoek, Department of Intelligent Systems, Delft University of Technology (f.a.oliehoek@tudelft.nl); Max Welling, UvA-Bosch Deltalab, University of Amsterdam (m.welling@uva.nl)
Pseudocode | Yes | Algorithm 1: Equivariant layer construction
Open Source Code | Yes | Code is available at https://github.com/ElisevanderPol/symmetrizer/
Open Datasets | Yes | We used OpenAI's CartPole-v1 [7] implementation... We evaluated on the RLPYT [36] implementation of Pong.
Dataset Splits | No | The paper does not specify distinct training, validation, and test splits with percentages or counts, as is common for static datasets. Although results are evaluated over multiple random seeds, this is not equivalent to a validation split.
Hardware Specification | No | The paper does not specify any hardware details, such as GPU models, CPU types, or memory, used for the experiments.
Software Dependencies | No | The paper mentions using PyTorch [29] and RLPYT [36] but does not give version numbers for these dependencies.
Experiment Setup | No | The paper states that "Hyperparameters (and the range considered), architectures, and group implementation details are in the Supplementary Material." However, these details are not present in the provided main text.
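Algorithm 1 (equivariant layer construction) is only named in the table above. As a rough illustration, the following is a minimal NumPy sketch of one way such a construction can work, assuming a finite group given as matching lists of input representations L_g and output representations K_g. Random weight matrices are symmetrized by averaging K_g^{-1} W L_g over the group, and an SVD of the stacked samples yields a basis for the weights that satisfy K_g W = W L_g. The helper names `symmetrize` and `equivariant_basis` are hypothetical, and the CartPole-style reflection group (negate the state, swap the two actions) is an assumption for illustration; this is not the code from the symmetrizer repository, and the paper's own group implementation details are in its Supplementary Material.

```python
import numpy as np


def symmetrize(W, in_reps, out_reps):
    """Project W onto the equivariant subspace by averaging K_g^{-1} W L_g over the group.

    in_reps[i] and out_reps[i] are the input (L_g) and output (K_g) matrices
    for the same group element g; the group is assumed to be finite.
    """
    total = np.zeros_like(W)
    for L, K in zip(in_reps, out_reps):
        total += np.linalg.inv(K) @ W @ L
    return total / len(in_reps)


def equivariant_basis(in_reps, out_reps, num_samples=None, tol=1e-8, seed=0):
    """Return a basis of weight matrices W satisfying K_g W = W L_g for every g."""
    rng = np.random.default_rng(seed)
    d_out, d_in = out_reps[0].shape[0], in_reps[0].shape[0]
    # Draw at least as many samples as the dimension of the full weight space.
    n = num_samples or d_out * d_in
    samples = [
        symmetrize(rng.standard_normal((d_out, d_in)), in_reps, out_reps).ravel()
        for _ in range(n)
    ]
    stacked = np.stack(samples)  # shape (n, d_out * d_in), one sample per row
    _, sing_vals, vt = np.linalg.svd(stacked, full_matrices=False)
    # Keep only directions with non-negligible singular values (the rank of the stack).
    rank = int(np.sum(sing_vals > tol * sing_vals[0]))
    return vt[:rank].reshape(rank, d_out, d_in)


# Illustrative CartPole-style reflection group (assumed): {identity, flip}.
# The flip negates the 4-dimensional state and swaps the two actions.
identity_in, flip_in = np.eye(4), -np.eye(4)
identity_out, flip_out = np.eye(2), np.array([[0.0, 1.0], [1.0, 0.0]])
basis = equivariant_basis([identity_in, flip_in], [identity_out, flip_out])

# Any linear combination of basis elements is an equivariant weight matrix.
coeffs = np.random.default_rng(1).standard_normal(len(basis))
W = np.tensordot(coeffs, basis, axes=1)  # shape (2, 4)
s = np.array([0.1, -0.5, 0.03, 0.2])
print(len(basis))  # 4
print(np.allclose(W @ (flip_in @ s), flip_out @ (W @ s)))  # True
```

In a trainable layer, the coefficients would be the learned parameters while the basis stays fixed; here they are drawn at random only to show that equivariance holds for any choice. For this reflection group the basis has 4 elements, each of the form [[a, b, c, d], [-a, -b, -c, -d]], so flipping the state swaps the two action outputs.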