Local Feature Swapping for Generalization in Reinforcement Learning

Authors: David Bertoin, Emmanuel Rachelson

ICLR 2022

Reproducibility Variable Result LLM Response
Research Type Experimental We demonstrate, on the OpenAI Procgen Benchmark, that RL agents trained with the CLOP method exhibit robustness to visual changes and better generalization properties than agents trained using other state-of-the-art regularization techniques. We also demonstrate the effectiveness of CLOP as a general regularization technique in supervised learning.
Researcher Affiliation Collaboration David Bertoin, IRT Saint-Exupéry, ISAE-SUPAERO, ANITI, Toulouse, France, david.bertoin@irt-saintexupery.com; Emmanuel Rachelson, ISAE-SUPAERO, Université de Toulouse, ANITI, Toulouse, France, emmanuel.rachelson@isae-supaero.fr
Pseudocode Yes Figure 1: Channel-consistent LOcal Permutation Layer (CLOP)
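The paper's Figure 1 gives pseudocode for the CLOP layer. As a rough illustration only, a channel-consistent local feature swap can be sketched in PyTorch as below: during training, each spatial position is, with probability alpha, displaced to a randomly chosen adjacent position, and the same displacement is applied across all channels. This is a hypothetical simplification in the spirit of the paper's description, not the authors' exact implementation (the class name `CLOPLayer` and the neighbor-displacement scheme are assumptions).

```python
import torch
import torch.nn as nn


class CLOPLayer(nn.Module):
    """Sketch of a channel-consistent local feature swap (CLOP-like).

    Hypothetical simplification: with probability alpha, each spatial
    position takes the feature vector of a random immediate neighbor,
    using the same index map for every channel (channel-consistent).
    """

    def __init__(self, alpha: float = 0.5):
        super().__init__()
        self.alpha = alpha

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if not self.training:
            return x  # act as identity at evaluation time
        n, c, h, w = x.shape
        # Base row/column index grids, one per batch element.
        rows = torch.arange(h, device=x.device).view(1, h, 1).expand(n, h, w)
        cols = torch.arange(w, device=x.device).view(1, 1, w).expand(n, h, w)
        # Random neighbor offsets in {-1, 0, 1}, gated by alpha.
        mask = torch.rand(n, h, w, device=x.device) < self.alpha
        dr = torch.randint(-1, 2, (n, h, w), device=x.device) * mask
        dc = torch.randint(-1, 2, (n, h, w), device=x.device) * mask
        new_rows = (rows + dr).clamp(0, h - 1)
        new_cols = (cols + dc).clamp(0, w - 1)
        # Gather displaced feature vectors, same index map for all channels.
        idx = (new_rows * w + new_cols).view(n, 1, h * w).expand(n, c, h * w)
        return x.flatten(2).gather(2, idx).view(n, c, h, w)
```

With alpha = 0 the layer reduces to the identity even in training mode, which makes the gating by alpha easy to verify.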
Open Source Code Yes Besides this information, we provide the full source code of our implementation and experiments, along with the data files of experimental results we obtained.
Open Datasets Yes To assess the contribution of the CLOP layer in supervised learning, we first train a simple network (three convolutional layers followed by three linear layers) on the MNIST dataset (LeCun et al., 1998) and evaluate its generalization performance on the USPS (Hull, 1994) dataset... we train a VGG11 network (Simonyan & Zisserman, 2015) using the Imagenette dataset, a subset of ten classes taken from Imagenet (Deng et al., 2009)... We assess the regularization capability of CLOP on the Procgen benchmark, commonly used to test the generalization of RL agents. Procgen is a set of 16 visual games, each allowing procedural generation of game levels.
Dataset Splits No The paper specifies training and testing sets but does not explicitly mention validation dataset splits or how they were used for hyperparameter tuning.
Hardware Specification Yes All the experiments from Section 5 were run on a desktop machine (Intel i9, 10th generation processor, 32GB RAM) with a single NVIDIA RTX 3080 GPU.
Software Dependencies No The paper mentions software like PPO, IMPALA architecture, Adam optimizer, and PyTorch, but does not provide specific version numbers for these software components.
Experiment Setup Yes We trained all networks for 30 epochs with Adam, a learning rate lr = 5×10⁻⁴, and a cosine annealing schedule. We trained all agents with PPO (Schulman et al., 2017) and the hyperparameters recommended by Cobbe et al. (2020). Values of α used for each environment are reported in Table 4.
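The stated supervised-learning setup (30 epochs, Adam at lr = 5×10⁻⁴, cosine annealing) maps directly onto standard PyTorch components. A minimal sketch, with a placeholder model standing in for the networks described in the paper:

```python
import torch

# Placeholder model; the paper uses a small CNN and VGG11 instead.
model = torch.nn.Linear(10, 10)

# Adam with the reported learning rate of 5e-4.
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)

# Cosine annealing over the full 30-epoch training budget.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=30)

for epoch in range(30):
    # ... one epoch of training (forward, loss, backward) would go here ...
    optimizer.step()    # no-op here since no gradients were computed
    scheduler.step()    # decay the learning rate once per epoch
```

After the final epoch the learning rate has been annealed from 5e-4 down toward zero, as the cosine schedule prescribes.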