Local Feature Swapping for Generalization in Reinforcement Learning

Authors: David Bertoin, Emmanuel Rachelson

ICLR 2022

Reproducibility Variable Result LLM Response
Research Type Experimental We demonstrate, on the OpenAI Procgen Benchmark, that RL agents trained with the CLOP method exhibit robustness to visual changes and better generalization properties than agents trained using other state-of-the-art regularization techniques. We also demonstrate the effectiveness of CLOP as a general regularization technique in supervised learning.
Researcher Affiliation Collaboration David Bertoin, IRT Saint-Exupéry, ISAE-SUPAERO, ANITI, Toulouse, France, david.bertoin@irt-saintexupery.com; Emmanuel Rachelson, ISAE-SUPAERO, Université de Toulouse, ANITI, Toulouse, France, emmanuel.rachelson@isae-supaero.fr
Pseudocode Yes Figure 1: Channel-consistent LOcal Permutation Layer (CLOP)
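The paper's Figure 1 gives pseudocode for the CLOP layer. As a rough illustration only, a channel-consistent local feature swap can be sketched in PyTorch as below: during training, each spatial position is, with probability alpha, displaced to a randomly chosen adjacent position, and the same displacement is applied across all channels. This is a hypothetical simplification in the spirit of the paper's description, not the authors' exact implementation (the class name `CLOPLayer` and the neighbor-displacement scheme are assumptions).

```python
import torch
import torch.nn as nn


class CLOPLayer(nn.Module):
    """Sketch of a channel-consistent local feature swap (CLOP-like).

    Hypothetical simplification: with probability alpha, each spatial
    position takes the feature vector of a random immediate neighbor,
    using the same index map for every channel (channel-consistent).
    """

    def __init__(self, alpha: float = 0.5):
        super().__init__()
        self.alpha = alpha

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if not self.training:
            return x  # act as identity at evaluation time
        n, c, h, w = x.shape
        # Base row/column index grids, one per batch element.
        rows = torch.arange(h, device=x.device).view(1, h, 1).expand(n, h, w)
        cols = torch.arange(w, device=x.device).view(1, 1, w).expand(n, h, w)
        # Random neighbor offsets in {-1, 0, 1}, gated by alpha.
        mask = torch.rand(n, h, w, device=x.device) < self.alpha
        dr = torch.randint(-1, 2, (n, h, w), device=x.device) * mask
        dc = torch.randint(-1, 2, (n, h, w), device=x.device) * mask
        new_rows = (rows + dr).clamp(0, h - 1)
        new_cols = (cols + dc).clamp(0, w - 1)
        # Gather displaced feature vectors, same index map for all channels.
        idx = (new_rows * w + new_cols).view(n, 1, h * w).expand(n, c, h * w)
        return x.flatten(2).gather(2, idx).view(n, c, h, w)
```

With alpha = 0 the layer reduces to the identity even in training mode, which makes the gating by alpha easy to verify.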
Open Source Code Yes Besides this information, we provide the full source code of our implementation and experiments, along with the data files of experimental results we obtained.
Open Datasets Yes To assess the contribution of the CLOP layer in supervised learning, we first train a simple network (three convolutional layers followed by three linear layers) on the MNIST dataset (LeCun et al., 1998) and evaluate its generalization performance on the USPS (Hull, 1994) dataset... we train a VGG11 network (Simonyan & Zisserman, 2015) using the Imagenette dataset, a subset of ten classes taken from Imagenet (Deng et al., 2009)... We assess the regularization capability of CLOP on the Procgen benchmark, commonly used to test the generalization of RL agents. Procgen is a set of 16 visual games, each allowing procedural generation of game levels.
Dataset Splits No The paper specifies training and testing sets but does not explicitly mention validation dataset splits or how they were used for hyperparameter tuning.
Hardware Specification Yes All the experiments from Section 5 were run on a desktop machine (Intel i9, 10th generation processor, 32GB RAM) with a single NVIDIA RTX 3080 GPU.
Software Dependencies No The paper mentions software like PPO, IMPALA architecture, Adam optimizer, and PyTorch, but does not provide specific version numbers for these software components.
Experiment Setup Yes We trained all networks for 30 epochs with Adam, a learning rate lr = 5×10⁻⁴, and a cosine annealing schedule. We trained all agents with PPO (Schulman et al., 2017) and the hyperparameters recommended by Cobbe et al. (2020). Values of α used for each environment are reported in Table 4.
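The stated supervised-learning setup (30 epochs, Adam at lr = 5×10⁻⁴, cosine annealing) maps directly onto standard PyTorch components. A minimal sketch, with a placeholder model standing in for the networks described in the paper:

```python
import torch

# Placeholder model; the paper uses a small CNN and VGG11 instead.
model = torch.nn.Linear(10, 10)

# Adam with the reported learning rate of 5e-4.
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)

# Cosine annealing over the full 30-epoch training budget.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=30)

for epoch in range(30):
    # ... one epoch of training (forward, loss, backward) would go here ...
    optimizer.step()    # no-op here since no gradients were computed
    scheduler.step()    # decay the learning rate once per epoch
```

After the final epoch the learning rate has been annealed from 5e-4 down toward zero, as the cosine schedule prescribes.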