Local Feature Swapping for Generalization in Reinforcement Learning
Authors: David Bertoin, Emmanuel Rachelson
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate, on the OpenAI Procgen Benchmark, that RL agents trained with the CLOP method exhibit robustness to visual changes and better generalization properties than agents trained using other state-of-the-art regularization techniques. We also demonstrate the effectiveness of CLOP as a general regularization technique in supervised learning. |
| Researcher Affiliation | Collaboration | David Bertoin, IRT Saint-Exupéry, ISAE-SUPAERO, ANITI, Toulouse, France, david.bertoin@irt-saintexupery.com; Emmanuel Rachelson, ISAE-SUPAERO, Université de Toulouse, ANITI, Toulouse, France, emmanuel.rachelson@isae-supaero.fr |
| Pseudocode | Yes | Figure 1: Channel-consistent LOcal Permutation Layer (CLOP) (a hedged PyTorch sketch of this layer is given after the table) |
| Open Source Code | Yes | Besides this information, we provide the full source code of our implementation and experiments, along with the data files of experimental results we obtained. |
| Open Datasets | Yes | To assess the contribution of the CLOP layer in supervised learning, we first train a simple network (three convolutional layers followed by three linear layers) on the MNIST dataset (LeCun et al., 1998) and evaluate its generalization performance on the USPS (Hull, 1994) dataset... we train a VGG11 network (Simonyan & Zisserman, 2015) using the Imagenette dataset, a subset of ten classes taken from Imagenet (Deng et al., 2009)... We assess the regularization capability of CLOP on the Procgen benchmark, commonly used to test the generalization of RL agents. Procgen is a set of 16 visual games, each allowing procedural generation of game levels. (See the MNIST → USPS sketch after the table.) |
| Dataset Splits | No | The paper specifies training and testing sets but does not explicitly mention validation dataset splits or how they were used for hyperparameter tuning. |
| Hardware Specification | Yes | All the experiments from Section 5 were run on a desktop machine (Intel i9, 10th generation processor, 32GB RAM) with a single NVIDIA RTX 3080 GPU. |
| Software Dependencies | No | The paper mentions software like PPO, IMPALA architecture, Adam optimizer, and PyTorch, but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | We trained all networks for 30 epochs with Adam, a learning rate lr = 5×10⁻⁴, and a cosine annealing schedule. We trained all agents with PPO (Schulman et al., 2017) and the hyperparameters recommended by Cobbe et al. (2020). Values of α used for each environment are reported in Table 4. (A hedged training sketch with these settings follows the table.) |
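
The pseudocode evidence above points to the paper's Figure 1, the Channel-consistent LOcal Permutation (CLOP) layer. Below is a minimal PyTorch sketch of a channel-consistent local permutation, written for illustration only: the class name `CLOPSketch`, the default α, and the restriction to non-overlapping horizontal swaps are assumptions of this sketch, not the authors' exact Figure 1 procedure.

```python
import torch
import torch.nn as nn


class CLOPSketch(nn.Module):
    """Illustrative channel-consistent local permutation of feature-map cells.

    With probability ``alpha``, a pair of horizontally adjacent cells is
    swapped, and the same swap is applied to every channel so that feature
    vectors stay aligned across channels.  Restricting to non-overlapping
    horizontal pairs is a simplification of this sketch, not necessarily the
    exact procedure of the paper's Figure 1.
    """

    def __init__(self, alpha: float = 0.5):  # default alpha is illustrative
        super().__init__()
        self.alpha = alpha

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Identity at evaluation time, like other stochastic regularizers.
        if not self.training or self.alpha <= 0.0:
            return x
        n, c, h, w = x.shape
        out = x.clone()
        # Candidate non-overlapping horizontal pairs: (0,1), (2,3), ...
        left = torch.arange(0, w - 1, 2, device=x.device)
        # One Bernoulli(alpha) draw per sample, row and pair, shared by all channels.
        swap = torch.rand(n, h, left.numel(), device=x.device) < self.alpha
        for b in range(n):
            rows, pairs = torch.nonzero(swap[b], as_tuple=True)
            l, r = left[pairs], left[pairs] + 1
            out[b, :, rows, l] = x[b, :, rows, r]
            out[b, :, rows, r] = x[b, :, rows, l]
        return out


# Example: permute the spatial cells of a random feature map during training.
layer = CLOPSketch(alpha=0.5).train()
features = torch.randn(4, 32, 8, 8)   # (batch, channels, height, width)
print(layer(features).shape)          # torch.Size([4, 32, 8, 8])
```

The layer acts as the identity outside training, so it only perturbs features while learning, in the same spirit as dropout-style stochastic regularizers.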
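For the supervised MNIST → USPS experiment described in the "Open Datasets" and "Experiment Setup" rows, a minimal end-to-end sketch with the reported optimizer settings (Adam, lr = 5×10⁻⁴, cosine annealing, 30 epochs) might look as follows. The layer widths, batch size, and resizing of USPS digits to 28×28 are assumptions of this sketch and are not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Small stand-in for the paper's "three convolutional layers followed by
# three linear layers" MNIST network; the exact widths are assumptions.
model = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1, stride=2), nn.ReLU(),
    nn.Conv2d(32, 64, 3, padding=1, stride=2), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(64 * 7 * 7, 256), nn.ReLU(),
    nn.Linear(256, 64), nn.ReLU(),
    nn.Linear(64, 10),
)

# MNIST digits are 28x28 and USPS digits 16x16, so USPS test images are
# resized to 28x28 here; this preprocessing choice is an assumption.
train_loader = DataLoader(
    datasets.MNIST("data", train=True, download=True,
                   transform=transforms.ToTensor()),
    batch_size=128, shuffle=True)
test_loader = DataLoader(
    datasets.USPS("data", train=False, download=True,
                  transform=transforms.Compose(
                      [transforms.Resize((28, 28)), transforms.ToTensor()])),
    batch_size=128)

# Optimizer settings reported in the paper: Adam, lr = 5e-4, cosine annealing, 30 epochs.
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=30)

for epoch in range(30):
    model.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        F.cross_entropy(model(images), labels).backward()
        optimizer.step()
    scheduler.step()

# Cross-domain evaluation: accuracy on USPS after training on MNIST only.
model.eval()
correct = total = 0
with torch.no_grad():
    for images, labels in test_loader:
        correct += (model(images).argmax(1) == labels).sum().item()
        total += labels.numel()
print(f"USPS accuracy: {correct / total:.3f}")
```

In the paper, the CLOP layer operates on convolutional feature maps, so a layer like `CLOPSketch` above would be inserted among the convolutional layers during training; its exact placement and the per-environment α values are given in the paper (Table 4 for the RL experiments).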