CROP: Towards Distributional-Shift Robust Reinforcement Learning Using Compact Reshaped Observation Processing
Authors: Philipp Altmann, Fabian Ritz, Leonard Feuchtinger, Jonas Nüßlein, Claudia Linnhoff-Popien, Thomy Phan
IJCAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically show the improvements of CROP in a distributionally shifted safety gridworld. We furthermore provide benchmark comparisons to full observability and data-augmentation in two different-sized procedurally generated mazes. |
| Researcher Affiliation | Academia | Philipp Altmann, Fabian Ritz, Leonard Feuchtinger, Jonas Nüßlein, Claudia Linnhoff-Popien and Thomy Phan, LMU Munich, philipp.altmann@ifi.lmu.de |
| Pseudocode | Yes | Algorithm 1 CROPed Policy Optimization (the listing itself is not reproduced in this report; a structural sketch follows the table) |
| Open Source Code | Yes | All implementations for the following evaluations can be found here: https://github.com/philippaltmann/CROP |
| Open Datasets | Yes | To provide proof-of-concept for CROP we used two holey safety gridworlds inspired by [Leike et al., 2017]... For further evaluation and comparisons in section 7 we use (7, 7)- and (11, 11)-sized generated mazes inspired by [Cobbe et al., 2020] |
| Dataset Splits | No | The paper describes training and testing in different environment configurations (e.g., training in one gridworld, testing in a shifted one; using a pool of random mazes for training/testing), but does not specify explicit train/validation/test percentage splits or sample counts for a single dataset. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | We furthermore built upon the implementations by [Raffin et al., 2021], extending upon [Brockman et al., 2016]. The paper mentions Stable-Baselines3 and OpenAI Gym but does not provide specific version numbers for these or other software dependencies. A hedged usage sketch follows the table. |
| Experiment Setup | Yes | For training PPO, we adopted the default parameters suggested by [Schulman et al., 2017; Raffin et al., 2021]. For Radius CROP we set the radius ρ = (2, 2), resulting in an observation shape of dim(s_t) = ρ · 2 + 1 = (5, 5), padded with wall fields. Given the four possible actions A = {Up, Right, Down, Left}, we parameterized Action CROP with µ = [(−1, 0), (0, 1), (1, 0), (0, −1)], resulting in an observation shape of dim(s_t) = \|A\| = (4). Regarding Object CROP, we chose η = 1 for all safety environments and η = 2 for all mazes, and the set of objects to be detected to be all possible objects excluding the agent itself: O = F \ {Agent}, resulting in O = {Wall, Field, Hole, Goal} and the observation shape dim(s_t) = (4, 2) for the train and test environments (cf. Figure 2a and Figure 2b), as well as O = {Wall, Field, Goal} and the observation shape dim(s_t) = (3, 2) for all maze environments (cf. Figure 2c and Figure 2d). All choices above were determined in preliminary experiments, omitted in this work due to limited space. Sketches of the three crop variants follow the table. |
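
Since the Pseudocode row only names Algorithm 1 without reproducing its listing, here is a minimal structural sketch of how a CROPed policy-optimization loop can be assembled: a crop function is applied to every observation before it reaches an otherwise unchanged learner. The wrapper class and its arguments are assumptions for illustration, not the authors' implementation:

```python
import gym


class CROPWrapper(gym.ObservationWrapper):
    """Applies a compact reshaped observation processing (CROP) function
    to every observation before it reaches the policy-optimization loop."""

    def __init__(self, env, crop_fn, crop_space):
        super().__init__(env)
        self.crop_fn = crop_fn               # e.g. a radius, action, or object crop
        self.observation_space = crop_space  # the reshaped observation space

    def observation(self, obs):
        # Replace the full observation with its compact reshaped view.
        return self.crop_fn(obs)
```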
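
To make the parameterization quoted under Experiment Setup concrete, the following sketches the three crop variants on a gridworld encoded as a 2-D array of integer object IDs. The encoding (`WALL`), the handling of absent objects, and the omission of the paper's η parameter are all assumptions for illustration:

```python
import numpy as np

WALL = 0  # assumed integer encoding of wall fields


def radius_crop(grid, agent_pos, rho=(2, 2)):
    """Radius CROP: window of radius rho around the agent, padded with
    wall fields; with rho = (2, 2), dim(s_t) = rho * 2 + 1 = (5, 5)."""
    padded = np.pad(grid, ((rho[0], rho[0]), (rho[1], rho[1])),
                    constant_values=WALL)
    r, c = agent_pos[0] + rho[0], agent_pos[1] + rho[1]
    return padded[r - rho[0]:r + rho[0] + 1, c - rho[1]:c + rho[1] + 1]


def action_crop(grid, agent_pos,
                mu=((-1, 0), (0, 1), (1, 0), (0, -1))):  # Up, Right, Down, Left
    """Action CROP: the object at each action-adjacent cell, dim(s_t) = (4,)."""
    h, w = grid.shape
    fields = []
    for dr, dc in mu:
        r, c = agent_pos[0] + dr, agent_pos[1] + dc
        fields.append(grid[r, c] if 0 <= r < h and 0 <= c < w else WALL)
    return np.array(fields)


def object_crop(grid, agent_pos, objects):
    """Object CROP: relative position of the nearest instance of each object
    type in O, dim(s_t) = (|O|, 2). The paper's eta is not modeled here."""
    rows = []
    for obj in objects:
        coords = np.argwhere(grid == obj) - np.asarray(agent_pos)
        if len(coords) == 0:
            rows.append(np.zeros(2, dtype=int))  # sentinel for absent objects (assumption)
        else:
            rows.append(coords[np.abs(coords).sum(axis=1).argmin()])
    return np.stack(rows)
```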
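
Because the paper builds on Stable-Baselines3 [Raffin et al., 2021] and OpenAI Gym [Brockman et al., 2016] without pinning versions, the following usage sketch assumes recent releases of both and reuses `CROPWrapper` and `radius_crop` from the sketches above; `make_maze_env` and the `AGENT` encoding are hypothetical stand-ins for the paper's environments:

```python
import numpy as np
from gym.spaces import Box
from stable_baselines3 import PPO

AGENT = 4  # assumed integer encoding of the agent field


def crop(obs):
    # Locate the agent in the raw grid observation, then apply Radius CROP.
    agent_pos = np.argwhere(obs == AGENT)[0]
    return radius_crop(obs, agent_pos, rho=(2, 2))


env = make_maze_env(size=(7, 7))               # hypothetical environment constructor
crop_space = Box(low=0, high=4, shape=(5, 5))  # assumed encoding of the (5, 5) window
model = PPO("MlpPolicy", CROPWrapper(env, crop, crop_space), verbose=1)
model.learn(total_timesteps=1_000_000)         # budget is an assumption, not quoted above
```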