Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026].
CROP: Towards Distributional-Shift Robust Reinforcement Learning Using Compact Reshaped Observation Processing
Authors: Philipp Altmann, Fabian Ritz, Leonard Feuchtinger, Jonas Nüßlein, Claudia Linnhoff-Popien, Thomy Phan
IJCAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically show the improvements of CROP in a distributionally shifted safety gridworld. We furthermore provide benchmark comparisons to full observability and data-augmentation in two different-sized procedurally generated mazes. |
| Researcher Affiliation | Academia | Philipp Altmann, Fabian Ritz, Leonard Feuchtinger, Jonas Nüßlein, Claudia Linnhoff-Popien and Thomy Phan, LMU Munich, EMAIL |
| Pseudocode | Yes | Algorithm 1 CROPed Policy Optimization |
| Open Source Code | Yes | All implementations for the following evaluations can be found here: https://github.com/philippaltmann/CROP |
| Open Datasets | Yes | To provide proof-of-concept for CROP we used two holey safety gridworlds inspired by [Leike et al., 2017]... For further evaluation and comparisons in section 7 we use (7, 7)- and (11, 11)-sized generated mazes inspired by [Cobbe et al., 2020] |
| Dataset Splits | No | The paper describes training and testing in different environment configurations (e.g., training in one gridworld, testing in a shifted one; using a pool of random mazes for training/testing), but does not specify explicit train/validation/test percentage splits or sample counts for a single dataset. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | We furthermore built upon the implementations by [Raffin et al., 2021], extending upon [Brockman et al., 2016]. The paper mentions Stable-Baselines3 and OpenAI Gym but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | For training PPO, we adopted the default parameters suggested by [Schulman et al., 2017; Raffin et al., 2021]. For Radius CROP we set the radius $\rho = (2, 2)$, resulting in an observation shape of $\dim(s_t) = 2\rho + 1 = (5, 5)$, padded with wall fields. Given the four possible actions $A = \{\text{Up}, \text{Right}, \text{Down}, \text{Left}\}$ we parameterized Action CROP with $\mu = [(-1, 0), (0, 1), (1, 0), (0, -1)]$, resulting in an observation shape of $\dim(s_t) = \lvert A \rvert = (4)$. Regarding Object CROP we chose $\eta = 1$ for all safety environments and $\eta = 2$ for all mazes, and the set of objects to be detected to be all possible objects excluding the agent itself: $O = F \setminus \{\text{Agent}\}$, resulting in $O = \{\text{Wall}, \text{Field}, \text{Hole}, \text{Goal}\}$ and the observation shape $\dim(s_t) = (4, 2)$ for the train and test environments (cf. Figure 2a and Figure 2b), as well as $O = \{\text{Wall}, \text{Field}, \text{Goal}\}$ and the observation shape $\dim(s_t) = (3, 2)$ for all maze environments (cf. Figure 2c and Figure 2d). All choices above were determined in preliminary experiments, omitted in this work due to limited space. |
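The observation shapes quoted in the Experiment Setup row can be checked with a minimal Python sketch of the three CROP variants on a character grid. Everything here is a hypothetical illustration, not the code in the linked repository: the cell encoding (`"W"`, `" "`, `"H"`, `"G"`, `"A"`), the (row, column) reading of the offsets μ, the function names, and the nearest-object rule (Manhattan distance, `(0, 0)` when a type is absent) for Object CROP are all assumptions. The detection parameter η is not modelled, since its exact role is not stated in this summary.

```python
# Hypothetical cell encoding; the paper's actual encoding is not given here.
WALL, FIELD, HOLE, GOAL, AGENT = "W", " ", "H", "G", "A"

# Assumed (row, col) offsets for Up, Right, Down, Left (Action CROP mu).
MU = [(-1, 0), (0, 1), (1, 0), (0, -1)]


def cell_at(grid, r, c):
    """Return the cell at (r, c); out-of-bounds positions are padded as walls."""
    if 0 <= r < len(grid) and 0 <= c < len(grid[0]):
        return grid[r][c]
    return WALL


def radius_crop(grid, pos, rho=(2, 2)):
    """Radius CROP: a (2*rho_r + 1) x (2*rho_c + 1) window centred on the agent."""
    r0, c0 = pos
    return [
        [cell_at(grid, r0 + dr, c0 + dc) for dc in range(-rho[1], rho[1] + 1)]
        for dr in range(-rho[0], rho[0] + 1)
    ]


def action_crop(grid, pos, mu=MU):
    """Action CROP: the cell each action's offset points to, so the length is |A|."""
    r0, c0 = pos
    return [cell_at(grid, r0 + dr, c0 + dc) for dr, dc in mu]


def object_crop(grid, pos, objects=(WALL, FIELD, HOLE, GOAL)):
    """Object CROP sketch: relative (dr, dc) to the nearest cell of each object type.

    Assumptions: Manhattan distance, ties broken by row-major scan order,
    (0, 0) if the type does not occur; eta is not modelled.
    """
    r0, c0 = pos
    obs = []
    for obj in objects:
        best = None  # (distance, (dr, dc)) of the closest match so far
        for r, row in enumerate(grid):
            for c, cell in enumerate(row):
                if cell == obj:
                    d = abs(r - r0) + abs(c - c0)
                    if best is None or d < best[0]:
                        best = (d, (r - r0, c - c0))
        obs.append(best[1] if best is not None else (0, 0))
    return obs


if __name__ == "__main__":
    grid = ["WWWWWWW", "W A   W", "W HH  W", "W    GW", "WWWWWWW"]
    agent = (1, 2)
    window = radius_crop(grid, agent)
    print(len(window), len(window[0]))  # 5 5
    print(action_crop(grid, agent))  # ['W', ' ', 'H', ' ']
    print(object_crop(grid, agent))  # [(-1, 0), (0, -1), (1, 0), (2, 3)]
```

With ρ = (2, 2) the window is 5 × 5, Action CROP yields one entry per action, and Object CROP yields one 2-D offset per object type, giving (4, 2) for {Wall, Field, Hole, Goal} and (3, 2) when `objects=(WALL, FIELD, GOAL)` as in the mazes.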