Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
KIPPO: Koopman-Inspired Proximal Policy Optimization
Authors: Andrei Cozma, Landon Harris, Hairong Qi
IJCAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results demonstrate consistent improvements over the PPO baseline with 6-60% increased performance while reducing variability by up to 91% when evaluated on various continuous control tasks. |
| Researcher Affiliation | Academia | Andrei Cozma, Landon Harris and Hairong Qi University of Tennessee, Knoxville EMAIL, EMAIL |
| Pseudocode | No | We refer readers to the supplementary materials for complete implementation details and pseudocode. |
| Open Source Code | Yes | Extended version with comprehensive appendices containing ablation studies, hyperparameter analyses, pseudocode, and implementation details is available at: https://andreicozma.com/KIPPO. |
| Open Datasets | Yes | We evaluate six continuous control environments from Gymnasium [Towers et al., 2023] using MuJoCo [Todorov et al., 2012] and Box2D [Catto, 2007]. |
| Dataset Splits | No | The paper does not describe traditional training/test/validation dataset splits for static datasets, as it uses reinforcement learning environments where data is generated dynamically through interaction. It mentions mini-batches for optimization: "The algorithm divides 2,048 steps into 32 mini-batches". |
| Hardware Specification | No | Hardware specifications and reference runtime are provided in the supplementary material. |
| Software Dependencies | No | The paper mentions using "PPO and RPO implementations from the CleanRL library [Huang et al., 2022]" but does not specify version numbers for CleanRL or other software components. |
| Experiment Setup | Yes | Each rollout phase collects 2,048 environment steps across multiple trajectories... The algorithm divides 2,048 steps into 32 mini-batches... The optimization process runs for 10 epochs... Each training run consists of exactly 1 million environment steps. |
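
The figures quoted in the Experiment Setup row imply a few derived quantities that are easy to check by hand. The sketch below is only illustrative arithmetic, not the authors' code: the class and field names are hypothetical, and it assumes one gradient update per mini-batch per epoch, which the quoted text does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingBudget:
    """Values quoted in the Experiment Setup row; all names here are hypothetical."""

    rollout_steps: int = 2_048    # environment steps collected per rollout phase
    num_minibatches: int = 32     # mini-batches the rollout is divided into
    update_epochs: int = 10       # optimization epochs per rollout
    total_steps: int = 1_000_000  # environment steps per training run

    @property
    def minibatch_size(self) -> int:
        # Samples per mini-batch when the rollout is split evenly.
        return self.rollout_steps // self.num_minibatches

    @property
    def updates_per_rollout(self) -> int:
        # Assumption: one gradient update per mini-batch, repeated every epoch.
        return self.update_epochs * self.num_minibatches

    @property
    def full_rollouts(self) -> int:
        # Number of complete rollout phases that fit in the step budget.
        return self.total_steps // self.rollout_steps

    @property
    def leftover_steps(self) -> int:
        # Steps in the budget not covered by complete rollouts.
        return self.total_steps - self.full_rollouts * self.rollout_steps


budget = TrainingBudget()
print(budget.minibatch_size, budget.updates_per_rollout,
      budget.full_rollouts, budget.leftover_steps)
```

Under these assumptions the script prints `64 320 488 576`: each mini-batch holds 64 steps, each rollout triggers 320 updates, and 488 complete rollouts cover 999,424 of the 1 million steps. How the remaining 576 steps are handled is not described in the quoted excerpt.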