Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Guaranteeing Out-Of-Distribution Detection in Deep RL via Transition Estimation

Authors: Mohit Prashant, Arvind Easwaran, Suman Das, Michael Yuhas

AAAI 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate our detector by adapting existing benchmarks and compare it with existing OOD detection models for RL. ... Experiments The experiments run in this study are to evaluate the performance of the OOD detector. ... To obtain our results, for each of the Lunar Lander, Ant, Pendulum and Cart Pole environments, we tested our OOD detector in a total of 2000 episodes...
Researcher Affiliation	Academia	Mohit Prashant, Arvind Easwaran, Suman Das, Michael Yuhas; Nanyang Technological University; {mohit010@e., arvinde@, suman.das@, michaelj004@e.}ntu.edu.sg
Pseudocode	Yes	Algorithm 1: CVAE Training; Algorithm 2: ICP Calibration; Algorithm 3: OOD Detection
Open Source Code	No	The paper does not contain an explicit statement about the release of source code or a link to a code repository.
Open Datasets	Yes	We utilize the Lunar Lander, Ant, Cart Pole, and Pendulum environments from Open AI Gym (Brockman et al. 2016).
Dataset Splits	Yes	for each of the Lunar Lander, Ant, Pendulum and Cart Pole environments, we tested our OOD detector in a total of 2000 episodes; 1000 episodes in which the deployment environments and transition functions were identical to that of training, 500 episodes in which the actions were randomly perturbed and 500 episodes in which the environment parameters were set to random values as described in Table 2.
Hardware Specification	No	The paper does not provide specific details about the hardware used to run its experiments (e.g., GPU/CPU models, memory amounts).
Software Dependencies	No	The paper mentions 'Open AI Gym' and an 'Advantage Actor-Critic model' but does not provide specific version numbers for these or other software libraries/dependencies.
Experiment Setup	Yes	We train our OOD detection system using an Advantage Actor-Critic model (Mnih et al. 2016). ... These results are obtained by setting the CVAE ensemble size to 5 and are described in Table 3...
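The Pseudocode row lists an ICP (inductive conformal prediction) calibration step followed by OOD detection. As a rough illustration only, the sketch below shows the standard ICP p-value computation. It assumes a scalar nonconformity score per transition (for example, a transition-prediction error) and a significance level `epsilon`; the function names, the choice of score, and the thresholding rule are assumptions and are not the authors' Algorithms 2 and 3.

```python
from bisect import bisect_left
from typing import List, Sequence


def calibrate(calibration_scores: Sequence[float]) -> List[float]:
    """ICP calibration: sort nonconformity scores computed on a held-out calibration set."""
    return sorted(calibration_scores)


def conformal_p_value(score: float, sorted_cal: Sequence[float]) -> float:
    """Standard ICP p-value: (#{calibration scores >= score} + 1) / (n + 1)."""
    n = len(sorted_cal)
    # bisect_left returns how many calibration scores are strictly below `score`.
    num_at_least = n - bisect_left(sorted_cal, score)
    return (num_at_least + 1) / (n + 1)


def is_ood(score: float, sorted_cal: Sequence[float], epsilon: float = 0.05) -> bool:
    """Hypothetical detection rule: flag the transition when its p-value falls below epsilon."""
    return conformal_p_value(score, sorted_cal) < epsilon


if __name__ == "__main__":
    cal = calibrate([i / 100 for i in range(99)])  # synthetic scores, for illustration only
    print(conformal_p_value(0.5, cal))  # 0.5
    print(is_ood(5.0, cal))  # True, since p = 1/100
```

If calibration and test scores are exchangeable, this p-value satisfies P(p ≤ ε) ≤ ε, so flagging when p < ε keeps the false-alarm rate on in-distribution transitions at or below ε. That is the generic ICP guarantee, not a statement about the specific bound proved in the paper.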
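The Dataset Splits row implies a per-environment test set of 1000 in-distribution episodes and 1000 out-of-distribution episodes (500 with perturbed actions plus 500 with randomized environment parameters). That gives 4 × 2000 = 8000 test episodes across the four environments. The sketch below only enumerates that schedule so the totals can be checked. The condition labels and the binary `is_ood` flag are assumptions made for illustration, and nothing here reproduces how the perturbations themselves were generated.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Environments and per-environment episode counts as reported in the Dataset Splits row.
# The string labels are hypothetical names, not Gym environment IDs.
ENVIRONMENTS = ["Lunar Lander", "Ant", "Pendulum", "Cart Pole"]
SPLIT: Dict[str, int] = {
    "in_distribution": 1000,      # deployment identical to training
    "action_perturbed": 500,      # actions randomly perturbed
    "parameter_randomized": 500,  # environment parameters set to random values (Table 2)
}


def episode_schedule(environments=ENVIRONMENTS, split=SPLIT) -> List[Tuple[str, str, bool]]:
    """Return one (environment, condition, is_ood) tuple per test episode."""
    schedule = []
    for env in environments:
        for condition, count in split.items():
            is_ood = condition != "in_distribution"
            schedule.extend((env, condition, is_ood) for _ in range(count))
    return schedule


if __name__ == "__main__":
    episodes = episode_schedule()
    print(len(episodes))  # 8000 = 4 environments x 2000 episodes
    # Per environment the labels are balanced: 1000 False and 1000 True.
    print(Counter(flag for env, _, flag in episodes if env == "Ant"))
```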
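The Experiment Setup row reports a CVAE ensemble of size 5. One plausible way to turn an ensemble of learned transition models into a single nonconformity score is to average each member's prediction error on the observed transition. The sketch below does that under stated assumptions. Each member is treated as a deterministic point predictor, whereas a trained CVAE would be sampled or decoded. The error is a squared Euclidean distance. `Predictor`, `ensemble_nonconformity`, and the toy members are hypothetical. The paper's actual score may differ.

```python
from statistics import fmean
from typing import Callable, Sequence

# Hypothetical interface: each ensemble member maps (state, action) to a predicted next state.
Predictor = Callable[[Sequence[float], Sequence[float]], Sequence[float]]

ENSEMBLE_SIZE = 5  # ensemble size reported for the Table 3 results


def squared_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Sum of squared differences between a predicted and an observed next state."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def ensemble_nonconformity(members: Sequence[Predictor], state, action, next_state) -> float:
    """Average transition-prediction error over all ensemble members (an assumed score)."""
    if len(members) != ENSEMBLE_SIZE:
        raise ValueError(f"expected {ENSEMBLE_SIZE} members, got {len(members)}")
    return fmean(squared_error(m(state, action), next_state) for m in members)


if __name__ == "__main__":
    # Toy stand-ins for trained CVAE decoders: member k is offset by 0.1 * k.
    members = [lambda s, a, k=k: [x + a[0] + 0.1 * k for x in s] for k in range(ENSEMBLE_SIZE)]
    print(ensemble_nonconformity(members, [0.0, 0.0], [1.0], [1.0, 1.0]))  # approx. 0.12
```

Averaging over members smooths out the idiosyncratic error of any single model. A score like this could then be fed to an ICP-style calibration step, though whether the paper combines the ensemble this way is not stated in this entry.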