Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Guaranteeing Out-Of-Distribution Detection in Deep RL via Transition Estimation
Authors: Mohit Prashant, Arvind Easwaran, Suman Das, Michael Yuhas
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our detector by adapting existing benchmarks and compare it with existing OOD detection models for RL. ... Experiments The experiments run in this study are to evaluate the performance of the OOD detector. ... To obtain our results, for each of the Lunar Lander, Ant, Pendulum and Cart Pole environments, we tested our OOD detector in a total of 2000 episodes... |
| Researcher Affiliation | Academia | Mohit Prashant, Arvind Easwaran, Suman Das, Michael Yuhas; Nanyang Technological University; {mohit010@e., arvinde@, suman.das@, michaelj004@e.}ntu.edu.sg |
| Pseudocode | Yes | Algorithm 1: CVAE Training; Algorithm 2: ICP Calibration; Algorithm 3: OOD Detection |
| Open Source Code | No | The paper does not contain an explicit statement about the release of source code or a link to a code repository. |
| Open Datasets | Yes | We utilize the Lunar Lander, Ant, Cart Pole, and Pendulum environments from Open AI Gym (Brockman et al. 2016). |
| Dataset Splits | Yes | for each of the Lunar Lander, Ant, Pendulum and Cart Pole environments, we tested our OOD detector in a total of 2000 episodes; 1000 episodes in which the deployment environments and transition functions were identical to that of training, 500 episodes in which the actions were randomly perturbed and 500 episodes in which the environment parameters were set to random values as described in Table 2. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run its experiments (e.g., GPU/CPU models, memory amounts). |
| Software Dependencies | No | The paper mentions 'Open AI Gym' and an 'Advantage Actor-Critic model' but does not provide specific version numbers for these or other software libraries/dependencies. |
| Experiment Setup | Yes | We train our OOD detection system using an Advantage Actor-Critic model (Mnih et al. 2016). ... These results are obtained by setting the CVAE ensemble size to 5 and are described in Table 3... |
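
The evaluation protocol quoted in the Dataset Splits row can be sketched as a per-environment episode schedule. This is a minimal illustration, not the authors' code: the condition names, the `build_schedule` helper, the seed, and the shuffling of episode order are all assumptions made here. Only the episode counts and the environment names come from the quoted text.

```python
import random
from collections import Counter

# Episode counts per condition, as quoted in the Dataset Splits row.
# The condition names themselves are hypothetical labels.
EPISODE_COUNTS = {
    "nominal": 1000,             # same environment and transitions as training
    "action_perturbed": 500,     # actions randomly perturbed
    "env_param_perturbed": 500,  # environment parameters set to random values
}

# Environments named in the Open Datasets row.
ENVIRONMENTS = ["Lunar Lander", "Ant", "Pendulum", "Cart Pole"]


def build_schedule(seed=0):
    """Return a list of (condition, is_ood) pairs for one environment.

    Shuffling the order is an assumption; the quoted text does not say
    in what order the episodes were run.
    """
    schedule = [
        (condition, condition != "nominal")
        for condition, count in EPISODE_COUNTS.items()
        for _ in range(count)
    ]
    random.Random(seed).shuffle(schedule)
    return schedule


# One schedule per environment, each with its own (arbitrary) seed.
schedules = {env: build_schedule(seed=i) for i, env in enumerate(ENVIRONMENTS)}

schedule = schedules["Lunar Lander"]
counts = Counter(condition for condition, _ in schedule)
ood_fraction = sum(is_ood for _, is_ood in schedule) / len(schedule)
```

Under these counts, half of the 2000 episodes in each environment are out-of-distribution, so a detector that always predicts the same label would score 50% accuracy.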