Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Enhancing Interpretability in Deep Reinforcement Learning through Semantic Clustering
Authors: Liang Zhang, Justin Lieffers, Adarsh Pyarelal
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally validate the effectiveness of the proposed module and demonstrate its ability to reveal semantic clustering properties within DRL. |
| Researcher Affiliation | Academia | Liang Zhang College of Information Science University of Arizona Tucson, AZ 85721 EMAIL |
| Pseudocode | Yes | Algorithm 1: PPO with Semantic Clustering Module (SCM) |
| Open Source Code | Yes | Our code is available at https://github.com/ualiangzhang/semantic_rl. |
| Open Datasets | Yes | Unlike prior work that uses fixed-scene Atari games, we use Procgen [19], which offers rich semantic diversity and dynamic, procedurally generated environments. |
| Dataset Splits | Yes | Considering the cost of time and computational resources, we opt for training our model on the full distribution of levels in the easy mode. |
| Hardware Specification | Yes | We train all models on one NVIDIA Tesla V100S 32GB GPU. |
| Software Dependencies | No | The operating system version is CentOS Linux release 7.9.2009. |
| Experiment Setup | Yes | In Equation 2 of the main paper, $w_{\text{FDR}}$ and $w_{\text{VQ-VAE}}$ are 500 and 1, respectively. $\lambda_{\text{ctrl}}$ is updated every 50 iterations according to the following expression: $\lambda_{\text{ctrl}} = \min\left(\frac{s_{\text{mean}}}{0.8 \cdot s_{\text{highest}}},\ 1\right)$ (A.1) |
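
The λctrl update quoted in the Experiment Setup row can be illustrated with a minimal Python sketch. It is not taken from the authors' repository: the function names, the `ratio` and `every` parameters, and the reading of the expression as min(s_mean / (0.8 · s_highest), 1) are assumptions made for illustration, since the quoted text does not show explicit parentheses.

```python
def lambda_ctrl(s_mean: float, s_highest: float, ratio: float = 0.8) -> float:
    """Hypothetical helper: cap s_mean / (ratio * s_highest) at 1.

    Assumes the denominator of Eq. (A.1) is 0.8 * s_highest, which is an
    interpretation of the quoted expression, not a confirmed detail.
    """
    if s_highest <= 0:
        raise ValueError("s_highest must be positive for the ratio to be defined")
    return min(s_mean / (ratio * s_highest), 1.0)


def is_update_step(iteration: int, every: int = 50) -> bool:
    """Hypothetical helper: True on iterations where the weight is refreshed.

    The source states only that the update happens every 50 iterations;
    counting from iteration 0 is an assumption.
    """
    return iteration % every == 0


# Example: a mean score at 40% of the highest score gives a weight of 0.5,
# while a mean score at 90% of the highest score is capped at 1.
half = lambda_ctrl(s_mean=4.0, s_highest=10.0)
capped = lambda_ctrl(s_mean=9.0, s_highest=10.0)
```

Under this reading, the weight grows linearly with the mean score and saturates once the mean reaches 80% of the highest score.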