Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Enhancing Interpretability in Deep Reinforcement Learning through Semantic Clustering

Authors: Liang Zhang, Justin Lieffers, Adarsh Pyarelal

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We experimentally validate the effectiveness of the proposed module and demonstrate its ability to reveal semantic clustering properties within DRL.
Researcher Affiliation	Academia	Liang Zhang, College of Information Science, University of Arizona, Tucson, AZ 85721, EMAIL
Pseudocode	Yes	Algorithm 1: PPO with Semantic Clustering Module (SCM)
Open Source Code	Yes	Our code is available at https://github.com/ualiangzhang/semantic_rl.
Open Datasets	Yes	Unlike prior work that uses fixed-scene Atari games, we use Procgen [19], which offers rich semantic diversity and dynamic, procedurally generated environments.
Dataset Splits	Yes	Considering the cost of time and computational resources, we opt for training our model on the full distribution of levels in the easy mode.
Hardware Specification	Yes	We train all models on one NVIDIA Tesla V100S 32GB GPU.
Software Dependencies	No	The operating system version is CentOS Linux release 7.9.2009.
Experiment Setup	Yes	In Equation 2 of the main paper, $w_{\text{FDR}}$ and $w_{\text{VQ-VAE}}$ are 500 and 1, respectively. $\lambda_{\text{ctrl}}$ is updated every 50 iterations according to the following expression: $\lambda_{\text{ctrl}} = \min\left(\frac{s_{\text{mean}}}{0.8\, s_{\text{highest}}}, 1\right)$ (A.1)
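The λ_ctrl schedule quoted in the Experiment Setup row can be sketched in Python as below. This is a minimal illustration, not the authors' code: it assumes s_mean and s_highest are positive scalar statistics supplied by the training loop (their exact definitions are not given in this excerpt), reads Eq. (A.1) as s_mean divided by (0.8 · s_highest), and interprets "every 50 iterations" as iterations whose index is a multiple of 50. The function and constant names are hypothetical.

```python
UPDATE_INTERVAL = 50       # lambda_ctrl is recomputed every 50 iterations
THRESHOLD_FRACTION = 0.8   # the constant 0.8 in Eq. (A.1)


def lambda_ctrl_value(s_mean: float, s_highest: float) -> float:
    """Eq. (A.1): lambda_ctrl = min(s_mean / (0.8 * s_highest), 1)."""
    if s_highest <= 0:
        # Assumption: the excerpt does not define this case, so reject it explicitly.
        raise ValueError("s_highest must be positive")
    return min(s_mean / (THRESHOLD_FRACTION * s_highest), 1.0)


def maybe_update_lambda_ctrl(iteration: int, current: float,
                             s_mean: float, s_highest: float) -> float:
    """Hypothetical helper: recompute lambda_ctrl on iterations that are a
    multiple of UPDATE_INTERVAL; otherwise keep the current value."""
    if iteration % UPDATE_INTERVAL == 0:
        return lambda_ctrl_value(s_mean, s_highest)
    return current


if __name__ == "__main__":
    lam = 0.0
    for it in range(1, 101):
        # Placeholder statistics; in practice these would come from training.
        lam = maybe_update_lambda_ctrl(it, lam, s_mean=4.0, s_highest=10.0)
    print(lam)
```

Under this reading, the `min(..., 1)` clip means λ_ctrl grows with s_mean and saturates at 1 once s_mean reaches 80% of s_highest; with the placeholder values above it settles at 0.5.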