Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Diffusion Spectral Representation for Reinforcement Learning

Authors: Dmitry Shribak, Chen-Xiao Gao, Yitong Li, Chenjun Xiao, Bo Dai

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Finally, we provide comprehensive empirical studies to verify the benefits of Diff-SR in delivering robust and advantageous performance across various benchmarks with both fully and partially observable settings.
Researcher Affiliation	Academia	Dmitry Shribak (Georgia Tech); Chen-Xiao Gao (Nanjing University); Yitong Li (Georgia Tech); Chenjun Xiao (CUHK(SZ)); Bo Dai (Georgia Tech)
Pseudocode	Yes	Algorithm 1 Diffusion Spectral Representation (Diff-SR) Training
Open Source Code	Yes	Our code is publicly released at the project website.
Open Datasets	Yes	We evaluate our method with state-based MDP tasks (Gym-MuJoCo locomotion [Todorov et al., 2012]) and image-based POMDP tasks (Meta-World Benchmark [Yu et al., 2020]) in this section.
Dataset Splits	No	The paper describes training and periodic evaluation during the learning process but does not specify explicit training/validation/test dataset splits in the traditional sense, as is common in reinforcement learning where data is collected interactively from an environment.
Hardware Specification	Yes	To showcase this, we record the runtime of Diff-SR and PolyGRAD on MBBL tasks using workstations equipped with Quadro RTX 6000 cards.
Software Dependencies	No	The paper mentions using DrQ-v2 and MBBL implementations but does not provide specific version numbers for these or other software dependencies (e.g., Python, PyTorch/TensorFlow versions).
Experiment Setup	Yes	Table 2: Hyperparameters used for Diff-SR in state-based MDP environments. Actor Learning Rate: 0.003; Critic Learning Rate: 0.0003; Learning Rate for ψ, ζ, θ: 0.0001; Actor Hidden Layer Dimensions: (256, 256); Diff-SR Representation Dimension: 256; Discount Factor γ: 0.99; Critic Soft Update Factor τ: 0.005; Batch Size: 1024; Number of Noise Levels: 1000; ψ Network Width: 256; ψ Network Hidden Depth: 1; ζ Network Width: 512; ζ Network Hidden Depth: 1.
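The Table 2 values reported for the state-based MDP setting can be gathered into one configuration object when attempting a reproduction. The following is a minimal Python sketch, not the authors' released code: the class name `DiffSRStateConfig`, its field names, and the `soft_update` helper are all hypothetical. The helper assumes τ is applied in the conventional Polyak-averaging form, target ← τ·online + (1 − τ)·target, which Table 2 itself does not spell out.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class DiffSRStateConfig:
    """Hypothetical container for the Table 2 hyperparameters (state-based MDP tasks)."""
    actor_lr: float = 0.003
    critic_lr: float = 0.0003
    repr_lr: float = 0.0001            # shared learning rate for psi, zeta, theta
    actor_hidden_dims: tuple = (256, 256)
    repr_dim: int = 256                # Diff-SR representation dimension
    gamma: float = 0.99                # discount factor
    tau: float = 0.005                 # critic soft update factor
    batch_size: int = 1024
    num_noise_levels: int = 1000
    psi_width: int = 256
    psi_depth: int = 1                 # psi network hidden depth
    zeta_width: int = 512
    zeta_depth: int = 1                # zeta network hidden depth


def soft_update(target, online, tau):
    """Hypothetical helper: Polyak-average each target parameter toward the online one.

    Assumes the conventional rule target <- tau * online + (1 - tau) * target;
    parameters are plain lists of floats here to keep the sketch dependency-free.
    """
    return [tau * o + (1.0 - tau) * t for t, o in zip(target, online)]


if __name__ == "__main__":
    cfg = DiffSRStateConfig()
    print(asdict(cfg))
    # One soft update step with the Table 2 value of tau.
    print(soft_update([0.0, 2.0], [1.0, 2.0], cfg.tau))
```

Freezing the dataclass keeps the reported values from being mutated accidentally during a run; overrides can be made explicitly with `dataclasses.replace`.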