Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Diffusion Spectral Representation for Reinforcement Learning
Authors: Dmitry Shribak, Chen-Xiao Gao, Yitong Li, Chenjun Xiao, Bo Dai
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we provide comprehensive empirical studies to verify the benefits of Diff-SR in delivering robust and advantageous performance across various benchmarks with both fully and partially observable settings. |
| Researcher Affiliation | Academia | Dmitry Shribak Georgia Tech EMAIL Chen-Xiao Gao Nanjing University EMAIL Yitong Li Georgia Tech EMAIL Chenjun Xiao CUHK(SZ) EMAIL Bo Dai Georgia Tech EMAIL |
| Pseudocode | Yes | Algorithm 1 Diffusion Spectral Representation (Diff-SR) Training |
| Open Source Code | Yes | Our code is publicly released at the project website. |
| Open Datasets | Yes | We evaluate our method with state-based MDP tasks (Gym-MuJoCo locomotion [Todorov et al., 2012]) and image-based POMDP tasks (Meta-World Benchmark [Yu et al., 2020]) in this section. |
| Dataset Splits | No | The paper describes training and periodic evaluation during the learning process but does not specify explicit training/validation/test dataset splits in the traditional sense, as is common in reinforcement learning where data is collected interactively from an environment. |
| Hardware Specification | Yes | To showcase this, we record the runtime of Diff-SR and PolyGRAD on MBBL tasks using workstations equipped with Quadro RTX 6000 cards. |
| Software Dependencies | No | The paper mentions using DrQ-V2 and MBBL implementations but does not provide specific version numbers for these or other software dependencies (e.g., Python, PyTorch/TensorFlow versions). |
| Experiment Setup | Yes | Table 2: Hyperparameters used for Diff-SR in state-based MDP environments. Actor Learning Rate: 0.003; Critic Learning Rate: 0.0003; Learning Rate for ψ, ζ, θ: 0.0001; Actor Hidden Layer Dimensions: (256, 256); Diff-SR Representation Dimension: 256; Discount factor γ: 0.99; Critic Soft Update Factor τ: 0.005; Batch Size: 1024; Number of Noise Levels: 1000; ψ Network Width: 256; ψ Network Hidden Depth: 1; ζ Network Width: 512; ζ Network Hidden Depth: 1 |
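
For readers who want to reuse the Table 2 settings quoted in the Experiment Setup row, the values can be collected into a single configuration object. The following is a minimal sketch, not the authors' code: the class name `DiffSRStateConfig`, all field names, and the `soft_update` helper are hypothetical, and the helper assumes the "Critic Soft Update Factor τ" refers to the usual Polyak averaging convention, which this page does not confirm.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DiffSRStateConfig:
    """Hypothetical container for the Table 2 values (field names are illustrative)."""

    actor_lr: float = 0.003
    critic_lr: float = 0.0003
    repr_lr: float = 0.0001  # learning rate for psi, zeta, theta
    actor_hidden_dims: Tuple[int, int] = (256, 256)
    repr_dim: int = 256  # Diff-SR representation dimension
    discount: float = 0.99  # gamma
    critic_tau: float = 0.005  # critic soft update factor
    batch_size: int = 1024
    num_noise_levels: int = 1000
    psi_width: int = 256
    psi_hidden_depth: int = 1
    zeta_width: int = 512
    zeta_hidden_depth: int = 1


def soft_update(target: List[float], online: List[float], tau: float) -> List[float]:
    """Assumed Polyak averaging: target <- (1 - tau) * target + tau * online."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target, online)]


if __name__ == "__main__":
    cfg = DiffSRStateConfig()
    # Move a toy target parameter vector slightly toward the online parameters.
    print(soft_update([0.0, 1.0], [1.0, 1.0], cfg.critic_tau))
```

Freezing the dataclass keeps the reported values from being changed accidentally during a run; any per-environment overrides would need to be passed explicitly when constructing the object.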