Diffusion Spectral Representation for Reinforcement Learning

Authors: Dmitry Shribak, Chen-Xiao Gao, Yitong Li, Chenjun Xiao, Bo Dai

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we provide comprehensive empirical studies to verify the benefits of Diff-SR in delivering robust and advantageous performance across various benchmarks with both fully and partially observable settings.
Researcher Affiliation | Academia | Dmitry Shribak (Georgia Tech, shribak@gatech.edu); Chen-Xiao Gao (Nanjing University, gaocx@lamda.nju.edu.cn); Yitong Li (Georgia Tech, yli3277@gatech.edu); Chenjun Xiao (CUHK(SZ), chenjunx@cuhk.edu.cn); Bo Dai (Georgia Tech, bodai@cc.gatech.edu)
Pseudocode | Yes | Algorithm 1 Diffusion Spectral Representation (Diff-SR) Training
Open Source Code | Yes | Our code is publicly released at the project website.
Open Datasets | Yes | We evaluate our method with state-based MDP tasks (Gym-MuJoCo locomotion [Todorov et al., 2012]) and image-based POMDP tasks (Meta-World Benchmark [Yu et al., 2020]) in this section.
Dataset Splits | No | The paper describes training and periodic evaluation during learning but does not specify explicit training/validation/test splits. This is common in reinforcement learning, where data is collected interactively from an environment.
Hardware Specification | Yes | To showcase this, we record the runtime of Diff-SR and PolyGRAD on MBBL tasks using workstations equipped with Quadro RTX 6000 cards.
Software Dependencies | No | The paper mentions using DrQ-v2 and MBBL implementations but does not provide specific version numbers for these or other software dependencies (e.g., Python, PyTorch/TensorFlow versions).
Experiment Setup | Yes | Table 2: Hyperparameters used for Diff-SR in state-based MDP environments.

Hyperparameter | Value
Actor Learning Rate | 0.003
Critic Learning Rate | 0.0003
Learning Rate for ψ, ζ, θ | 0.0001
Actor Hidden Layer Dimensions | (256, 256)
Diff-SR Representation Dimension | 256
Discount factor γ | 0.99
Critic Soft Update Factor τ | 0.005
Batch Size | 1024
Number of Noise Levels | 1000
ψ Network Width | 256
ψ Network Hidden Depth | 1
ζ Network Width | 512
ζ Network Hidden Depth | 1
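
The Table 2 values above can be collected into a single configuration object. The following is a minimal Python sketch and is not taken from the released code: the class name DiffSRStateConfig, its field names, and the soft_update helper are all hypothetical. The helper assumes the critic soft update factor τ is used in the conventional Polyak-averaging form, target <- τ * online + (1 - τ) * target; the page itself does not state the update rule.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DiffSRStateConfig:
    """Hypothetical container for the Table 2 values (field names are illustrative)."""
    actor_lr: float = 0.003
    critic_lr: float = 0.0003
    repr_lr: float = 0.0001          # learning rate shared by psi, zeta, theta
    actor_hidden_dims: Tuple[int, ...] = (256, 256)
    repr_dim: int = 256              # Diff-SR representation dimension
    gamma: float = 0.99              # discount factor
    tau: float = 0.005               # critic soft update factor
    batch_size: int = 1024
    num_noise_levels: int = 1000
    psi_width: int = 256
    psi_hidden_depth: int = 1
    zeta_width: int = 512
    zeta_hidden_depth: int = 1


def soft_update(target: List[float], online: List[float], tau: float) -> List[float]:
    """Assumed Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * o + (1.0 - tau) * t for t, o in zip(target, online)]


cfg = DiffSRStateConfig()
# Toy parameter vectors, purely for illustration.
new_target = soft_update([0.0, 1.0], [1.0, 1.0], cfg.tau)
```

With τ = 0.005 the target parameters move only 0.5% of the way toward the online parameters per update, which is why the first toy entry moves from 0.0 to about 0.005 while the second, already equal to its online value, stays at 1.0.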