Diffusion Spectral Representation for Reinforcement Learning
Authors: Dmitry Shribak, Chen-Xiao Gao, Yitong Li, Chenjun Xiao, Bo Dai
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we provide comprehensive empirical studies to verify the benefits of Diff-SR in delivering robust and advantageous performance across various benchmarks with both fully and partially observable settings. |
| Researcher Affiliation | Academia | Dmitry Shribak (Georgia Tech, shribak@gatech.edu); Chen-Xiao Gao (Nanjing University, gaocx@lamda.nju.edu.cn); Yitong Li (Georgia Tech, yli3277@gatech.edu); Chenjun Xiao (CUHK(SZ), chenjunx@cuhk.edu.cn); Bo Dai (Georgia Tech, bodai@cc.gatech.edu) |
| Pseudocode | Yes | Algorithm 1 Diffusion Spectral Representation (Diff-SR) Training |
| Open Source Code | Yes | Our code is publicly released at the project website. |
| Open Datasets | Yes | We evaluate our method with state-based MDP tasks (Gym-MuJoCo locomotion [Todorov et al., 2012]) and image-based POMDP tasks (Meta-World Benchmark [Yu et al., 2020]) in this section. |
| Dataset Splits | No | The paper describes training and periodic evaluation during the learning process but does not specify explicit training/validation/test dataset splits in the traditional sense, as is common in reinforcement learning where data is collected interactively from an environment. |
| Hardware Specification | Yes | To showcase this, we record the runtime of Diff-SR and PolyGRAD on MBBL tasks using workstations equipped with Quadro RTX 6000 cards. |
| Software Dependencies | No | The paper mentions using DrQ-V2 and MBBL implementations but does not provide specific version numbers for these or other software dependencies (e.g., Python, PyTorch/TensorFlow versions). |
| Experiment Setup | Yes | Table 2: Hyperparameters used for Diff-SR in state-based MDP environments. Actor Learning Rate: 0.003; Critic Learning Rate: 0.0003; Learning Rate for ψ, ζ, θ: 0.0001; Actor Hidden Layer Dimensions: (256, 256); Diff-SR Representation Dimension: 256; Discount Factor γ: 0.99; Critic Soft Update Factor τ: 0.005; Batch Size: 1024; Number of Noise Levels: 1000; ψ Network Width: 256; ψ Network Hidden Depth: 1; ζ Network Width: 512; ζ Network Hidden Depth: 1 |
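
For readers who want to reuse the Experiment Setup values, the sketch below collects the Table 2 hyperparameters into a single Python configuration object. The class name `DiffSRStateConfig`, its field names, and the `soft_update` helper are hypothetical and are not taken from the released code. The helper assumes the common Polyak-averaging convention for a soft update factor (target ← (1 − τ)·target + τ·online), which the table does not spell out.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DiffSRStateConfig:
    """Hypothetical container for the Table 2 values (state-based MDP tasks)."""
    actor_lr: float = 0.003
    critic_lr: float = 0.0003
    repr_lr: float = 0.0001                 # learning rate for psi, zeta, theta
    actor_hidden_dims: Tuple[int, ...] = (256, 256)
    repr_dim: int = 256                     # Diff-SR representation dimension
    gamma: float = 0.99                     # discount factor
    tau: float = 0.005                      # critic soft update factor
    batch_size: int = 1024
    num_noise_levels: int = 1000
    psi_width: int = 256
    psi_hidden_depth: int = 1
    zeta_width: int = 512
    zeta_hidden_depth: int = 1


def soft_update(target: List[float], online: List[float], tau: float) -> List[float]:
    """Polyak averaging (assumed convention): move target weights toward online weights."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target, online)]


cfg = DiffSRStateConfig()
new_target = soft_update([0.0, 1.0], [1.0, 1.0], cfg.tau)
```

With τ = 0.005 the target weights move only 0.5% of the way toward the online weights per update, which is the usual reason for choosing a small soft update factor.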