Adaptive $Q$-Aid for Conditional Supervised Learning in Offline Reinforcement Learning
Authors: Jeonghye Kim, Suyoung Lee, Woojun Kim, Youngchul Sung
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results show that QCS significantly outperforms RCSL and value-based methods, consistently achieving or exceeding the maximum trajectory returns across diverse offline RL benchmarks. |
| Researcher Affiliation | Academia | Jeonghye Kim¹, Suyoung Lee¹, Woojun Kim², Youngchul Sung¹ (¹KAIST, ²Carnegie Mellon University) |
| Pseudocode | Yes | A. Pseudocode; Algorithm 1: IQL-aided QCS |
| Open Source Code | Yes | The project page is available at https://beanie00.com/publications/qcs. |
| Open Datasets | Yes | Our primary focus was on the D4RL [13] MuJoCo, AntMaze, and Adroit domains. |
| Dataset Splits | Yes | In all evaluations of QCS, we assess the expert-normalized returns [13] of 10 episodes at each evaluation checkpoint (every $10^3$ gradient steps). Subsequently, we compute the running average of these normalized returns over ten consecutive checkpoints. We report the mean and standard deviations of the final scores across five random seeds. |
| Hardware Specification | No | The paper mentions training times in Appendix K (e.g., 'IQL 80 min, CQL 220 min, QDT 400 min, and QCS 215 min.') but does not specify any hardware details like GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper mentions software components like 'Adam [21]' (optimizer), 'ReLU [2]' (nonlinearity function), and implicitly 'PyTorch' (by linking a PyTorch implementation for IQL). However, it does not provide specific version numbers for these software components or libraries. |
| Experiment Setup | Yes | Hyperparameters and Backbone Architecture. ... The detailed hyperparameters we used are provided in Appendix J... From Table 14 to 15, we provide detailed hyperparameter settings for actor training. ... Table 16: λ on MuJoCo. ... Table 17: λ on AntMaze. ... Table 18: λ on Adroit. |
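
The evaluation protocol quoted in the Dataset Splits row (10 episodes per checkpoint, a running average over ten consecutive checkpoints, then mean and standard deviation across five seeds) can be sketched roughly as below. This is a minimal illustration, not the authors' code: all function names and the `window` parameter are hypothetical, the normalization assumes the usual D4RL convention (0 for a random policy, 100 for an expert), and the use of the sample standard deviation is an assumption because the quoted text does not say which estimator was used.

```python
from statistics import mean, stdev


def normalized_score(raw_return, random_return, expert_return):
    """Expert-normalized score, assuming the D4RL convention:
    0 corresponds to a random policy, 100 to an expert policy."""
    return 100.0 * (raw_return - random_return) / (expert_return - random_return)


def checkpoint_score(episode_returns, random_return, expert_return):
    """Average normalized return over the evaluation episodes of one checkpoint."""
    return mean(
        normalized_score(r, random_return, expert_return) for r in episode_returns
    )


def final_score(checkpoint_scores, window=10):
    """Average of the last `window` consecutive checkpoint scores."""
    if len(checkpoint_scores) < window:
        raise ValueError("need at least `window` checkpoint scores")
    return mean(checkpoint_scores[-window:])


def aggregate_over_seeds(final_scores):
    """Mean and sample standard deviation (assumed estimator) across seeds."""
    return mean(final_scores), stdev(final_scores)
```

With five seeds, `aggregate_over_seeds` would be called on a list of five `final_score` values, one per seed, to produce the "mean ± std" entry reported for each dataset.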