Adaptive $Q$-Aid for Conditional Supervised Learning in Offline Reinforcement Learning

Authors: Jeonghye Kim, Suyoung Lee, Woojun Kim, Youngchul Sung

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | Empirical results show that QCS significantly outperforms RCSL and value-based methods, consistently achieving or exceeding the maximum trajectory returns across diverse offline RL benchmarks.
Researcher Affiliation | Academia | Jeonghye Kim¹, Suyoung Lee¹, Woojun Kim², Youngchul Sung¹ (¹KAIST, ²Carnegie Mellon University)
Pseudocode | Yes | Appendix A (Pseudocode) provides Algorithm 1: IQL-aided QCS. (An illustrative sketch of where the Q-aid term enters an actor loss follows this table.)
Open Source Code | Yes | The project page is available at https://beanie00.com/publications/qcs.
Open Datasets | Yes | Our primary focus was on the D4RL [13] MuJoCo, AntMaze, and Adroit domains.
Dataset Splits | Yes | In all evaluations of QCS, we assess the expert-normalized returns [13] of 10 episodes at each evaluation checkpoint (every $10^3$ gradient steps). Subsequently, we compute the running average of these normalized returns over ten consecutive checkpoints. We report the mean and standard deviations of the final scores across five random seeds. (A minimal sketch of this protocol follows this table.)
Hardware Specification | No | The paper reports training times in Appendix K (e.g., "IQL 80 min, CQL 220 min, QDT 400 min, and QCS 215 min.") but does not specify hardware details such as GPU models, CPU types, or memory.
Software Dependencies | No | The paper mentions software components such as Adam [21] (optimizer), ReLU [2] (nonlinearity), and implicitly PyTorch (by linking a PyTorch implementation of IQL), but it does not provide version numbers for these software components or libraries.
Experiment Setup | Yes | Hyperparameters and Backbone Architecture: the detailed hyperparameters used are provided in Appendix J. Tables 14 and 15 give detailed hyperparameter settings for actor training; Table 16 lists λ on MuJoCo, Table 17 lists λ on AntMaze, and Table 18 lists λ on Adroit.
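
The paper's Algorithm 1 (IQL-aided QCS) is given in its Appendix A and is not reproduced here. As a rough illustration of where the λ hyperparameters reported in Tables 16-18 could enter a Q-aided conditional supervised actor update, the sketch below combines a return-conditioned behavior-cloning term with a Q-maximization term weighted by λ. This is our own simplification under assumed interfaces (`policy`, `q_function`, and the `batch` keys are hypothetical), not the authors' algorithm.

```python
import torch


def q_aided_csl_loss(policy, q_function, batch, lam):
    """Illustrative Q-aided conditional supervised actor loss (a sketch,
    not the paper's Algorithm 1).

    policy:     callable (obs, return_to_go) -> predicted action tensor
    q_function: callable (obs, action) -> Q-value tensor (e.g., a pretrained IQL critic)
    lam:        weight on the Q-aid term (cf. the per-domain lambda tables)
    """
    pred_action = policy(batch["obs"], batch["return_to_go"])

    # Conditional supervised (behavior-cloning) term on the dataset action.
    csl_loss = ((pred_action - batch["action"]) ** 2).mean()

    # Q-aid term: prefer actions the critic rates highly (minimize negative Q).
    q_aid = -q_function(batch["obs"], pred_action).mean()

    return csl_loss + lam * q_aid
```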
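
The evaluation protocol in the Dataset Splits row (10 episodes per checkpoint, a running average over ten consecutive checkpoints, and mean/standard deviation over five seeds) can be summarized with a short NumPy sketch. The function names and the dummy data are ours; the sketch assumes each per-checkpoint score is already the expert-normalized return averaged over the 10 evaluation episodes.

```python
import numpy as np


def final_smoothed_score(checkpoint_scores, window=10):
    """Trailing running average of per-checkpoint expert-normalized returns;
    returns the final smoothed score of one training run (one seed)."""
    scores = np.asarray(checkpoint_scores, dtype=float)
    window = min(window, len(scores))  # guard for runs shorter than the window
    running = np.convolve(scores, np.ones(window) / window, mode="valid")
    return running[-1]


def aggregate_over_seeds(final_scores):
    """Mean and standard deviation of final scores across random seeds."""
    scores = np.asarray(final_scores, dtype=float)
    return scores.mean(), scores.std()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Five hypothetical seeds with 100 evaluation checkpoints each (dummy data).
    runs = [rng.uniform(0.0, 100.0, size=100) for _ in range(5)]
    finals = [final_smoothed_score(run) for run in runs]
    print(aggregate_over_seeds(finals))
```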