Adaptive $Q$-Aid for Conditional Supervised Learning in Offline Reinforcement Learning

Authors: Jeonghye Kim, Suyoung Lee, Woojun Kim, Youngchul Sung

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | Empirical results show that QCS significantly outperforms RCSL and value-based methods, consistently achieving or exceeding the maximum trajectory returns across diverse offline RL benchmarks.
Researcher Affiliation | Academia | Jeonghye Kim¹, Suyoung Lee¹, Woojun Kim², Youngchul Sung¹ (¹KAIST, ²Carnegie Mellon University)
Pseudocode | Yes | Appendix A (Pseudocode) provides Algorithm 1: IQL-aided QCS. (An illustrative sketch of where the Q-aid term enters an actor loss follows this table.)
Open Source Code | Yes | The project page is available at https://beanie00.com/publications/qcs.
Open Datasets | Yes | Our primary focus was on the D4RL [13] MuJoCo, AntMaze, and Adroit domains.
Dataset Splits | Yes | In all evaluations of QCS, we assess the expert-normalized returns [13] of 10 episodes at each evaluation checkpoint (every $10^3$ gradient steps). Subsequently, we compute the running average of these normalized returns over ten consecutive checkpoints. We report the mean and standard deviations of the final scores across five random seeds. (A minimal sketch of this protocol follows this table.)
Hardware Specification | No | The paper reports training times in Appendix K (e.g., "IQL 80 min, CQL 220 min, QDT 400 min, and QCS 215 min.") but does not specify hardware details such as GPU models, CPU types, or memory.
Software Dependencies | No | The paper mentions software components such as Adam [21] (optimizer), ReLU [2] (nonlinearity), and implicitly PyTorch (by linking a PyTorch implementation of IQL), but it does not provide version numbers for these software components or libraries.
Experiment Setup | Yes | Hyperparameters and Backbone Architecture: the detailed hyperparameters used are provided in Appendix J. Tables 14 and 15 give detailed hyperparameter settings for actor training; Table 16 lists λ on MuJoCo, Table 17 lists λ on AntMaze, and Table 18 lists λ on Adroit.
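
The paper's Algorithm 1 (IQL-aided QCS) is given in its Appendix A and is not reproduced here. As a rough illustration of where the λ hyperparameters reported in Tables 16-18 could enter a Q-aided conditional supervised actor update, the sketch below combines a return-conditioned behavior-cloning term with a Q-maximization term weighted by λ. This is our own simplification under assumed interfaces (`policy`, `q_function`, and the `batch` keys are hypothetical), not the authors' algorithm.

```python
import torch


def q_aided_csl_loss(policy, q_function, batch, lam):
    """Illustrative Q-aided conditional supervised actor loss (a sketch,
    not the paper's Algorithm 1).

    policy:     callable (obs, return_to_go) -> predicted action tensor
    q_function: callable (obs, action) -> Q-value tensor (e.g., a pretrained IQL critic)
    lam:        weight on the Q-aid term (cf. the per-domain lambda tables)
    """
    pred_action = policy(batch["obs"], batch["return_to_go"])

    # Conditional supervised (behavior-cloning) term on the dataset action.
    csl_loss = ((pred_action - batch["action"]) ** 2).mean()

    # Q-aid term: prefer actions the critic rates highly (minimize negative Q).
    q_aid = -q_function(batch["obs"], pred_action).mean()

    return csl_loss + lam * q_aid
```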
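
The evaluation protocol in the Dataset Splits row (10 episodes per checkpoint, a running average over ten consecutive checkpoints, and mean/standard deviation over five seeds) can be summarized with a short NumPy sketch. The function names and the dummy data are ours; the sketch assumes each per-checkpoint score is already the expert-normalized return averaged over the 10 evaluation episodes.

```python
import numpy as np


def final_smoothed_score(checkpoint_scores, window=10):
    """Trailing running average of per-checkpoint expert-normalized returns;
    returns the final smoothed score of one training run (one seed)."""
    scores = np.asarray(checkpoint_scores, dtype=float)
    window = min(window, len(scores))  # guard for runs shorter than the window
    running = np.convolve(scores, np.ones(window) / window, mode="valid")
    return running[-1]


def aggregate_over_seeds(final_scores):
    """Mean and standard deviation of final scores across random seeds."""
    scores = np.asarray(final_scores, dtype=float)
    return scores.mean(), scores.std()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Five hypothetical seeds with 100 evaluation checkpoints each (dummy data).
    runs = [rng.uniform(0.0, 100.0, size=100) for _ in range(5)]
    finals = [final_smoothed_score(run) for run in runs]
    print(aggregate_over_seeds(finals))
```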