SPRINQL: Sub-optimal Demonstrations driven Offline Imitation Learning
Authors: Huy Hoang, Tien Mai, Pradeep Varakantham
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through comprehensive experimental evaluations, we demonstrate that the SPRINQL algorithm achieves state-of-the-art (SOTA) performance on offline IL benchmarks. Code is available at https://github.com/hmhuy0/SPRINQL. |
| Researcher Affiliation | Academia | Huy Hoang Singapore Management University mh.hoang.2024@phdcs.smu.edu.sg Tien Mai Singapore Management University atmai@smu.edu.sg Pradeep Varakantham Singapore Management University pradeepv@smu.edu.sg |
| Pseudocode | Yes | Algorithm 1 SPRINQL: Inverse soft-Q Learning with Sub-optimal Demonstrations |
| Open Source Code | Yes | Code is available at https://github.com/hmhuy0/SPRINQL. |
| Open Datasets | Yes | We test on five Mujoco tasks [32] and four arm manipulation tasks from Panda-gym [12]. |
| Dataset Splits | No | The paper states 'For each seed, we randomly sample subsets of demonstrations for testing' but does not provide specific percentages or counts for training, validation, or test dataset splits. |
| Hardware Specification | Yes | We conducted all experiments on a total of 8 NVIDIA RTX A5000 GPUs and 64-core CPUs. We use 1 GPU and 8 CPU cores per task, with approximately one day per 5 seeds. |
| Software Dependencies | No | The paper mentions various algorithms and network types like 'double Q critic network', 'SAC-based algorithms', and specific baselines such as 'IQ-learn [13]', 'SQIL [29]', 'Demo DICE [21]', and 'DWBC [37]', but it does not specify version numbers for any software dependencies or libraries used for implementation. |
| Experiment Setup | Yes | The detailed hyper-parameters are reported in Table 2 (values given as BC-based / SPRINQL): actor network [256,256] / [256,256]; critic network [256,256] / [256,256]; training steps 1,000,000 / 1,000,000; gamma 0.99 / 0.99; LR actor 0.0001 / 0.00003; LR critic 0.0003 / 0.0003; LR reward reference 0.0003 / 0.0003; batch size 256 / 256. SPRINQL only: soft update critic factor 0.005; soft update actor factor 0.00003; exploration temperature 0.01; reward regularize temperature (α) 1.0; CQL temperature (β) 1.0. |
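For anyone attempting a reproduction, the Table 2 hyper-parameters can be transcribed into a plain configuration dictionary. This is a minimal sketch: the key names and the `mlp_param_count` helper are illustrative assumptions, not identifiers from the paper's released code at https://github.com/hmhuy0/SPRINQL.

```python
# Table 2 hyper-parameters for the SPRINQL column, as a plain config dict.
# Key names are illustrative; the released code may use different ones.
SPRINQL_CONFIG = {
    "hidden_sizes": [256, 256],       # actor and critic network widths
    "training_steps": 1_000_000,
    "gamma": 0.99,
    "lr_actor": 0.00003,
    "lr_critic": 0.0003,
    "lr_reward_reference": 0.0003,
    "batch_size": 256,
    "soft_update_critic_factor": 0.005,
    "soft_update_actor_factor": 0.00003,
    "exploration_temperature": 0.01,
    "alpha": 1.0,                     # reward regularization temperature
    "beta": 1.0,                      # CQL temperature
}

def mlp_param_count(in_dim: int, hidden_sizes: list, out_dim: int) -> int:
    """Parameter count of a fully connected net with these layer widths
    (weights plus biases for each Linear layer)."""
    dims = [in_dim, *hidden_sizes, out_dim]
    return sum(dims[i] * dims[i + 1] + dims[i + 1] for i in range(len(dims) - 1))

# Example: a [256, 256] actor for a MuJoCo-style task with a
# 17-dimensional observation and 6-dimensional action space.
actor_params = mlp_param_count(17, SPRINQL_CONFIG["hidden_sizes"], 6)
print(actor_params)  # → 71942
```

The helper makes it easy to sanity-check that a reimplemented actor/critic matches the [256,256] architecture reported in the paper before training.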