SPRINQL: Sub-optimal Demonstrations driven Offline Imitation Learning
Authors: Huy Hoang, Tien Mai, Pradeep Varakantham
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through comprehensive experimental evaluations, we demonstrate that the SPRINQL algorithm achieves state-of-the-art (SOTA) performance on offline IL benchmarks. Code is available at https://github.com/hmhuy0/SPRINQL. |
| Researcher Affiliation | Academia | Huy Hoang Singapore Management University mh.hoang.2024@phdcs.smu.edu.sg Tien Mai Singapore Management University atmai@smu.edu.sg Pradeep Varakantham Singapore Management University pradeepv@smu.edu.sg |
| Pseudocode | Yes | Algorithm 1 SPRINQL: Inverse soft-Q Learning with Sub-optimal Demonstrations |
| Open Source Code | Yes | Code is available at https://github.com/hmhuy0/SPRINQL. |
| Open Datasets | Yes | We test on five Mujoco tasks [32] and four arm manipulation tasks from Panda-gym [12]. |
| Dataset Splits | No | The paper states 'For each seed, we randomly sample subsets of demonstrations for testing' but does not provide specific percentages or counts for training, validation, or test dataset splits. |
| Hardware Specification | Yes | We conducted all experiments on a total of 8 NVIDIA RTX A5000 GPUs and 64-core CPUs. We use 1 GPU and 8 CPU cores per task, with approximately one day per 5 seeds. |
| Software Dependencies | No | The paper mentions various algorithms and network types like 'double Q critic network', 'SAC-based algorithms', and specific baselines such as 'IQ-learn [13]', 'SQIL [29]', 'Demo DICE [21]', and 'DWBC [37]', but it does not specify version numbers for any software dependencies or libraries used for implementation. |
| Experiment Setup | Yes | The detailed hyper-parameters are reported in Table 2 (values given as BC-based / SPRINQL): actor network [256,256] / [256,256]; critic network [256,256] / [256,256]; training steps 1,000,000 / 1,000,000; gamma 0.99 / 0.99; LR actor 0.0001 / 0.00003; LR critic 0.0003 / 0.0003; LR reward reference 0.0003 / 0.0003; batch size 256 / 256. SPRINQL only: soft update critic factor 0.005; soft update actor factor 0.00003; exploration temperature 0.01; reward regularize temperature (α) 1.0; CQL temperature (β) 1.0. |
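For anyone attempting a reproduction, the Table 2 hyper-parameters can be transcribed into a plain configuration dictionary. This is a minimal sketch: the key names and the `mlp_param_count` helper are illustrative assumptions, not identifiers from the paper's released code at https://github.com/hmhuy0/SPRINQL.

```python
# Table 2 hyper-parameters for the SPRINQL column, as a plain config dict.
# Key names are illustrative; the released code may use different ones.
SPRINQL_CONFIG = {
    "hidden_sizes": [256, 256],       # actor and critic network widths
    "training_steps": 1_000_000,
    "gamma": 0.99,
    "lr_actor": 0.00003,
    "lr_critic": 0.0003,
    "lr_reward_reference": 0.0003,
    "batch_size": 256,
    "soft_update_critic_factor": 0.005,
    "soft_update_actor_factor": 0.00003,
    "exploration_temperature": 0.01,
    "alpha": 1.0,                     # reward regularization temperature
    "beta": 1.0,                      # CQL temperature
}

def mlp_param_count(in_dim: int, hidden_sizes: list, out_dim: int) -> int:
    """Parameter count of a fully connected net with these layer widths
    (weights plus biases for each Linear layer)."""
    dims = [in_dim, *hidden_sizes, out_dim]
    return sum(dims[i] * dims[i + 1] + dims[i + 1] for i in range(len(dims) - 1))

# Example: a [256, 256] actor for a MuJoCo-style task with a
# 17-dimensional observation and 6-dimensional action space.
actor_params = mlp_param_count(17, SPRINQL_CONFIG["hidden_sizes"], 6)
print(actor_params)  # → 71942
```

The helper makes it easy to sanity-check that a reimplemented actor/critic matches the [256,256] architecture reported in the paper before training.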