Multi-task Batch Reinforcement Learning with Metric Learning
Authors: Jiachen Li, Quan Vuong, Shuang Liu, Minghua Liu, Kamil Ciosek, Henrik Christensen, Hao Su
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the performance of our proposed algorithm (Sec. 5.1) and ablate the different design choices (Sec. 5.2). Sec. 5.3 shows that the multi-task policy can serve as a good initialization, significantly speeding up training on unseen tasks. |
| Researcher Affiliation | Collaboration | Jiachen Li¹, Quan Vuong¹, Shuang Liu¹, Minghua Liu¹, Kamil Ciosek², Henrik Christensen¹, Hao Su¹; ¹UC San Diego, ²Microsoft Research Cambridge, UK; {jil021, qvuong, s3liu, minghua, hichristensen, haosu}@ucsd.edu, kamil.ciosek@microsoft.com |
| Pseudocode | Yes | Alg. 1 illustrates the pseudo-code for the second phase of the distillation procedure. Detailed pseudocode of the two-phase distillation procedure can be found in Appendix E. |
| Open Source Code | Yes | Website: https://sites.google.com/eng.ucsd.edu/multi-task-batch-reinforcement/home |
| Open Datasets | Yes | We evaluate in five challenging task distributions from MuJoCo [38] and a modified task distribution UmazeGoal-M from D4RL [39]. |
| Dataset Splits | No | The paper discusses training and testing on unseen tasks but does not specify a validation dataset split or strategy for hyperparameter tuning. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running experiments. |
| Software Dependencies | No | The paper mentions software components like 'Soft Actor Critic (SAC)', 'BCQ', and 'OpenAI gym state' but does not specify their version numbers or other software dependencies with versions. |
| Experiment Setup | Yes | Appendix C provides all hyper-parameters. |