Unsupervised Behavior Extraction via Random Intent Priors
Authors: Hao Hu, Yiqin Yang, Jianing Ye, Ziqing Mai, Chongjie Zhang
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on multiple benchmarks showcase UBER's ability to learn effective and diverse behavior sets that enhance sample efficiency for online RL, outperforming existing baselines. We provide both empirical and theoretical evidence to justify the use of random priors for the reward function. |
| Researcher Affiliation | Academia | ¹Institute for Interdisciplinary Information Sciences, Tsinghua University; ²Department of Automation, Tsinghua University; ³Department of Computer Science & Engineering, Washington University in St. Louis |
| Pseudocode | Yes | Algorithm 1 Phase 1: Offline Behavior Extraction; Algorithm 2 Phase 2: Online Policy Reuse |
| Open Source Code | No | The paper mentions open-source implementations for baselines (TD3, TD3+BC, IQL, RLPD, DrQ-v2) by providing GitHub links in footnotes, but it does not provide an explicit statement or link for the source code of UBER itself. |
| Open Datasets | Yes | To answer the questions above, we conduct experiments on the standard D4RL benchmark (Fu et al., 2020) and the multi-task benchmark Meta-World (Yu et al., 2020), which encompasses a variety of dataset settings and tasks. |
| Dataset Splits | No | The paper mentions using standard benchmarks like D4RL and Meta-World, which have predefined splits, but it does not explicitly state the training, validation, or test dataset splits (e.g., percentages or counts) within the text. |
| Hardware Specification | Yes | All experiments are conducted on the same experimental setup, a single GeForce RTX 3090 GPU and an Intel Core i7-6700k CPU at 4.00GHz. |
| Software Dependencies | No | The paper mentions the use of specific algorithms like TD3+BC, TD3, IQL, and RLPD as backbones, and 'Adam' as an optimizer. However, it does not provide specific version numbers for these software components or any other libraries (e.g., PyTorch, TensorFlow, Python version) that would enable reproducible environment setup. |
| Experiment Setup | Yes | We outline the hyper-parameters used by UBER in Table 4, Table 5 and Table 6. These tables list specific values for 'Optimizer Adam', 'Critic learning rate', 'Actor learning rate', 'Mini-batch size', 'Discount factor', 'Target update rate', 'Policy noise', 'Policy noise clipping', 'TD3+BC parameter α', 'IQL parameter τ', 'RLPD parameter G', 'Ensemble Size', and various network architecture dimensions. |