Unsupervised Behavior Extraction via Random Intent Priors

Authors: Hao Hu, Yiqin Yang, Jianing Ye, Ziqing Mai, Chongjie Zhang

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on multiple benchmarks showcase UBER's ability to learn effective and diverse behavior sets that enhance sample efficiency for online RL, outperforming existing baselines. We provide both empirical and theoretical evidence to justify the use of random priors for the reward function.
Researcher Affiliation | Academia | (1) Institute for Interdisciplinary Information Sciences, Tsinghua University; (2) Department of Automation, Tsinghua University; (3) Department of Computer Science & Engineering, Washington University in St. Louis
Pseudocode | Yes | Algorithm 1 (Phase 1: Offline Behavior Extraction); Algorithm 2 (Phase 2: Online Policy Reuse). A hedged Python sketch of Phase 1 follows the table.
Open Source Code | No | The paper provides GitHub links in footnotes to open-source implementations of the baselines (TD3, TD3+BC, IQL, RLPD, DrQ-v2), but it gives no explicit statement of, or link to, the source code of UBER itself.
Open Datasets | Yes | To answer the questions above, we conduct experiments on the standard D4RL benchmark (Fu et al., 2020) and the multi-task benchmark Meta-World (Yu et al., 2020), which encompasses a variety of dataset settings and tasks. A dataset-loading example follows the table.
Dataset Splits | No | The paper mentions using standard benchmarks like D4RL and Meta-World, which have predefined splits, but it does not explicitly state the training, validation, or test dataset splits (e.g., percentages or counts) within the text.
Hardware Specification | Yes | All experiments are conducted on the same experimental setup, a single GeForce RTX 3090 GPU and an Intel Core i7-6700K CPU at 4.00 GHz.
Software Dependencies | No | The paper mentions the use of specific algorithms like TD3+BC, TD3, IQL, and RLPD as backbones, and Adam as an optimizer. However, it does not provide version numbers for these software components or for any other libraries (e.g., PyTorch, TensorFlow, Python) that would enable a reproducible environment setup.
Experiment Setup | Yes | We outline the hyper-parameters used by UBER in Table 4, Table 5 and Table 6. These tables list specific values for the optimizer (Adam), critic learning rate, actor learning rate, mini-batch size, discount factor, target update rate, policy noise, policy noise clipping, TD3+BC parameter α, IQL parameter τ, RLPD parameter G, ensemble size, and various network architecture dimensions. A configuration skeleton with these fields follows the table.
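
The two algorithm names above give UBER's high-level structure, but this report quotes no implementation details. Below is a minimal Python sketch of Phase 1 (Offline Behavior Extraction), assuming, per the paper's title, that each random intent prior is a fixed, randomly initialized network used to relabel the offline dataset's rewards; the function names, the MLP shape, and the train_offline stub are illustrative assumptions, not the authors' code.

```python
import numpy as np

def sample_random_reward(rng, obs_dim, hidden=256):
    """One random intent prior: a fixed, randomly initialized two-layer MLP
    mapping observations to scalar pseudo-rewards (shape assumed, not from the paper)."""
    w1 = rng.normal(scale=1.0 / np.sqrt(obs_dim), size=(obs_dim, hidden))
    w2 = rng.normal(scale=1.0 / np.sqrt(hidden), size=(hidden, 1))
    def reward(obs):                   # obs: (batch, obs_dim)
        return np.tanh(obs @ w1) @ w2  # -> (batch, 1)
    return reward

def train_offline(relabeled_dataset):
    """Placeholder for the offline RL backbone (the paper uses e.g. TD3+BC);
    a real implementation would return a trained policy."""
    raise NotImplementedError("plug in an offline RL trainer here")

def extract_behaviors(dataset, num_behaviors=8, seed=0):
    """Phase 1 sketch: relabel the dataset once per random intent prior and
    train one offline policy per relabeled copy, yielding a behavior set."""
    rng = np.random.default_rng(seed)
    obs = dataset["observations"]
    behaviors = []
    for _ in range(num_behaviors):
        reward = sample_random_reward(rng, obs.shape[-1])
        relabeled = dict(dataset, rewards=reward(obs).squeeze(-1))
        behaviors.append(train_offline(relabeled))
    return behaviors
```

Phase 2 (Online Policy Reuse) would then treat the extracted behaviors as candidate policies for the downstream online RL agent; that phase is not sketched here.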
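Since the experiments run on the standard D4RL benchmark, a short loading snippet may help with reproducing the data pipeline. This uses the public d4rl package's qlearning_dataset helper; the task name halfcheetah-medium-v2 is an illustrative choice, not necessarily one of the paper's evaluation tasks.

```python
import gym
import d4rl  # noqa: F401 -- importing d4rl registers its environments with gym

# Load one offline dataset as a dict of numpy arrays
# (observations, actions, rewards, next_observations, terminals).
env = gym.make("halfcheetah-medium-v2")
dataset = d4rl.qlearning_dataset(env)
print(dataset["observations"].shape, dataset["rewards"].shape)
```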
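Finally, the hyper-parameter names quoted from Tables 4, 5 and 6 can be gathered into a configuration skeleton. The field names below mirror that list; the default values are common TD3/TD3+BC/IQL/RLPD settings and are assumptions, since this report does not reproduce the tables' actual numbers.

```python
from dataclasses import dataclass

@dataclass
class UBERConfig:
    # Field names follow Tables 4-6 of the paper; every default value here
    # is a standard setting from the backbone algorithms, NOT the paper's.
    optimizer: str = "Adam"
    critic_lr: float = 3e-4            # critic learning rate (assumed)
    actor_lr: float = 3e-4             # actor learning rate (assumed)
    batch_size: int = 256              # mini-batch size (assumed)
    discount: float = 0.99             # discount factor (assumed)
    target_update_rate: float = 0.005  # Polyak averaging rate (assumed)
    policy_noise: float = 0.2          # TD3 target policy noise (assumed)
    noise_clip: float = 0.5            # policy noise clipping (assumed)
    td3bc_alpha: float = 2.5           # TD3+BC parameter alpha (assumed)
    iql_tau: float = 0.7               # IQL expectile parameter tau (assumed)
    rlpd_g: int = 20                   # RLPD update-to-data ratio G (assumed)
    ensemble_size: int = 10            # behavior/critic ensemble size (assumed)

cfg = UBERConfig()  # override fields with the values from Tables 4-6
```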