Unsupervised Behavior Extraction via Random Intent Priors
Authors: Hao Hu, Yiqin Yang, Jianing Ye, Ziqing Mai, Chongjie Zhang
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on multiple benchmarks showcase UBER's ability to learn effective and diverse behavior sets that enhance sample efficiency for online RL, outperforming existing baselines. We provide both empirical and theoretical evidence to justify the use of random priors for the reward function. |
| Researcher Affiliation | Academia | ¹Institute for Interdisciplinary Information Sciences, Tsinghua University; ²Department of Automation, Tsinghua University; ³Department of Computer Science & Engineering, Washington University in St. Louis |
| Pseudocode | Yes | Algorithm 1 Phase 1: Offline Behavior Extraction; Algorithm 2 Phase 2: Online Policy Reuse |
| Open Source Code | No | The paper mentions open-source implementations for baselines (TD3, TD3+BC, IQL, RLPD, DrQ-v2) by providing GitHub links in footnotes, but it does not provide an explicit statement or link for the source code of UBER itself. |
| Open Datasets | Yes | To answer the questions above, we conduct experiments on the standard D4RL benchmark (Fu et al., 2020) and the multi-task benchmark Meta-World (Yu et al., 2020), which encompasses a variety of dataset settings and tasks. |
| Dataset Splits | No | The paper mentions using standard benchmarks like D4RL and Meta-World, which have predefined splits, but it does not explicitly state the training, validation, or test dataset splits (e.g., percentages or counts) within the text. |
| Hardware Specification | Yes | All experiments are conducted on the same experimental setup, a single GeForce RTX 3090 GPU and an Intel Core i7-6700k CPU at 4.00GHz. |
| Software Dependencies | No | The paper mentions the use of specific algorithms like TD3+BC, TD3, IQL, and RLPD as backbones, and 'Adam' as an optimizer. However, it does not provide specific version numbers for these software components or any other libraries (e.g., PyTorch, TensorFlow, Python version) that would enable reproducible environment setup. |
| Experiment Setup | Yes | We outline the hyper-parameters used by UBER in Table 4, Table 5 and Table 6. These tables list specific values for 'Optimizer Adam', 'Critic learning rate', 'Actor learning rate', 'Mini-batch size', 'Discount factor', 'Target update rate', 'Policy noise', 'Policy noise clipping', 'TD3+BC parameter α', 'IQL parameter τ', 'RLPD parameter G', 'Ensemble Size', and various network architecture dimensions. |