reproducibilityindex.ai

Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning

Authors: Shuang Qiu, Lingxiao Wang, Chenjia Bai, Zhuoran Yang, Zhaoran Wang

ICML 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We also provide empirical studies to demonstrate the efficacy of the UCB-based contrastive learning method for RL.
Researcher Affiliation	Collaboration	(1) University of Chicago. (2) Northwestern University. (3) Shanghai AI Laboratory. (4) Yale University.
Pseudocode	Yes	Algorithm 1 Online Contrastive RL for Single-Agent MDPs
Open Source Code	Yes	The codes are available at https://github.com/Baichenjia/Contrastive-UCB.
Open Datasets	Yes	In our experiments, we use Atari 100K (Kaiser et al., 2020) benchmark for evaluation...
Dataset Splits	No	The paper refers to a 'training stage' and 'testing' of the algorithms, and uses the Atari 100K benchmark, but does not explicitly provide numerical details or methodology for training/test/validation dataset splits within its text.
Hardware Specification	No	The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory, or cloud instance types) used to run its experiments.
Software Dependencies	No	The paper discusses adopting the 'SPR method' and its architecture but does not specify software dependencies like programming languages or libraries with their version numbers.
Experiment Setup	Yes	"In particular, we adopt the same hyper-parameters as that of SPR (Schwarzer et al., 2021)." and "Meanwhile, we adopt the last layer of the Q-network as our learned representation $\widehat{\phi}$ which is linear in the estimated Q-function... The bonus for the state-action pair $(s, a)$ is calculated by $\beta_k(s, a) = \gamma_k [\widehat{\phi}(s, a)^\top (\widehat{\Sigma}^k_h)^{-1} \widehat{\phi}(s, a)]^{1/2}$, where we set the hyperparameter $\gamma_k = 1$ for all iterations $k \in [K]$."
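
The bonus quoted in the Experiment Setup row is the square root of a quadratic form in the learned features, scaled by a constant. The sketch below shows one way such a bonus could be computed with NumPy. It is an illustration, not the authors' code: the function names, the ridge term `lam`, and building the covariance matrix as a regularized sum of feature outer products are assumptions that the quoted text does not confirm.

```python
import numpy as np


def empirical_covariance(features, lam=1.0):
    """Assumed form: lam * I plus the sum of outer products of visited features."""
    d = features.shape[1]
    return lam * np.eye(d) + features.T @ features


def ucb_bonus(phi, sigma, gamma=1.0):
    """Return gamma * sqrt(phi^T sigma^{-1} phi) for a single feature vector."""
    # Solve sigma x = phi instead of forming the inverse explicitly.
    x = np.linalg.solve(sigma, phi)
    return gamma * np.sqrt(phi @ x)


# Hypothetical 2-dimensional features: the first direction is visited twice,
# the second direction once.
features = np.array([[1.0, 0.0],
                     [1.0, 0.0],
                     [0.0, 1.0]])
sigma = empirical_covariance(features)

bonus_seen = ucb_bonus(np.array([1.0, 0.0]), sigma)  # direction seen more often
bonus_rare = ucb_bonus(np.array([0.0, 1.0]), sigma)  # direction seen less often
```

With gamma fixed at 1, as in the quoted setup, the bonus is larger for feature directions that appear less often in the collected data, which is the behavior that drives exploration.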