Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning

Authors: Shuang Qiu, Lingxiao Wang, Chenjia Bai, Zhuoran Yang, Zhaoran Wang

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We also provide empirical studies to demonstrate the efficacy of the UCB-based contrastive learning method for RL." |
| Researcher Affiliation | Collaboration | "¹University of Chicago. ²Northwestern University. ³Shanghai AI Laboratory. ⁴Yale University." |
| Pseudocode | Yes | "Algorithm 1 Online Contrastive RL for Single-Agent MDPs" |
| Open Source Code | Yes | "The codes are available at https://github.com/Baichenjia/Contrastive-UCB." |
| Open Datasets | Yes | "In our experiments, we use Atari 100K (Kaiser et al., 2020) benchmark for evaluation..." |
| Dataset Splits | No | The paper refers to a 'training stage' and 'testing' of the algorithms, and uses the Atari 100K benchmark, but does not explicitly provide numerical details or a methodology for training/validation/test dataset splits within its text. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory, or cloud instance types) used to run its experiments. |
| Software Dependencies | No | The paper discusses adopting the SPR method and its architecture but does not specify software dependencies such as programming languages or libraries with version numbers. |
| Experiment Setup | Yes | "In particular, we adopt the same hyper-parameters as that of SPR (Schwarzer et al., 2021)." and "Meanwhile, we adopt the last layer of the Q-network as our learned representation $\hat{\phi}$, which is linear in the estimated Q-function... The bonus for the state-action pair $(s, a)$ is calculated by $\beta_k(s, a) = \gamma_k \big[ \hat{\phi}(s, a)^\top \hat{\Sigma}_{k,h}^{-1} \hat{\phi}(s, a) \big]^{1/2}$, where we set the hyperparameter $\gamma_k = 1$ for all iterations $k \in [K]$." |
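
For concreteness, the elliptical bonus quoted in the Experiment Setup row can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the names `ucb_bonus`, `update_covariance`, `lam`, and the dimensions are hypothetical, and the ridge-initialized running covariance $\hat{\Sigma} = \lambda I + \sum_i \hat{\phi}_i \hat{\phi}_i^\top$ is the standard linear-UCB construction assumed here rather than a detail taken from the paper.

```python
import numpy as np

def ucb_bonus(phi_sa: np.ndarray, sigma: np.ndarray, gamma_k: float = 1.0) -> float:
    """Elliptical UCB bonus: beta_k(s, a) = gamma_k * [phi^T Sigma^{-1} phi]^{1/2}.

    phi_sa : learned representation of (s, a), e.g. the Q-network's last layer
    sigma  : running covariance matrix Sigma_{k,h} of past representations
    gamma_k: bonus scale; the paper sets gamma_k = 1 for all iterations k
    """
    # Solve Sigma x = phi instead of forming an explicit inverse (more stable).
    x = np.linalg.solve(sigma, phi_sa)
    return gamma_k * float(np.sqrt(phi_sa @ x))

def update_covariance(sigma: np.ndarray, phi_sa: np.ndarray) -> np.ndarray:
    """Rank-one update Sigma <- Sigma + phi phi^T after observing (s, a).

    Assumes ridge initialization Sigma = lam * I (hypothetical detail).
    """
    return sigma + np.outer(phi_sa, phi_sa)

# Example usage with stand-in values:
d = 64                          # representation dimension (example value)
lam = 1.0                       # ridge regularizer (hypothetical)
sigma = lam * np.eye(d)         # Sigma_{1,h} = lam * I
phi = np.random.randn(d)        # stand-in for the learned representation
bonus = ucb_bonus(phi, sigma)   # beta_k(s, a) with gamma_k = 1
sigma = update_covariance(sigma, phi)
```

The `np.linalg.solve` call avoids explicitly inverting $\hat{\Sigma}_{k,h}$, which is the usual numerically safer choice when the bonus is recomputed every step as the covariance accumulates rank-one updates.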