Making Linear MDPs Practical via Contrastive Representation Learning

Authors: Tianjun Zhang, Tongzheng Ren, Mengjiao Yang, Joseph Gonzalez, Dale Schuurmans, Bo Dai

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Empirically, we demonstrate superior performance over existing state-of-the-art model-based and model-free algorithms on several benchmarks." |
| Researcher Affiliation | Collaboration | ¹UC Berkeley, ²UT Austin, ³Google Brain, ⁴University of Alberta |
| Pseudocode | Yes | "Algorithm 1 CTRL-UCB: Online Exploration with Representation Learning" (hedged sketches of the contrastive objective and the UCB bonus follow the table) |
| Open Source Code | No | The paper does not provide an explicit statement of code release or a link to a repository for the methodology described. |
| Open Datasets | Yes | "We test our algorithm extensively on the dense-reward MuJoCo benchmark from MBBL. ... We conduct experiments on the DeepMind Control Suite. ... Lastly, we instantiate our CTRL-LCB algorithm in the offline setting on the D4RL benchmark (Fu et al., 2020)." (a loading example follows the table) |
| Dataset Splits | No | The paper discusses data collection and benchmarks but does not explicitly provide training/validation/test dataset splits with specific percentages or counts. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for the experiments. |
| Software Dependencies | No | The paper refers to existing software frameworks and components (e.g., an "actor-critic algorithm with entropy regularizer") but does not list specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | "In this section, we list all the hyperparameters and network architectures we use for our experiments. For online MuJoCo and DM Control tasks, the hyperparameters can be found in Table 5. ... Table 5. Hyperparameters used for CTRL-UCB in all the environments in MuJoCo and DM Control Suite." |
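
Since no code is released, the paper's central technique can only be sketched. A linear MDP factorizes transitions as P(s'|s,a) ≈ ⟨φ(s,a), μ(s')⟩, and the paper learns φ and μ with a contrastive objective. Below is a minimal PyTorch sketch assuming an InfoNCE-style loss with in-batch negatives; the network widths, the noise distribution, and the exact loss variant in the paper may differ, and all names here (`PhiNet`, `MuNet`, `contrastive_loss`) are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PhiNet(nn.Module):
    """Maps a state-action pair to a d-dimensional feature phi(s, a)."""
    def __init__(self, s_dim, a_dim, d):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(s_dim + a_dim, 256), nn.ReLU(),
            nn.Linear(256, d),
        )

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

class MuNet(nn.Module):
    """Maps a next state to a d-dimensional feature mu(s')."""
    def __init__(self, s_dim, d):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(s_dim, 256), nn.ReLU(),
            nn.Linear(256, d),
        )

    def forward(self, s_next):
        return self.net(s_next)

def contrastive_loss(phi, mu, s, a, s_next):
    """InfoNCE-style loss with in-batch negatives (positives on the diagonal).

    Pushes <phi(s,a), mu(s')> up for observed transitions and down for
    mismatched pairs, approximating P(s'|s,a) ~ <phi(s,a), mu(s')>.
    """
    f = phi(s, a)        # (B, d)
    g = mu(s_next)       # (B, d)
    logits = f @ g.t()   # (B, B): row i scores every s_next[j] against (s,a)[i]
    labels = torch.arange(s.shape[0], device=s.device)
    return F.cross_entropy(logits, labels)
```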
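Algorithm 1 (CTRL-UCB) then plans on top of the learned features with an optimism bonus. The standard choice for linear MDPs is the elliptical bonus β·sqrt(φᵀΛ⁻¹φ), with Λ the regularized feature covariance; the sketch below assumes that form, and CTRL-LCB would subtract the same quantity as a pessimism penalty in the offline setting. β and λ are hyperparameters of the kind the paper reports in Table 5.

```python
import torch

def inverse_covariance(phi_batch, lam=1.0):
    """Lambda^{-1} for Lambda = lam * I + sum_i phi_i phi_i^T over stored features."""
    d = phi_batch.shape[1]
    cov = lam * torch.eye(d) + phi_batch.t() @ phi_batch
    return torch.linalg.inv(cov)

def elliptical_bonus(phi_batch, cov_inv, beta=1.0):
    """Per-sample bonus beta * sqrt(phi^T Lambda^{-1} phi)."""
    quad = torch.einsum('bi,ij,bj->b', phi_batch, cov_inv, phi_batch)
    return beta * quad.clamp(min=0.0).sqrt()

# Usage sketch: add the bonus to rewards for UCB (online exploration),
# subtract it for LCB (offline pessimism).
# features = phi(states, actions)  # (B, d) from the learned PhiNet
# r_ucb = rewards + elliptical_bonus(features, cov_inv, beta)
```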
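For the offline CTRL-LCB experiments, D4RL exposes its datasets directly through gym environments. A minimal loading example follows; `halfcheetah-medium-v2` is an illustrative choice, since the exact dataset versions used by the paper are not restated in this table.

```python
import gym
import d4rl  # importing d4rl registers the offline environments with gym

# Load an offline dataset of (s, a, r, s') transitions.
env = gym.make('halfcheetah-medium-v2')
dataset = d4rl.qlearning_dataset(env)

print(dataset['observations'].shape)  # (N, obs_dim)
print(dataset['actions'].shape)       # (N, act_dim)
```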