Making Linear MDPs Practical via Contrastive Representation Learning
Authors: Tianjun Zhang, Tongzheng Ren, Mengjiao Yang, Joseph Gonzalez, Dale Schuurmans, Bo Dai
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we demonstrate superior performance over existing state-of-the-art model-based and model-free algorithms on several benchmarks. |
| Researcher Affiliation | Collaboration | UC Berkeley, UT Austin, Google Brain, University of Alberta. |
| Pseudocode | Yes | Algorithm 1 CTRL-UCB: Online Exploration with Representation Learning |
| Open Source Code | No | The paper does not provide an explicit statement of code release or a link to a repository for the methodology described. |
| Open Datasets | Yes | We test our algorithm extensively on the dense-reward MuJoCo benchmark from MBBL. ... We conduct experiments on the DeepMind Control Suite. ... Lastly, we instantiate our CTRL-LCB algorithm in the offline setting on the D4RL benchmark (Fu et al., 2020). |
| Dataset Splits | No | The paper discusses data collection and benchmarks but does not explicitly provide training/validation/test dataset splits with specific percentages or counts. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running experiments. |
| Software Dependencies | No | The paper refers to existing software frameworks and libraries (e.g., 'actor-critic algorithm with entropy regularizer'), but it does not list specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | In this section, we list all the hyperparameter and network architecture we use for our experiments. For online MuJoCo and DM Control tasks, the hyperparameters can be found at Table 5. ... Table 5. Hyperparameters used for CTRL-UCB in all the environments in MuJoCo and DM Control Suite. |
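
The Pseudocode row above refers to Algorithm 1 (CTRL-UCB), which pairs UCB-style exploration with a contrastively learned linear-MDP representation, i.e., features phi(s, a) and mu(s') whose inner product models the transition dynamics. Since the report finds no released code, the following is a minimal sketch of one way such a contrastive representation objective can be set up; the network shapes, the in-batch negative sampling, and the names `PhiNet`, `MuNet`, and `contrastive_loss` are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: learn phi(s, a) and mu(s') so that <phi(s, a), mu(s')>
# scores observed transitions above in-batch negatives (an InfoNCE-style
# surrogate for the paper's contrastive objective). All sizes and names are
# assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PhiNet(nn.Module):
    """State-action feature phi(s, a) of the linear-MDP factorization."""
    def __init__(self, state_dim, action_dim, feat_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, feat_dim),
        )

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

class MuNet(nn.Module):
    """Next-state feature mu(s') of the same factorization."""
    def __init__(self, state_dim, feat_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, feat_dim),
        )

    def forward(self, s_next):
        return self.net(s_next)

def contrastive_loss(phi_net, mu_net, s, a, s_next):
    """InfoNCE-style loss: other next-states in the batch serve as negatives."""
    phi = phi_net(s, a)      # (B, d)
    mu = mu_net(s_next)      # (B, d)
    logits = phi @ mu.t()    # (B, B) pairwise transition scores
    labels = torch.arange(s.shape[0], device=s.device)  # diagonal = positives
    return F.cross_entropy(logits, labels)
```

In the online setting described by Algorithm 1, features learned this way would then feed a UCB exploration bonus computed in the phi(s, a) space; that planning step is omitted here since its exact form depends on details given only in the paper.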