Robust Task Representations for Offline Meta-Reinforcement Learning via Contrastive Learning
Authors: Haoqi Yuan, Zongqing Lu
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on a variety of offline meta-reinforcement learning benchmarks demonstrate the advantages of our method over prior methods, especially on the generalization to out-of-distribution behavior policies. |
| Researcher Affiliation | Academia | Haoqi Yuan 1 Zongqing Lu 1 School of Computer Science, Peking University. Correspondence to: Zongqing Lu <zongqing.lu@pku.edu.cn>. |
| Pseudocode | Yes | Algorithm 1. Meta Training; Algorithm 2. Meta Test |
| Open Source Code | Yes | The code for our work is available at https://github.com/PKU-AI-Edge/CORRO. |
| Open Datasets | Yes | Point-Robot is a 2D navigation environment introduced in Rakelly et al. (2019). Half-Cheetah-Vel and Ant-Dir are multi-task MuJoCo benchmarks where tasks differ in reward functions. Walker-Param and Hopper-Param are multi-task MuJoCo benchmarks where tasks differ in transition dynamics. These benchmarks are standard and commonly used in reinforcement learning research. |
| Dataset Splits | No | The paper explicitly mentions '20 training tasks and 20 testing tasks' but does not specify a separate validation dataset split or proportion for hyperparameter tuning. |
| Hardware Specification | No | The paper does not provide specific hardware details such as CPU/GPU models, memory specifications, or cloud computing instance types used for running experiments. |
| Software Dependencies | No | The paper mentions implementing with SAC but does not list specific software dependencies with version numbers (e.g., Python version, PyTorch version, specific library versions). |
| Experiment Setup | Yes | In Table 4 and Table 5 (Appendix C), the paper provides detailed configurations and hyperparameters used in dataset collection and meta training, including 'Dataset size', 'Training steps', 'Batch size', 'Network width', 'Network depth', 'Learning rate', 'Latent space dim', 'RL batch size', 'Contrastive batch size', 'Negative pairs number', and 'Encoder width'. |