Robust Task Representations for Offline Meta-Reinforcement Learning via Contrastive Learning

Authors: Haoqi Yuan, Zongqing Lu

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on a variety of offline meta-reinforcement learning benchmarks demonstrate the advantages of our method over prior methods, especially in generalization to out-of-distribution behavior policies.
Researcher Affiliation | Academia | Haoqi Yuan (1), Zongqing Lu (1); (1) School of Computer Science, Peking University. Correspondence to: Zongqing Lu <zongqing.lu@pku.edu.cn>.
Pseudocode | Yes | Algorithm 1: Meta Training; Algorithm 2: Meta Test.
Open Source Code | Yes | The code for our work is available at https://github.com/PKU-AI-Edge/CORRO.
Open Datasets | Yes | Point-Robot is a 2D navigation environment introduced in Rakelly et al. (2019). Half-Cheetah-Vel and Ant-Dir are multi-task MuJoCo benchmarks where tasks differ in reward functions; Walker-Param and Hopper-Param are multi-task MuJoCo benchmarks where tasks differ in transition dynamics. These benchmarks are standard and commonly used in reinforcement learning research.
Dataset Splits | No | The paper explicitly mentions '20 training tasks and 20 testing tasks' but does not specify a separate validation split or proportion for hyperparameter tuning.
Hardware Specification | No | The paper does not provide hardware details such as CPU/GPU models, memory specifications, or cloud computing instance types used for the experiments.
Software Dependencies | No | The paper mentions implementing with SAC but does not list software dependencies with version numbers (e.g., Python version, PyTorch version, or other library versions).
Experiment Setup | Yes | Tables 4 and 5 (Appendix C) provide the detailed configurations and hyperparameters used in dataset collection and meta training, including 'Dataset size', 'Training steps', 'Batch size', 'Network width', 'Network depth', 'Learning rate', 'Latent space dim', 'RL batch size', 'Contrastive batch size', 'Negative pairs number', and 'Encoder width'.
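For context on the method being assessed: the paper learns task representations with a contrastive objective over task latents (note the 'Contrastive batch size' and 'Negative pairs number' hyperparameters above). The paper's exact encoder and objective are not reproduced here; the sketch below is a generic NumPy implementation of a standard InfoNCE-style contrastive loss, with the function name, normalization details, and temperature value chosen for illustration only.

```python
import numpy as np

def info_nce_loss(query, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss for a single query embedding.

    The query (e.g., a latent encoding one task's transition) should score
    higher against its positive key (same task) than against negative keys
    (other tasks). `query` and `positive` have shape (d,); `negatives` has
    shape (K, d). Returns a non-negative scalar loss.
    """
    def normalize(v):
        # L2-normalize so the dot product is a cosine similarity.
        return v / (np.linalg.norm(v, axis=-1, keepdims=True) + 1e-8)

    q = normalize(query)                                          # (d,)
    keys = normalize(np.vstack([positive[None, :], negatives]))   # (1+K, d)
    logits = keys @ q / temperature                               # (1+K,)
    # Cross-entropy with the positive key placed at index 0.
    log_prob = logits[0] - np.log(np.exp(logits).sum())
    return -log_prob
```

A query aligned with its positive and orthogonal to its negatives yields a loss near zero, while a query matching a negative yields a large loss; in training, this loss would be averaged over the contrastive batch and minimized with respect to the encoder parameters.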