Robust Task Representations for Offline Meta-Reinforcement Learning via Contrastive Learning
Authors: Haoqi Yuan, Zongqing Lu
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on a variety of offline meta-reinforcement learning benchmarks demonstrate the advantages of our method over prior methods, especially on the generalization to out-of-distribution behavior policies. |
| Researcher Affiliation | Academia | Haoqi Yuan 1 Zongqing Lu 1 School of Computer Science, Peking University. Correspondence to: Zongqing Lu <zongqing.lu@pku.edu.cn>. |
| Pseudocode | Yes | Algorithm 1. Meta Training; Algorithm 2. Meta Test |
| Open Source Code | Yes | The code for our work is available at https://github.com/PKU-AI-Edge/CORRO. |
| Open Datasets | Yes | Point-Robot is a 2D navigation environment introduced in Rakelly et al. (2019). Half-Cheetah-Vel and Ant-Dir are multi-task MuJoCo benchmarks where tasks differ in reward functions. Walker-Param and Hopper-Param are multi-task MuJoCo benchmarks where tasks differ in transition dynamics. These benchmarks are standard and commonly used in reinforcement learning research. |
| Dataset Splits | No | The paper explicitly mentions '20 training tasks and 20 testing tasks' but does not specify a separate validation dataset split or proportion for hyperparameter tuning. |
| Hardware Specification | No | The paper does not provide specific hardware details such as CPU/GPU models, memory specifications, or cloud computing instance types used for running experiments. |
| Software Dependencies | No | The paper mentions implementing with SAC but does not list specific software dependencies with version numbers (e.g., Python version, PyTorch version, specific library versions). |
| Experiment Setup | Yes | In Table 4 and Table 5 (Appendix C), the paper provides detailed configurations and hyperparameters used in dataset collection and meta training, including 'Dataset size', 'Training steps', 'Batch size', 'Network width', 'Network depth', 'Learning rate', 'Latent space dim', 'RL batch size', 'Contrastive batch size', 'Negative pairs number', and 'Encoder width'. |