Contrastive Learning as Goal-Conditioned Reinforcement Learning
Authors: Benjamin Eysenbach, Tianjun Zhang, Sergey Levine, Russ R. Salakhutdinov
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Across a range of goal-conditioned RL tasks, we demonstrate that contrastive RL methods achieve higher success rates than prior non-contrastive methods, including in the offline RL setting. We also show that contrastive RL outperforms prior methods on image-based tasks, without using data augmentation or auxiliary objectives. |
| Researcher Affiliation | Collaboration | Benjamin Eysenbach (CMU, Google Research), Tianjun Zhang (UC Berkeley), Sergey Levine (Google Research, UC Berkeley), Ruslan Salakhutdinov (CMU) |
| Pseudocode | Yes | Alg. 1 provides a JAX [13] implementation of the actor and critic losses (a hedged sketch of these losses appears after the table). |
| Open Source Code | Yes | Project website with videos and code: https://ben-eysenbach.github.io/contrastive_rl |
| Open Datasets | Yes | We use the benchmark Ant Maze tasks from the D4RL benchmark [36] |
| Dataset Splits | No | The paper mentions using a replay buffer, environment steps, and batch sizes for training. For the offline RL setting, it uses the D4RL benchmark, but it does not explicitly state specific training, validation, and test dataset splits (e.g., percentages or sample counts) within the text. |
| Hardware Specification | Yes | On a single TPUv2, training proceeds at 1100 batches/sec for state-based tasks and 105 batches/sec for image-based tasks; for comparison, our implementation of DrQ on the same hardware setup runs at 28 batches/sec (3.9× slower). |
| Software Dependencies | No | The paper states that its implementation is based on JAX [13] and ACME [57], but it does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | Architectures and hyperparameters are described in Appendix E.7; We use a replay buffer size of 10^6 for all tasks. For state-based tasks, training proceeds for 3 million environment steps. For image-based tasks, training proceeds for 1 million environment steps. Each policy update uses a batch size of 256. For state-based tasks, we take 1000 critic steps and 1000 actor steps. For image-based tasks, we take 250 critic steps and 250 actor steps. (These settings are collected into a configuration sketch after the table.) |
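
The "Pseudocode" row notes that Alg. 1 in the paper gives a JAX implementation of the actor and critic losses. The following is a minimal sketch of that contrastive objective, not the paper's Alg. 1: the `sa_encoder`, `g_encoder`, and `policy_sample` functions are assumed placeholders, and the critic pairs each (state, action) with a goal sampled from its own trajectory's future (positive) against goals from other batch rows (negatives).

```python
import jax.numpy as jnp
import optax


def critic_loss(sa_encoder, g_encoder, states, actions, future_states):
    """Contrastive (binary NCE) critic loss -- sketch, not the paper's Alg. 1.

    Diagonal entries of the logit matrix are positive (state, action, goal)
    triples; off-diagonal entries reuse goals from other rows as negatives.
    """
    sa_repr = sa_encoder(states, actions)               # (B, d)
    g_repr = g_encoder(future_states)                   # (B, d)
    logits = jnp.einsum("ik,jk->ij", sa_repr, g_repr)   # (B, B) inner products
    labels = jnp.eye(logits.shape[0])                   # positives on the diagonal
    return jnp.mean(optax.sigmoid_binary_cross_entropy(logits, labels))


def actor_loss(policy_sample, sa_encoder, g_encoder, states, goals, rng):
    """Actor maximizes the critic's score for the commanded goal (sketch)."""
    actions = policy_sample(rng, states, goals)          # reparameterized action sample
    sa_repr = sa_encoder(states, actions)
    g_repr = g_encoder(goals)
    scores = jnp.einsum("ik,ik->i", sa_repr, g_repr)     # per-example similarity
    return -jnp.mean(scores)
```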
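
For convenience, the settings quoted in the "Experiment Setup" row can be gathered into a single configuration dictionary. The grouping and key names below are illustrative and do not come from the paper's code.

```python
# Hyperparameters as quoted in the "Experiment Setup" row; key names are assumptions.
EXPERIMENT_SETUP = {
    "replay_buffer_size": 1_000_000,  # 10^6 transitions, all tasks
    "batch_size": 256,                # per policy update
    "state_based": {
        "env_steps": 3_000_000,
        "critic_steps": 1_000,
        "actor_steps": 1_000,
    },
    "image_based": {
        "env_steps": 1_000_000,
        "critic_steps": 250,
        "actor_steps": 250,
    },
}
```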