Stabilizing Contrastive RL: Techniques for Robotic Goal Reaching from Offline Data

Authors: Chongyi Zheng, Benjamin Eysenbach, Homer Rich Walke, Patrick Yin, Kuan Fang, Ruslan Salakhutdinov, Sergey Levine

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments start by studying the design decisions that drive stable contrastive RL, using simulated and real-world benchmarks to compare contrastive RL to other offline goal-conditioned policy learning methods, including those that use conditional imitation and those that employ representations pre-trained with auxiliary objectives. We then analyze unique properties of the representations learned by stable contrastive RL, providing an empirical explanation for the good performance of our method. Finally, we conduct various ablation studies to test the generalization and scalability of the policy learned by our algorithm. We aim our experiments at answering the following questions: [...]
Researcher Affiliation | Academia | Carnegie Mellon University, Princeton University, UC Berkeley, University of Washington, Cornell University
Pseudocode | No | The paper does not include any pseudocode or algorithm blocks; it provides an architecture diagram in Figure 10.
Open Source Code | Yes | "We implement stable contrastive RL using PyTorch (Paszke et al., 2019)." [...] Code: https://anonymous.4open.science/r/stable_contrastive_rl-5A42
Open Datasets | Yes | Our experiments use a suite of simulated and real-world goal-conditioned control tasks based on prior work (Fang et al., 2022a;b; Ebert et al., 2021; Mendonca et al., 2021). [...] We train on an expanded version of the Bridge dataset (Ebert et al., 2021), which entails controlling a robot arm to complete different housekeeping tasks.
Dataset Splits | No | The paper mentions a "held-out validation set" and discusses "training and validation loss", but it does not provide specific percentages or counts for how the dataset was split into training, validation, and test sets. For example: "Given an initial image (Fig. 6 far left) and the desired goal image (Fig. 6 far right), we interpolate between the representations of these two images, and retrieve the nearest neighbor in a held-out validation set." (A sketch of this interpolation-and-retrieval step appears after the table.)
Hardware Specification | Yes | For each experiment, we allocated 1 NVIDIA V100 GPU and 64 GB of memory for computation.
Software Dependencies | No | The paper mentions using "PyTorch (Paszke et al., 2019)" and the "Adam (Kingma & Ba, 2015) optimizer" but does not provide version numbers for PyTorch or any other software library or dependency. The citation year for PyTorch is not a version number.
Experiment Setup | Yes | We summarize hyperparameters in Table 2, which provides explicit values such as batch size 2048; number of training epochs 300; image encoder architecture: 3-layer CNN with kernel sizes (8, 4, 3), numbers of channels (32, 64, 64), strides (4, 2, 1), and paddings (2, 1, 1); weight initialization for the final layers of the critic and policy: UNIF[-10^-12, 10^-12]; and learning rate 3e-4 for the Adam (Kingma & Ba, 2015) optimizer. (A hedged sketch of the encoder and initialization follows the table.)
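
To make the Experiment Setup row concrete, here is a minimal PyTorch sketch of the 3-layer CNN encoder and the near-zero uniform initialization described in Table 2. This is an illustration under stated assumptions, not the authors' implementation: the function names, the input channel count, and the Flatten head are assumptions; Figure 10 of the paper documents the actual architecture.

```python
import torch.nn as nn

# Sketch of the Table 2 encoder: 3-layer CNN with kernel sizes (8, 4, 3),
# channels (32, 64, 64), strides (4, 2, 1), and paddings (2, 1, 1).
# `make_image_encoder` and `init_final_layer` are hypothetical names.
def make_image_encoder(in_channels=3):
    return nn.Sequential(
        nn.Conv2d(in_channels, 32, kernel_size=8, stride=4, padding=2),
        nn.ReLU(),
        nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1),
        nn.ReLU(),
        nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1),
        nn.ReLU(),
        nn.Flatten(),  # flatten head is an assumption; see Figure 10
    )

def init_final_layer(layer: nn.Linear, bound=1e-12):
    # UNIF[-10^-12, 10^-12] initialization for the final layers of the
    # critic and policy, as listed in Table 2.
    nn.init.uniform_(layer.weight, -bound, bound)
    nn.init.uniform_(layer.bias, -bound, bound)
```

With the listed learning rate, the optimizer would then be constructed as `torch.optim.Adam(model.parameters(), lr=3e-4)`.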
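
The Dataset Splits row quotes an experiment that interpolates between two image representations and retrieves nearest neighbors from a held-out validation set. The following is a hedged sketch of that procedure under assumed interfaces: `encoder`, `initial_image`, `goal_image`, and `val_images` are hypothetical names, and Euclidean distance is an assumption, since the paper does not state which metric it uses for retrieval.

```python
import torch

# Hypothetical sketch of the interpolation-and-retrieval step quoted in the
# Dataset Splits row. All interfaces here are assumed, not from the paper.
@torch.no_grad()
def interpolate_and_retrieve(encoder, initial_image, goal_image, val_images,
                             num_steps=8):
    """Linearly interpolate between two image representations and return, for
    each interpolation point, the nearest neighbor from a held-out set."""
    z0 = encoder(initial_image.unsqueeze(0))  # (1, d) start-image representation
    z1 = encoder(goal_image.unsqueeze(0))     # (1, d) goal-image representation
    z_val = encoder(val_images)               # (N, d) held-out representations

    neighbors = []
    for alpha in torch.linspace(0.0, 1.0, num_steps):
        z = (1 - alpha) * z0 + alpha * z1             # interpolated representation
        dists = torch.cdist(z, z_val)                 # (1, N) Euclidean distances (assumed metric)
        neighbors.append(val_images[dists.argmin()])  # closest held-out image
    return neighbors
```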