Stabilizing Contrastive RL: Techniques for Robotic Goal Reaching from Offline Data

Authors: Chongyi Zheng, Benjamin Eysenbach, Homer Rich Walke, Patrick Yin, Kuan Fang, Ruslan Salakhutdinov, Sergey Levine

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments start by studying the design decisions that drive stable contrastive RL, using simulated and real-world benchmarks to compare contrastive RL to other offline goal-conditioned policy learning methods, including those that use conditional imitation and those that employ representations pre-trained with auxiliary objectives. We then analyze unique properties of the representations learned by stable contrastive RL, providing an empirical explanation for the good performance of our method. Finally, we conduct various ablation studies to test the generalization and scalability of the policy learned by our algorithm. We aim our experiments at answering the following questions: [...]
Researcher Affiliation | Academia | Carnegie Mellon University, Princeton University, UC Berkeley, University of Washington, Cornell University
Pseudocode | No | The paper does not include any pseudocode or algorithm blocks; it provides an architecture diagram in Figure 10.
Open Source Code | Yes | "We implement stable contrastive RL using PyTorch (Paszke et al., 2019)." [...] Code: https://anonymous.4open.science/r/stable_contrastive_rl-5A42
Open Datasets | Yes | Our experiments use a suite of simulated and real-world goal-conditioned control tasks based on prior work (Fang et al., 2022a;b; Ebert et al., 2021; Mendonca et al., 2021). [...] We train on an expanded version of the Bridge dataset (Ebert et al., 2021), which entails controlling a robot arm to complete different housekeeping tasks.
Dataset Splits | No | The paper mentions a "held-out validation set" and discusses "training and validation loss", but it does not provide specific percentages or counts for how the dataset was split into training, validation, and test sets. For example: "Given an initial image (Fig. 6 far left) and the desired goal image (Fig. 6 far right), we interpolate between the representations of these two images, and retrieve the nearest neighbor in a held-out validation set." (A sketch of this interpolation-and-retrieval step appears after the table.)
Hardware Specification | Yes | For each experiment, we allocated 1 NVIDIA V100 GPU and 64 GB of memory for computation.
Software Dependencies | No | The paper mentions using "PyTorch (Paszke et al., 2019)" and the "Adam (Kingma & Ba, 2015) optimizer" but does not provide version numbers for PyTorch or any other software library or dependency. The citation year for PyTorch is not a version number.
Experiment Setup | Yes | We summarize hyperparameters in Table 2, which provides explicit values such as batch size 2048; number of training epochs 300; image encoder architecture: 3-layer CNN with kernel sizes (8, 4, 3), numbers of channels (32, 64, 64), strides (4, 2, 1), and paddings (2, 1, 1); weight initialization for the final layers of the critic and policy: UNIF[-10^-12, 10^-12]; and learning rate 3e-4 for the Adam (Kingma & Ba, 2015) optimizer. (A hedged sketch of the encoder and initialization follows the table.)
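
To make the Experiment Setup row concrete, here is a minimal PyTorch sketch of the 3-layer CNN encoder and the near-zero uniform initialization described in Table 2. This is an illustration under stated assumptions, not the authors' implementation: the function names, the input channel count, and the Flatten head are assumptions; Figure 10 of the paper documents the actual architecture.

```python
import torch.nn as nn

# Sketch of the Table 2 encoder: 3-layer CNN with kernel sizes (8, 4, 3),
# channels (32, 64, 64), strides (4, 2, 1), and paddings (2, 1, 1).
# `make_image_encoder` and `init_final_layer` are hypothetical names.
def make_image_encoder(in_channels=3):
    return nn.Sequential(
        nn.Conv2d(in_channels, 32, kernel_size=8, stride=4, padding=2),
        nn.ReLU(),
        nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1),
        nn.ReLU(),
        nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1),
        nn.ReLU(),
        nn.Flatten(),  # flatten head is an assumption; see Figure 10
    )

def init_final_layer(layer: nn.Linear, bound=1e-12):
    # UNIF[-10^-12, 10^-12] initialization for the final layers of the
    # critic and policy, as listed in Table 2.
    nn.init.uniform_(layer.weight, -bound, bound)
    nn.init.uniform_(layer.bias, -bound, bound)
```

With the listed learning rate, the optimizer would then be constructed as `torch.optim.Adam(model.parameters(), lr=3e-4)`.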
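
The Dataset Splits row quotes an experiment that interpolates between two image representations and retrieves nearest neighbors from a held-out validation set. The following is a hedged sketch of that procedure under assumed interfaces: `encoder`, `initial_image`, `goal_image`, and `val_images` are hypothetical names, and Euclidean distance is an assumption, since the paper does not state which metric it uses for retrieval.

```python
import torch

# Hypothetical sketch of the interpolation-and-retrieval step quoted in the
# Dataset Splits row. All interfaces here are assumed, not from the paper.
@torch.no_grad()
def interpolate_and_retrieve(encoder, initial_image, goal_image, val_images,
                             num_steps=8):
    """Linearly interpolate between two image representations and return, for
    each interpolation point, the nearest neighbor from a held-out set."""
    z0 = encoder(initial_image.unsqueeze(0))  # (1, d) start-image representation
    z1 = encoder(goal_image.unsqueeze(0))     # (1, d) goal-image representation
    z_val = encoder(val_images)               # (N, d) held-out representations

    neighbors = []
    for alpha in torch.linspace(0.0, 1.0, num_steps):
        z = (1 - alpha) * z0 + alpha * z1             # interpolated representation
        dists = torch.cdist(z, z_val)                 # (1, N) Euclidean distances (assumed metric)
        neighbors.append(val_images[dists.argmin()])  # closest held-out image
    return neighbors
```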