C-Learning: Learning to Achieve Goals via Recursive Classification

Authors: Benjamin Eysenbach, Ruslan Salakhutdinov, Sergey Levine

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments demonstrate that C-learning more accurately estimates the density over future states, while remaining competitive with recent goal-conditioned RL methods across a suite of simulated robotic tasks.
Researcher Affiliation | Collaboration | Benjamin Eysenbach (CMU, Google Brain, beysenba@cs.cmu.edu); Ruslan Salakhutdinov (CMU); Sergey Levine (UC Berkeley, Google Brain)
Pseudocode | Yes | Algorithm 1: Monte Carlo C-learning (page 4); Algorithm 2: Off-Policy C-learning (page 5); Algorithm 3: Goal-Conditioned C-learning (page 5). A sketch of the Monte Carlo variant is given after the table.
Open Source Code | Yes | Project website with videos and code: https://ben-eysenbach.github.io/c_learning/
Open Datasets | Yes | We collected a dataset of experience from agents pretrained to solve three locomotion tasks from OpenAI Gym. We used the expert data provided for each task in Fu et al. (2020).
Dataset Splits | Yes | We split these trajectories into train (80%) and test (20%) splits. We randomly sampled 1000 state-action pairs from the validation set and computed the average MSE with the empirical expected future state. A sketch of this evaluation is given after the table.
Hardware Specification | No | The paper does not provide specific details on the hardware used to run the experiments, such as GPU models, CPU specifications, or memory.
Software Dependencies | No | The paper states that "Our implementation of C-learning is based on the TD3 implementation in Guadarrama et al. (2018)", i.e. TF-Agents, which implies the use of TensorFlow, but no version numbers are provided for TensorFlow or any other software components.
Experiment Setup | Yes | Each of the algorithms used a 2-layer neural network with a hidden layer of size 32, optimized for 1000 iterations using the Adam optimizer with a learning rate of 3e-3 and a batch size of 256. All methods (MC C-learning, TD C-learning, and the 1-step dynamics model) used the same architecture (one hidden layer of size 256 with ReLU activation). Actor network: 2 fully-connected layers of size 256 with ReLU activations. Critic network: 2 fully-connected layers of size 256 with ReLU activations. Replay buffer size: 1e6. Target network updates: Polyak averaging at every iteration with τ = 0.005. Batch size: 256. Optimizer: Adam with a learning rate of 3e-4 and default values for β. Data collection: one transition is collected per gradient step. A configuration sketch is given after the table.
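
For the pseudocode row, the following is a minimal sketch of the Monte Carlo variant (Algorithm 1), assuming the classifier C(s, a, s_f) is trained with cross-entropy to distinguish future states, sampled at geometrically distributed offsets Δ ~ Geom(1 - γ) along the same trajectory, from states drawn from the marginal over all data. The network, dimensions, and sampling helpers are hypothetical stand-ins written in PyTorch, not the authors' TF-Agents implementation.

# Hypothetical sketch of Monte Carlo C-learning (Algorithm 1): train a
# classifier C(s, a, s_f) to distinguish discounted-future states (label 1)
# from states sampled from the marginal distribution (label 0).
import numpy as np
import torch
import torch.nn as nn

GAMMA = 0.99
OBS_DIM, ACT_DIM = 17, 6                     # placeholder dimensions

classifier = nn.Sequential(                  # C(s, a, s_f) -> probability
    nn.Linear(2 * OBS_DIM + ACT_DIM, 256), nn.ReLU(),
    nn.Linear(256, 1), nn.Sigmoid(),
)
opt = torch.optim.Adam(classifier.parameters(), lr=3e-4)

def mc_c_learning_step(trajectories, batch_size=256):
    """One gradient step; `trajectories` is a list of (states, actions) arrays."""
    s, a, s_pos, s_neg = [], [], [], []
    for _ in range(batch_size):
        states, actions = trajectories[np.random.randint(len(trajectories))]
        t = np.random.randint(len(states) - 1)
        # Positive: a state Delta steps ahead, Delta ~ Geom(1 - gamma),
        # truncated at the end of the trajectory.
        delta = min(np.random.geometric(1 - GAMMA), len(states) - 1 - t)
        # Negative: a state sampled uniformly from another trajectory.
        other, _ = trajectories[np.random.randint(len(trajectories))]
        s.append(states[t]); a.append(actions[t])
        s_pos.append(states[t + delta])
        s_neg.append(other[np.random.randint(len(other))])
    to_tensor = lambda x: torch.as_tensor(np.array(x), dtype=torch.float32)
    s, a, s_pos, s_neg = map(to_tensor, (s, a, s_pos, s_neg))
    p_pos = classifier(torch.cat([s, a, s_pos], dim=-1))
    p_neg = classifier(torch.cat([s, a, s_neg], dim=-1))
    # Cross-entropy: future states are positives, marginal states negatives.
    loss = -(torch.log(p_pos) + torch.log(1.0 - p_neg)).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return float(loss)

At convergence the classifier's odds C / (1 - C) approximate the ratio between the discounted future-state density and the marginal, which is how the paper uses it to estimate the density over future states.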
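
The dataset-splits row describes an 80%/20% trajectory split and an evaluation that averages the MSE against the empirical expected future state over 1000 held-out state-action pairs. The sketch below shows one way to implement that bookkeeping; the predict_fn interface and the use of a truncated, renormalized discounted average as the empirical expectation are assumptions for illustration, not details taken from the paper.

# Sketch of the 80/20 trajectory split and held-out MSE evaluation.
# The "empirical expected future state" is computed here as a normalized
# discounted average of the remaining states in the trajectory (an assumption).
import numpy as np

GAMMA = 0.99
rng = np.random.default_rng(0)

def split_trajectories(trajectories, train_frac=0.8):
    idx = rng.permutation(len(trajectories))
    n_train = int(train_frac * len(trajectories))
    return ([trajectories[i] for i in idx[:n_train]],
            [trajectories[i] for i in idx[n_train:]])

def empirical_expected_future_state(states, t, gamma=GAMMA):
    deltas = np.arange(1, len(states) - t)
    weights = (1 - gamma) * gamma ** (deltas - 1)
    weights /= weights.sum()                      # renormalize for truncation
    return (weights[:, None] * states[t + deltas]).sum(axis=0)

def eval_mse(predict_fn, test_trajectories, num_pairs=1000):
    errors = []
    for _ in range(num_pairs):
        states, actions = test_trajectories[rng.integers(len(test_trajectories))]
        t = rng.integers(len(states) - 1)
        target = empirical_expected_future_state(states, t)
        pred = predict_fn(states[t], actions[t])  # model's expected future state
        errors.append(np.mean((pred - target) ** 2))
    return float(np.mean(errors))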
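
Finally, the hyperparameters in the experiment-setup row map onto a goal-conditioned actor/critic pair roughly as follows. This is written in PyTorch for brevity rather than the authors' TF-Agents/TD3 code; only the settings stated in the row (layer sizes, ReLU activations, Adam with learning rate 3e-4, batch size 256, replay buffer 1e6, τ = 0.005) are encoded, and the observation/goal/action dimensions are placeholders.

# Sketch of the reported actor/critic configuration, using PyTorch as a
# stand-in for the authors' TF-Agents/TD3 implementation.
import copy
import torch
import torch.nn as nn

OBS_DIM, GOAL_DIM, ACT_DIM = 17, 17, 6      # placeholder dimensions
TAU = 0.005                                 # Polyak averaging coefficient
BATCH_SIZE = 256
REPLAY_BUFFER_SIZE = int(1e6)

def mlp(in_dim, out_dim, hidden=256):
    # Two fully-connected layers of size 256 with ReLU activations.
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )

actor = mlp(OBS_DIM + GOAL_DIM, ACT_DIM)                    # pi(s, g) -> a
critic = mlp(OBS_DIM + ACT_DIM + GOAL_DIM, 1)               # classifier logits
target_critic = copy.deepcopy(critic)

actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-4)   # default betas
critic_opt = torch.optim.Adam(critic.parameters(), lr=3e-4)

@torch.no_grad()
def polyak_update(online, target, tau=TAU):
    # Target network update applied at every iteration.
    for p, p_targ in zip(online.parameters(), target.parameters()):
        p_targ.mul_(1 - tau).add_(tau * p)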