Learning Cross-Domain Correspondence for Control with Dynamics Cycle-Consistency
Authors: Qiang Zhang, Tete Xiao, Alexei A. Efros, Lerrel Pinto, Xiaolong Wang
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform experiments across a variety of problem domains, both in simulation and on a real robot. |
| Researcher Affiliation | Academia | Qiang Zhang (Shanghai Jiao Tong University, zhangqiang2016@sjtu.edu.cn); Tete Xiao (UC Berkeley, txiao@eecs.berkeley.edu); Alexei A. Efros (UC Berkeley, efros@eecs.berkeley.edu); Lerrel Pinto (New York University, lerrel@cs.nyu.edu); Xiaolong Wang (UC San Diego, xiw012@ucsd.edu) |
| Pseudocode | Yes | Algorithm 1: Alternatingly Joint Training Algorithm (a hedged sketch of such an alternating update loop is given below the table) |
| Open Source Code | No | The paper links to "Video demonstrations of our results" and notes that "More visualizations are presented in the project page link", but it does not state that the source code for the methodology is released, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We choose MuJoCo physics simulator as our test bed. ... including four tasks based on OpenAI Gym (Brockman et al., 2016), i.e., HalfCheetah, FetchReach, Walker and Hopper, and one task based on DeepMind Control (Tassa et al., 2018), i.e., Finger Spin. (An environment-setup sketch for these tasks appears below the table.) |
| Dataset Splits | No | The paper mentions "To sample the training data, we randomly collect 50k unpaired trajectories in both domain X and domain Y in most settings. The evaluation dataset size is 10k." However, it does not specify a distinct validation set used for hyperparameter tuning or early stopping. |
| Hardware Specification | No | The paper mentions the use of simulation environments (MuJoCo, OpenAI Gym, DeepMind Control) and a real robot (xArm), but does not specify the hardware used to run these simulations or to train the models (e.g., CPU, GPU models, memory). |
| Software Dependencies | No | The paper mentions several software components like "MuJoCo physics simulator", "OpenAI Gym", "DeepMind Control", "Adam optimizer", and algorithms like "DDPG" and "TD3", but it does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | We first train the forward dynamics model F for 20 epochs using Adam (Kingma & Ba, 2014) with 0.0001 learning rate. We then train the other networks for 50 epochs with the same learning rate. We set e1 and e2 to 5000 steps in Algorithm 1. ... We set λ0 = 200, λ1 = 1, and λ2 = 0 in Eq. 5 ... We train the models by using the Adam optimizer for 50 epochs with a batch size of 32. The learning rate is set to 0.001 and decreased by 1/3 for every 10 epochs. ... For DDPG, we train the policy for 50 epochs of 400 episodes in each epoch. The policy exploration epsilon ratio is 0.3 and the reward discount factor is 0.98. For TD3, we train the policy for 400k time steps. The initial exploration step is 25k. The reward discount factor is 0.99, the target network update rate is 0.005 and exploration noise standard deviation level is 0.1. (A training-configuration sketch based on these settings appears below the table.) |
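
The sketch below shows how the environments named in the Open Datasets row could be instantiated. The concrete environment IDs and version suffixes are assumptions made for illustration; the paper only names the tasks, and the table does not confirm which Gym or DeepMind Control versions were used.

```python
# Hypothetical environment setup for the tasks named in the table.
# The environment IDs and version suffixes (e.g. "-v2") are assumptions,
# not details confirmed by the paper or the table above.
import gym
from dm_control import suite

gym_envs = {
    "HalfCheetah": gym.make("HalfCheetah-v2"),
    "FetchReach":  gym.make("FetchReach-v1"),
    "Walker":      gym.make("Walker2d-v2"),
    "Hopper":      gym.make("Hopper-v2"),
}

# DeepMind Control's Finger Spin task ("finger" domain, "spin" task).
finger_spin = suite.load(domain_name="finger", task_name="spin")
```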
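The table only quotes the name of Algorithm 1 and the step counts e1 and e2 (5000 each). Below is a minimal sketch of a generic alternating joint-training loop consistent with those quoted details; the grouping of networks into `group_a`/`group_b`, the loss functions, and the data loader are placeholders, not the authors' implementation.

```python
# A minimal, hypothetical sketch of an alternating joint-training loop.
# Only the Adam optimizer, the 0.0001 learning rate, and e1 = e2 = 5000
# come from the quoted setup; everything else is a placeholder.
import itertools
import torch

def infinite(loader):
    """Yield batches from the loader indefinitely."""
    while True:
        for batch in loader:
            yield batch

def alternating_joint_training(group_a, group_b, loss_a, loss_b, loader,
                               total_steps, e1=5000, e2=5000, lr=1e-4):
    opt_a = torch.optim.Adam(itertools.chain(*(m.parameters() for m in group_a)), lr=lr)
    opt_b = torch.optim.Adam(itertools.chain(*(m.parameters() for m in group_b)), lr=lr)

    data = infinite(loader)
    step = 0
    while step < total_steps:
        # Phase 1: update group A for e1 steps while group B is held fixed.
        for _ in range(e1):
            opt_a.zero_grad()
            loss_a(next(data)).backward()
            opt_a.step()
            step += 1
        # Phase 2: update group B for e2 steps while group A is held fixed.
        for _ in range(e2):
            opt_b.zero_grad()
            loss_b(next(data)).backward()
            opt_b.step()
            step += 1
```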
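The quoted Experiment Setup numbers (Adam, learning rate 0.001, batch size 32, 50 epochs, learning rate decreased by 1/3 every 10 epochs) map onto a standard PyTorch training loop roughly as follows. This is a hedged sketch: `model`, `dataset`, and `compute_loss` are placeholders, and "decreased by 1/3" is interpreted here as multiplying the learning rate by 1/3 every 10 epochs, which the table does not explicitly confirm.

```python
# A hedged sketch of the quoted optimization settings; not the authors' code.
import torch
from torch.utils.data import DataLoader

def train(model, dataset, compute_loss, epochs=50, batch_size=32, lr=1e-3):
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    # "Decreased by 1/3 every 10 epochs" read as gamma = 1/3 with step size 10.
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=1/3)

    for epoch in range(epochs):
        for batch in loader:
            optimizer.zero_grad()
            loss = compute_loss(model, batch)
            loss.backward()
            optimizer.step()
        scheduler.step()  # apply the per-10-epoch learning-rate decay
```

The earlier part of the quote (training the forward dynamics model F for 20 epochs at learning rate 0.0001 before training the remaining networks for 50 epochs) would correspond to calling such a loop twice with different epoch counts and learning rates.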