Learning Cross-Domain Correspondence for Control with Dynamics Cycle-Consistency
Authors: Qiang Zhang, Tete Xiao, Alexei A. Efros, Lerrel Pinto, Xiaolong Wang
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform experiments across a variety of problem domains, both in simulation and on a real robot. |
| Researcher Affiliation | Academia | Qiang Zhang (Shanghai Jiao Tong University, zhangqiang2016@sjtu.edu.cn); Tete Xiao (UC Berkeley, txiao@eecs.berkeley.edu); Alexei A. Efros (UC Berkeley, efros@eecs.berkeley.edu); Lerrel Pinto (New York University, lerrel@cs.nyu.edu); Xiaolong Wang (UC San Diego, xiw012@ucsd.edu) |
| Pseudocode | Yes | Algorithm 1: Alternatingly Joint Training Algorithm (a hedged sketch of such an alternating update loop is given below the table) |
| Open Source Code | No | The paper links to "Video demonstrations of our results" and notes that "More visualizations are presented in the project page link", but it does not state that the source code for the methodology is released, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We choose MuJoCo physics simulator as our test bed. ... including four tasks based on OpenAI Gym (Brockman et al., 2016), i.e., HalfCheetah, FetchReach, Walker and Hopper, and one task based on DeepMind Control (Tassa et al., 2018), i.e., Finger Spin. (An environment-setup sketch for these tasks appears below the table.) |
| Dataset Splits | No | The paper mentions "To sample the training data, we randomly collect 50k unpaired trajectories in both domain X and domain Y in most settings. The evaluation dataset size is 10k." However, it does not specify a distinct validation set used for hyperparameter tuning or early stopping. |
| Hardware Specification | No | The paper mentions the use of simulation environments (MuJoCo, OpenAI Gym, DeepMind Control) and a real robot (xArm), but does not specify the hardware used to run these simulations or to train the models (e.g., CPU, GPU models, memory). |
| Software Dependencies | No | The paper mentions several software components like "MuJoCo physics simulator", "OpenAI Gym", "DeepMind Control", "Adam optimizer", and algorithms like "DDPG" and "TD3", but it does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | We first train the forward dynamics model F for 20 epochs using Adam (Kingma & Ba, 2014) with 0.0001 learning rate. We then train the other networks for 50 epochs with the same learning rate. We set e1 and e2 to 5000 steps in Algorithm 1. ... We set λ0 = 200, λ1 = 1, and λ2 = 0 in Eq. 5 ... We train the models by using the Adam optimizer for 50 epochs with a batch size of 32. The learning rate is set to 0.001 and decreased by 1/3 for every 10 epochs. ... For DDPG, we train the policy for 50 epochs of 400 episodes in each epoch. The policy exploration epsilon ratio is 0.3 and the reward discount factor is 0.98. For TD3, we train the policy for 400k time steps. The initial exploration step is 25k. The reward discount factor is 0.99, the target network update rate is 0.005 and exploration noise standard deviation level is 0.1. (A training-configuration sketch based on these settings appears below the table.) |
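
The sketch below shows how the environments named in the Open Datasets row could be instantiated. The concrete environment IDs and version suffixes are assumptions made for illustration; the paper only names the tasks, and the table does not confirm which Gym or DeepMind Control versions were used.

```python
# Hypothetical environment setup for the tasks named in the table.
# The environment IDs and version suffixes (e.g. "-v2") are assumptions,
# not details confirmed by the paper or the table above.
import gym
from dm_control import suite

gym_envs = {
    "HalfCheetah": gym.make("HalfCheetah-v2"),
    "FetchReach":  gym.make("FetchReach-v1"),
    "Walker":      gym.make("Walker2d-v2"),
    "Hopper":      gym.make("Hopper-v2"),
}

# DeepMind Control's Finger Spin task ("finger" domain, "spin" task).
finger_spin = suite.load(domain_name="finger", task_name="spin")
```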
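The table only quotes the name of Algorithm 1 and the step counts e1 and e2 (5000 each). Below is a minimal sketch of a generic alternating joint-training loop consistent with those quoted details; the grouping of networks into `group_a`/`group_b`, the loss functions, and the data loader are placeholders, not the authors' implementation.

```python
# A minimal, hypothetical sketch of an alternating joint-training loop.
# Only the Adam optimizer, the 0.0001 learning rate, and e1 = e2 = 5000
# come from the quoted setup; everything else is a placeholder.
import itertools
import torch

def infinite(loader):
    """Yield batches from the loader indefinitely."""
    while True:
        for batch in loader:
            yield batch

def alternating_joint_training(group_a, group_b, loss_a, loss_b, loader,
                               total_steps, e1=5000, e2=5000, lr=1e-4):
    opt_a = torch.optim.Adam(itertools.chain(*(m.parameters() for m in group_a)), lr=lr)
    opt_b = torch.optim.Adam(itertools.chain(*(m.parameters() for m in group_b)), lr=lr)

    data = infinite(loader)
    step = 0
    while step < total_steps:
        # Phase 1: update group A for e1 steps while group B is held fixed.
        for _ in range(e1):
            opt_a.zero_grad()
            loss_a(next(data)).backward()
            opt_a.step()
            step += 1
        # Phase 2: update group B for e2 steps while group A is held fixed.
        for _ in range(e2):
            opt_b.zero_grad()
            loss_b(next(data)).backward()
            opt_b.step()
            step += 1
```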
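The quoted Experiment Setup numbers (Adam, learning rate 0.001, batch size 32, 50 epochs, learning rate decreased by 1/3 every 10 epochs) map onto a standard PyTorch training loop roughly as follows. This is a hedged sketch: `model`, `dataset`, and `compute_loss` are placeholders, and "decreased by 1/3" is interpreted here as multiplying the learning rate by 1/3 every 10 epochs, which the table does not explicitly confirm.

```python
# A hedged sketch of the quoted optimization settings; not the authors' code.
import torch
from torch.utils.data import DataLoader

def train(model, dataset, compute_loss, epochs=50, batch_size=32, lr=1e-3):
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    # "Decreased by 1/3 every 10 epochs" read as gamma = 1/3 with step size 10.
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=1/3)

    for epoch in range(epochs):
        for batch in loader:
            optimizer.zero_grad()
            loss = compute_loss(model, batch)
            loss.backward()
            optimizer.step()
        scheduler.step()  # apply the per-10-epoch learning-rate decay
```

The earlier part of the quote (training the forward dynamics model F for 20 epochs at learning rate 0.0001 before training the remaining networks for 50 epochs) would correspond to calling such a loop twice with different epoch counts and learning rates.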