Thinking While Moving: Deep Reinforcement Learning with Concurrent Control
Authors: Ted Xiao, Eric Jang, Dmitry Kalashnikov, Sergey Levine, Julian Ibarz, Karol Hausman, Alexander Herzog
ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our methods on simulated benchmark tasks and a large-scale robotic grasping task where the robot must think while moving. |
| Researcher Affiliation | Collaboration | Ted Xiao1, Eric Jang1, Dmitry Kalashnikov1, Sergey Levine1,2, Julian Ibarz1, Karol Hausman1, Alexander Herzog3 (1Google Brain, 2UC Berkeley, 3X) |
| Pseudocode | Yes | Algorithm 1 shows the modified QT-Opt procedure. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing the source code for the methodology described in this paper, nor does it provide a direct link to such code. It mentions using TF-Agents and QT-Opt, which are existing libraries/methods. |
| Open Datasets | Yes | We use 3D MuJoCo based implementations in DeepMind Control Suite (Tassa et al., 2018) for both tasks. (A loading sketch appears below the table.) |
| Dataset Splits | No | The paper does not provide specific details on training, validation, and test dataset splits, such as percentages or sample counts. It mentions hyperparameter sweeps but not how the data was partitioned for them. |
| Hardware Specification | No | The paper states that “episode generation, Bellman updates and Q-fitting are distributed across many machines”, but it does not provide specific details about the hardware used (e.g., GPU models, CPU types, or memory). |
| Software Dependencies | Yes | For the baseline learning algorithm implementations, we use the TF-Agents (Guadarrama et al., 2018) implementations of a Deep Q-Network agent, which utilizes a Feed-forward Neural Network (FNN), and a Deep Q-Recurrent Neural Network agent, which utilizes a Long Short-Term Memory (LSTM) network. (A TF-Agents construction sketch appears below the table.) |
| Experiment Setup | Yes | The number of action execution steps is selected from {0ms, 5ms, 25ms, or 50ms} once at environment initialization. t_AS is selected from {0ms, 5ms, 10ms, 25ms, or 50ms} either once at environment initialization or repeatedly at every episode reset. (A sketch of this sampling appears below the table.) |
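The Open Datasets row points to MuJoCo-based tasks from the DeepMind Control Suite. Below is a minimal loading sketch using the public `dm_control` API; the `cartpole`/`swingup` domain and task names are placeholders, since the table does not identify the exact tasks used in the paper.

```python
# Minimal sketch: loading a MuJoCo-based DeepMind Control Suite task.
# The domain/task names ("cartpole", "swingup") are placeholders, not
# necessarily the benchmark tasks evaluated in the paper.
import numpy as np
from dm_control import suite

env = suite.load(domain_name="cartpole", task_name="swingup")
action_spec = env.action_spec()

time_step = env.reset()
while not time_step.last():
    # Uniform random action within the spec bounds (illustration only).
    action = np.random.uniform(action_spec.minimum,
                               action_spec.maximum,
                               size=action_spec.shape)
    time_step = env.step(action)
```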
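The Software Dependencies row names the TF-Agents implementations of a feed-forward DQN agent and an LSTM-based recurrent DQN agent. The sketch below constructs the feed-forward variant with the standard TF-Agents API; the environment, layer sizes, and optimizer settings are illustrative assumptions rather than the paper's hyperparameters.

```python
# Hedged sketch of a feed-forward DQN baseline via TF-Agents. The
# environment, layer widths, and learning rate are assumptions for
# illustration, not values reported in the paper.
import tensorflow as tf
from tf_agents.agents.dqn import dqn_agent
from tf_agents.environments import suite_gym, tf_py_environment
from tf_agents.networks import q_network
from tf_agents.utils import common

py_env = suite_gym.load("CartPole-v0")  # placeholder environment
tf_env = tf_py_environment.TFPyEnvironment(py_env)

q_net = q_network.QNetwork(
    tf_env.observation_spec(),
    tf_env.action_spec(),
    fc_layer_params=(100, 50),  # assumed FNN layer sizes
)

agent = dqn_agent.DqnAgent(
    tf_env.time_step_spec(),
    tf_env.action_spec(),
    q_network=q_net,
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    td_errors_loss_fn=common.element_wise_squared_loss,
    train_step_counter=tf.Variable(0),
)
agent.initialize()
```

The recurrent baseline would follow the same pattern with `tf_agents.networks.q_rnn_network.QRnnNetwork` (an LSTM-based Q-network) in place of `QNetwork`; again, this mirrors generic TF-Agents usage rather than the paper's exact configuration.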
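The Experiment Setup row specifies how the concurrent-control timing parameters are drawn: the action execution duration is sampled once at environment initialization from {0, 5, 25, 50} ms, and the action selection latency t_AS is sampled from {0, 5, 10, 25, 50} ms either once at initialization or again at every episode reset. The following sketch reproduces only that sampling logic; the class and attribute names are hypothetical and not taken from the paper's code.

```python
# Hypothetical sketch of the timing-parameter sampling described in the
# Experiment Setup row; names and structure are illustrative only.
import random

ACTION_EXECUTION_MS = [0, 5, 25, 50]      # drawn once at environment init
ACTION_SELECTION_MS = [0, 5, 10, 25, 50]  # candidate t_AS values


class ConcurrentTimingConfig:
    def __init__(self, resample_t_as_each_episode: bool):
        # Action execution duration is fixed for the lifetime of the env.
        self.action_execution_ms = random.choice(ACTION_EXECUTION_MS)
        self.resample_t_as_each_episode = resample_t_as_each_episode
        self.t_as_ms = random.choice(ACTION_SELECTION_MS)

    def on_episode_reset(self) -> int:
        # t_AS is either fixed at initialization or re-drawn every episode.
        if self.resample_t_as_each_episode:
            self.t_as_ms = random.choice(ACTION_SELECTION_MS)
        return self.t_as_ms
```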