Thinking While Moving: Deep Reinforcement Learning with Concurrent Control

Authors: Ted Xiao, Eric Jang, Dmitry Kalashnikov, Sergey Levine, Julian Ibarz, Karol Hausman, Alexander Herzog

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate our methods on simulated benchmark tasks and a large-scale robotic grasping task where the robot must think while moving."
Researcher Affiliation | Collaboration | Ted Xiao1, Eric Jang1, Dmitry Kalashnikov1, Sergey Levine1,2, Julian Ibarz1, Karol Hausman1, Alexander Herzog3 (1Google Brain, 2UC Berkeley, 3X)
Pseudocode | Yes | "Algorithm 1 shows the modified QT-Opt procedure." (See the concurrent Q-function sketch after this table.)
Open Source Code | No | The paper does not contain an explicit statement about releasing the source code for the methodology described in this paper, nor does it provide a direct link to such code. It mentions using TF-Agents and QT-Opt, which are existing libraries/methods.
Open Datasets | Yes | "We use 3D MuJoCo based implementations in DeepMind Control Suite (Tassa et al., 2018) for both tasks." (See the environment-loading sketch after this table.)
Dataset Splits | No | The paper does not provide specific details on training, validation, and test dataset splits, such as percentages or sample counts. It mentions hyperparameter sweeps but not how the data was partitioned for them.
Hardware Specification | No | The paper states that "episode generation, Bellman updates and Q-fitting are distributed across many machines", but it does not provide specific details about the hardware used (e.g., GPU models, CPU types, or memory).
Software Dependencies | Yes | "For the baseline learning algorithm implementations, we use the TF-Agents (Guadarrama et al., 2018) implementations of a Deep Q-Network agent, which utilizes a Feed-forward Neural Network (FNN), and a Deep Q-Recurrent Neural Network agent, which utilizes a Long Short-Term Memory (LSTM) network." (See the TF-Agents sketch after this table.)
Experiment Setup | Yes | "The number of action execution steps is selected from {0ms, 5ms, 25ms, or 50ms} once at environment initialization. t_AS is selected from {0ms, 5ms, 10ms, 25ms, or 50ms} either once at environment initialization or repeatedly at every episode reset." (See the latency-sampling sketch after this table.)
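
The "Pseudocode" row refers to Algorithm 1, the paper's modified QT-Opt procedure for concurrent control. As a rough illustration of the concurrent idea only (the Q-function is conditioned on the previous action and an action-selection timing feature in addition to the state and the candidate action), here is a minimal Keras sketch. The layer sizes, input names, and the normalization of t_AS are assumptions for illustration; this is not the authors' Algorithm 1.

```python
# Minimal sketch (not the paper's Algorithm 1): a Q-function whose input is
# augmented with the previous action and an action-selection timing feature,
# in the spirit of the concurrent-control formulation.
# All layer sizes and input names are illustrative assumptions.
import tensorflow as tf

def build_concurrent_q_network(state_dim: int, action_dim: int) -> tf.keras.Model:
    state = tf.keras.Input(shape=(state_dim,), name="state")
    previous_action = tf.keras.Input(shape=(action_dim,), name="previous_action")
    # Action-selection time t_AS, assumed here to be normalized to [0, 1].
    t_as = tf.keras.Input(shape=(1,), name="t_as")
    candidate_action = tf.keras.Input(shape=(action_dim,), name="candidate_action")

    x = tf.keras.layers.Concatenate()([state, previous_action, t_as, candidate_action])
    x = tf.keras.layers.Dense(256, activation="relu")(x)
    x = tf.keras.layers.Dense(256, activation="relu")(x)
    q_value = tf.keras.layers.Dense(1, name="q_value")(x)
    return tf.keras.Model(
        inputs=[state, previous_action, t_as, candidate_action], outputs=q_value
    )

# Example instantiation with arbitrary dimensions.
q_net = build_concurrent_q_network(state_dim=10, action_dim=4)
q_net.summary()
```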
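
The "Open Datasets" row quotes the use of MuJoCo-based DeepMind Control Suite tasks. The following is a minimal sketch of loading and stepping such a task with dm_control; the specific domain and task names ("cartpole"/"swingup") and the random policy are illustrative assumptions, not the paper's exact benchmark configuration.

```python
# Minimal sketch of loading a MuJoCo-based DeepMind Control Suite task.
# Domain/task names are illustrative assumptions.
from dm_control import suite
import numpy as np

env = suite.load(domain_name="cartpole", task_name="swingup")
action_spec = env.action_spec()

time_step = env.reset()
while not time_step.last():
    # Uniform random policy, only to show the interaction loop.
    action = np.random.uniform(
        action_spec.minimum, action_spec.maximum, size=action_spec.shape
    )
    time_step = env.step(action)
```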
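
The "Software Dependencies" row cites the TF-Agents DQN baseline. Below is a minimal sketch of constructing a TF-Agents DqnAgent; the Gym CartPole environment, layer sizes, and learning rate are stand-in assumptions (DqnAgent requires a discrete action spec), and the paper's actual environments and hyperparameters may differ. The recurrent baseline mentioned in the quote would swap the feed-forward QNetwork for a QRnnNetwork.

```python
# Minimal sketch of a TF-Agents DQN agent. Environment, network sizes, and
# optimizer settings are illustrative assumptions, not the paper's setup.
import tensorflow as tf
from tf_agents.agents.dqn import dqn_agent
from tf_agents.environments import suite_gym, tf_py_environment
from tf_agents.networks import q_network
from tf_agents.utils import common

# Stand-in discrete-action environment for illustration.
train_env = tf_py_environment.TFPyEnvironment(suite_gym.load("CartPole-v1"))

q_net = q_network.QNetwork(
    train_env.observation_spec(),
    train_env.action_spec(),
    fc_layer_params=(100, 50),
)

agent = dqn_agent.DqnAgent(
    train_env.time_step_spec(),
    train_env.action_spec(),
    q_network=q_net,
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    td_errors_loss_fn=common.element_wise_squared_loss,
    train_step_counter=tf.Variable(0),
)
agent.initialize()
```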
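
Finally, the "Experiment Setup" row describes how the execution and action-selection latencies are drawn. The sketch below mirrors that description: an action-execution latency is chosen once at environment initialization, and t_AS is chosen either once at initialization or again at every episode reset. The class name, variable names, and the resample flag are illustrative assumptions.

```python
# Minimal sketch of the latency selection described in the quoted setup.
# Names and the resample_every_episode flag are illustrative assumptions.
import random

ACTION_EXECUTION_LATENCIES_MS = [0, 5, 25, 50]
ACTION_SELECTION_LATENCIES_MS = [0, 5, 10, 25, 50]

class LatencySchedule:
    def __init__(self, resample_every_episode: bool = False):
        self.resample_every_episode = resample_every_episode
        # Both latencies are drawn once at environment initialization.
        self.action_execution_ms = random.choice(ACTION_EXECUTION_LATENCIES_MS)
        self.t_as_ms = random.choice(ACTION_SELECTION_LATENCIES_MS)

    def on_episode_reset(self):
        # Optionally redraw t_AS at every episode reset.
        if self.resample_every_episode:
            self.t_as_ms = random.choice(ACTION_SELECTION_LATENCIES_MS)
        return self.action_execution_ms, self.t_as_ms

# Example: t_AS is resampled at each reset, execution latency stays fixed.
schedule = LatencySchedule(resample_every_episode=True)
print(schedule.on_episode_reset())
```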