Run-Time Task Composition with Safety Semantics

Authors: Kevin Leahy, Makai Mann, Zachary Serlin

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate these techniques using modified versions of value iteration in a grid world, Deep Q-Network (DQN) in a grid world with image observations, and Twin Delayed DDPG (TD3) in a continuous-observation, continuous-action Bullet physics simulation environment.
Researcher Affiliation | Collaboration | ¹Department of Robotics Engineering, Worcester Polytechnic Institute, Worcester, MA, USA; ²MIT Lincoln Laboratory, Lexington, MA, USA. Correspondence to: Kevin Leahy <kleahy@wpi.edu>.
Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper; methods are described through textual explanations and mathematical equations.
Open Source Code | No | The paper provides no concrete access information (a specific link, an explicit statement of release, or a mention in supplementary materials) for the source code of its methodology. It mentions using 'a modified version of the code from Nangue Tasse et al. (2020)' and 'a modified version of the TD3 code from Fujimoto et al. (2018)', which refer to external code the authors used, not code of their own.
Open Datasets | Yes | Bullet-Safety-Gym (Gronauer, 2022), a 3D physics simulation with 96-dimensional LIDAR-like observations and a continuous 2D force-vector action space; optimal policies are approximated by TD3.
Dataset Splits | No | The paper does not provide dataset split information (exact percentages, sample counts, or citations to predefined splits) for training, validation, and testing. It discusses training steps and curriculum learning, but not explicit data partitioning.
Hardware Specification | Yes | All function approximation experiments were conducted with an NVIDIA Volta GPU, and tuned over three learning rates using curriculum learning, where penalties were added after the policy could successfully reach goals.
Software Dependencies | No | The paper mentions software components such as Deep Q-Network (DQN), Twin Delayed DDPG (TD3), and Bullet-Safety-Gym (Gronauer, 2022), but does not give version numbers for these or for underlying libraries (e.g., PyTorch or TensorFlow) that would be needed for reproducibility.
Experiment Setup | Yes | All function approximation experiments were conducted with an NVIDIA Volta GPU, and tuned over three learning rates using curriculum learning, where penalties were added after the policy could successfully reach goals.
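As context for the grid-world value iteration mentioned above, the following is a minimal illustrative sketch, not the paper's actual environment or code: a hypothetical 4x4 deterministic grid world with a terminal goal, step reward -1, and discount 0.95. All names and parameters here are assumptions for illustration only.

```python
import numpy as np

# Hypothetical 4x4 grid world: goal at bottom-right, reward -1 per step,
# terminal goal state has value 0, deterministic moves, off-grid moves
# keep the agent in place. Illustrative only; not the paper's setup.
N = 4
GOAL = (N - 1, N - 1)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
gamma = 0.95

V = np.zeros((N, N))
for _ in range(200):  # sweep until the Bellman update stops changing V
    V_new = np.zeros_like(V)
    for r in range(N):
        for c in range(N):
            if (r, c) == GOAL:
                continue  # terminal state keeps value 0
            best = -np.inf
            for dr, dc in ACTIONS:
                nr = min(max(r + dr, 0), N - 1)
                nc = min(max(c + dc, 0), N - 1)
                best = max(best, -1.0 + gamma * V[nr, nc])
            V_new[r, c] = best  # greedy Bellman backup
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new

# Far corner is 6 steps from the goal, so its value is
# -(1 + g + g^2 + ... + g^5) = -(1 - g^6) / (1 - g) ≈ -5.298
print(V[0, 0])
```

The paper's composition techniques would operate on value functions like `V` (and on learned Q-functions in the DQN/TD3 cases); this sketch only shows the base value-iteration step that those modified versions build on.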