Run-Time Task Composition with Safety Semantics

Authors: Kevin Leahy, Makai Mann, Zachary Serlin

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate these techniques using modified versions of value iteration in a grid world, Deep Q-Network (DQN) in a grid world with image observations, and Twin Delayed DDPG (TD3) in a continuous-observation, continuous-action Bullet physics simulation environment.
Researcher Affiliation | Collaboration | ¹Department of Robotics Engineering, Worcester Polytechnic Institute, Worcester, MA, USA; ²MIT Lincoln Laboratory, Lexington, MA, USA. Correspondence to: Kevin Leahy <kleahy@wpi.edu>.
Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper; methods are described through textual explanations and mathematical equations.
Open Source Code | No | The paper provides no concrete access information (a specific link, an explicit statement of release, or a mention in supplementary materials) for the source code of its methodology. It mentions using 'a modified version of the code from Nangue Tasse et al. (2020)' and 'a modified version of the TD3 code from Fujimoto et al. (2018)', which refer to external code the authors used, not code of their own.
Open Datasets | Yes | Bullet-Safety-Gym (Gronauer, 2022), a 3D physics simulation with 96-dimensional LIDAR-like observations and a continuous 2D force-vector action space; optimal policies are approximated by TD3.
Dataset Splits | No | The paper does not provide dataset split information (exact percentages, sample counts, or citations to predefined splits) for training, validation, and testing. It discusses training steps and curriculum learning, but not explicit data partitioning.
Hardware Specification | Yes | All function approximation experiments were conducted with an NVIDIA Volta GPU, and tuned over three learning rates using curriculum learning, where penalties were added after the policy could successfully reach goals.
Software Dependencies | No | The paper mentions software components such as Deep Q-Network (DQN), Twin Delayed DDPG (TD3), and Bullet-Safety-Gym (Gronauer, 2022), but does not give version numbers for these or for underlying libraries (e.g., PyTorch or TensorFlow) that would be needed for reproducibility.
Experiment Setup | Yes | All function approximation experiments were conducted with an NVIDIA Volta GPU, and tuned over three learning rates using curriculum learning, where penalties were added after the policy could successfully reach goals.
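As context for the grid-world value iteration mentioned above, the following is a minimal illustrative sketch, not the paper's actual environment or code: a hypothetical 4x4 deterministic grid world with a terminal goal, step reward -1, and discount 0.95. All names and parameters here are assumptions for illustration only.

```python
import numpy as np

# Hypothetical 4x4 grid world: goal at bottom-right, reward -1 per step,
# terminal goal state has value 0, deterministic moves, off-grid moves
# keep the agent in place. Illustrative only; not the paper's setup.
N = 4
GOAL = (N - 1, N - 1)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
gamma = 0.95

V = np.zeros((N, N))
for _ in range(200):  # sweep until the Bellman update stops changing V
    V_new = np.zeros_like(V)
    for r in range(N):
        for c in range(N):
            if (r, c) == GOAL:
                continue  # terminal state keeps value 0
            best = -np.inf
            for dr, dc in ACTIONS:
                nr = min(max(r + dr, 0), N - 1)
                nc = min(max(c + dc, 0), N - 1)
                best = max(best, -1.0 + gamma * V[nr, nc])
            V_new[r, c] = best  # greedy Bellman backup
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new

# Far corner is 6 steps from the goal, so its value is
# -(1 + g + g^2 + ... + g^5) = -(1 - g^6) / (1 - g) ≈ -5.298
print(V[0, 0])
```

The paper's composition techniques would operate on value functions like `V` (and on learned Q-functions in the DQN/TD3 cases); this sketch only shows the base value-iteration step that those modified versions build on.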