HAZARD Challenge: Embodied Decision Making in Dynamically Changing Environments

Authors: Qinhong Zhou, Sunli Chen, Yisong Wang, Haozhe Xu, Weihua Du, Hongxin Zhang, Yilun Du, Joshua B. Tenenbaum, Chuang Gan

ICLR 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section headings "5 EXPERIMENTS", "5.1 EXPERIMENTAL SETUP", "5.2 BASELINES", "5.3 EXPERIMENTAL RESULTS", and "Table 1: The rescued value rate (Value), averaged rescue step (Step), and averaged damaged rate (Damage) of the proposed LLM pipeline (LLM) and all baseline methods." |
| Researcher Affiliation | Collaboration | Qinhong Zhou¹, Sunli Chen², Yisong Wang³, Haozhe Xu³, Weihua Du², Hongxin Zhang¹, Yilun Du⁴, Joshua B. Tenenbaum⁴, Chuang Gan¹,⁵ — ¹University of Massachusetts Amherst, ²Institute for Interdisciplinary Information Sciences, Tsinghua University, ³Peking University, ⁴MIT, ⁵MIT-IBM Watson AI Lab |
| Pseudocode | No | The paper describes algorithms such as A* and MCTS in prose, but it provides no explicitly labeled pseudocode or algorithm blocks (a generic A* sketch appears below the table). |
| Open Source Code | Yes | "For readers interested in reproducing the experimental results presented in this paper, we have made our experiments accessible via a Github repository, available at https://github.com/UMass-Foundation-Model/HAZARD." |
| Open Datasets | Yes | "HAZARD is available at https://vis-www.cs.umass.edu/hazard/" and "To create the dataset for HAZARD, we choose 4 distinct indoor rooms for the fire and flood tasks, and 4 outdoor regions for the wind task." |
| Dataset Splits | No | The paper states a "train-set split ratio of 3:1" but does not explicitly describe a separate validation split or its details (a split sketch appears below the table). |
| Hardware Specification | Yes | "We run most of our experiments on an Intel i9-9900k CPU and RTX2080-Super GPU Desktop." |
| Software Dependencies | No | The paper mentions the "Open MMLab detection framework" and "Mask-RCNN" but does not pin version numbers for these software components. |
| Experiment Setup | Yes | "We use max tokens of 512, temperature of 0.7, top p of 1.0 as hyper-parameters during inference." and "We use the PPO algorithm with learning rate 2.5 × 10⁻⁴ and train for 10⁵ steps." (Hedged configuration sketches for both appear below the table.) |
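
Since the Pseudocode row notes that A* is described only in prose, a textbook A* sketch on a 4-connected occupancy grid is given below for reference. This is a generic formulation, not the HAZARD authors' implementation; their agents plan over simulated ThreeDWorld scenes, and the grid, cost model, and function names here are assumptions for illustration.

```python
import heapq
import itertools

def astar(grid, start, goal):
    """Generic A* on a 4-connected occupancy grid (0 = free, 1 = blocked).

    Textbook sketch only -- not the HAZARD authors' implementation.
    """
    def h(p):
        # Manhattan distance: admissible heuristic for unit-cost grid moves
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    tie = itertools.count()  # tiebreaker so the heap never compares nodes
    frontier = [(h(start), next(tie), 0, start, None)]
    parent, best_g = {}, {start: 0}
    while frontier:
        _, _, g, node, prev = heapq.heappop(frontier)
        if node in parent:
            continue            # already expanded via a cheaper path
        parent[node] = prev
        if node == goal:        # walk parents back to recover the path
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            if 0 <= nx < len(grid) and 0 <= ny < len(grid[0]) and grid[nx][ny] == 0:
                ng = g + 1
                if ng < best_g.get(nxt, float("inf")):
                    best_g[nxt] = ng
                    heapq.heappush(frontier, (ng + h(nxt), next(tie), ng, nxt, node))
    return None  # goal unreachable
```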
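For the Dataset Splits row, the paper reports only a 3:1 train split ratio. The sketch below shows how such a split might be reproduced; the scene identifiers, their count, and the random seed are all assumptions, since the paper specifies none of them and does not mention a validation set.

```python
import random

# Hypothetical scene identifiers -- the paper gives only a 3:1 ratio,
# not the actual IDs, their count, or the seed.
scene_ids = [f"scene_{i:03d}" for i in range(100)]

rng = random.Random(0)        # fixed seed (assumption) for reproducibility
rng.shuffle(scene_ids)

split = int(len(scene_ids) * 3 / 4)   # 3:1 train:test ratio from the paper
train_ids, test_ids = scene_ids[:split], scene_ids[split:]
print(len(train_ids), len(test_ids))  # 75 25
```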
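The Experiment Setup row quotes the LLM inference hyper-parameters directly. A minimal sketch of how they map onto a chat-completion call is shown below; the model name and prompt are placeholders, as the paper's exact prompting pipeline lives in its repository.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hyper-parameters quoted from the paper; model and prompt are placeholders.
response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # assumption: one of several possible LLM backends
    messages=[{"role": "user", "content": "Which object should the agent rescue next?"}],
    max_tokens=512,   # "max tokens of 512"
    temperature=0.7,  # "temperature of 0.7"
    top_p=1.0,        # "top p of 1.0"
)
print(response.choices[0].message.content)
```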
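Similarly, the quoted RL setup (PPO, learning rate 2.5 × 10⁻⁴, 10⁵ steps) maps onto a standard library configuration. Stable-Baselines3 and the CartPole environment below are assumptions for illustration, not the paper's stated framework or environment; HAZARD's own ThreeDWorld-based environment would take the placeholder's position.

```python
import gymnasium as gym
from stable_baselines3 import PPO

# Placeholder environment -- HAZARD's own simulator env would go here.
env = gym.make("CartPole-v1")

# Learning rate and step budget quoted from the paper; all other settings are
# Stable-Baselines3 defaults (an assumption -- the paper's PPO code may differ).
model = PPO("MlpPolicy", env, learning_rate=2.5e-4, verbose=1)
model.learn(total_timesteps=100_000)  # 10^5 training steps
```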