HAZARD Challenge: Embodied Decision Making in Dynamically Changing Environments
Authors: Qinhong Zhou, Sunli Chen, Yisong Wang, Haozhe Xu, Weihua Du, Hongxin Zhang, Yilun Du, Joshua B. Tenenbaum, Chuang Gan
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 EXPERIMENTS, 5.1 EXPERIMENTAL SETUP, 5.2 BASELINES, 5.3 EXPERIMENTAL RESULTS, Table 1: The rescued value rate (Value), averaged rescue step (Step), and averaged damaged rate (Damage) of the proposed LLM pipeline (LLM) and all baseline methods. |
| Researcher Affiliation | Collaboration | Qinhong Zhou¹, Sunli Chen², Yisong Wang³, Haozhe Xu³, Weihua Du², Hongxin Zhang¹, Yilun Du⁴, Joshua B. Tenenbaum⁴, Chuang Gan¹,⁵; ¹University of Massachusetts Amherst, ²Institute for Interdisciplinary Information Sciences, Tsinghua University, ³Peking University, ⁴MIT, ⁵MIT-IBM Watson AI Lab |
| Pseudocode | No | The paper describes algorithms like A* and MCTS within the text, but it does not provide any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | For readers interested in reproducing the experimental results presented in this paper, we have made our experiments accessible via a Github repository, available at https://github.com/UMass-Foundation-Model/HAZARD. |
| Open Datasets | Yes | HAZARD is available at https://vis-www.cs.umass.edu/hazard/. To create the dataset for HAZARD, we choose 4 distinct indoor rooms for the fire and flood tasks, and 4 outdoor regions for the wind task. |
| Dataset Splits | No | The paper states a 'train-set split ratio of 3:1' but does not explicitly mention a separate validation split or its details. |
| Hardware Specification | Yes | We run most of our experiments on an Intel i9-9900k CPU and RTX2080-Super GPU Desktop. |
| Software Dependencies | No | The paper mentions 'Open MMLab detection framework' and 'Mask-RCNN' but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | We use max tokens of 512, temperature of 0.7, top p of 1.0 as hyper-parameters during inference. We use the PPO algorithm with learning rate 2.5 × 10⁻⁴ and train for 10⁵ steps. |
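The sketch below is a minimal Python illustration of the reported hyper-parameters. It assumes the PPO values are a learning rate of 2.5 × 10⁻⁴ and 10⁵ training steps. The class names, field names, and helper functions are hypothetical and are not taken from the HAZARD repository. The helper is a generic implementation of temperature scaling followed by nucleus (top-p) filtering, not the authors' inference code. It shows what a temperature of 0.7 does, which is to sharpen the softmax, and that top-p = 1.0 keeps every token, so no nucleus truncation happens.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMDecodingConfig:
    """LLM inference hyper-parameters reported above (hypothetical field names)."""
    max_tokens: int = 512
    temperature: float = 0.7
    top_p: float = 1.0


@dataclass(frozen=True)
class PPOConfig:
    """PPO hyper-parameters reported above (hypothetical field names)."""
    learning_rate: float = 2.5e-4
    total_steps: int = 10**5


def temperature_top_p_probs(logits, temperature, top_p):
    """Next-token probabilities after temperature scaling and nucleus (top-p) filtering."""
    # Temperature scaling: divide logits by T, then take a max-shifted softmax for stability.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Nucleus filtering: keep the highest-probability tokens until their cumulative
    # mass reaches top_p. With top_p = 1.0, every token is kept.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = set(), 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize over the kept tokens and zero out the rest.
    kept_mass = sum(probs[i] for i in kept)
    return [probs[i] / kept_mass if i in kept else 0.0 for i in range(len(probs))]


def sample_token(logits, cfg, rng):
    """Sample one token index using the decoding config (illustrative only)."""
    probs = temperature_top_p_probs(logits, cfg.temperature, cfg.top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    cfg = LLMDecodingConfig()
    logits = [2.0, 1.0, 0.5, -1.0]
    print([round(p, 3) for p in temperature_top_p_probs(logits, cfg.temperature, cfg.top_p)])
    print(sample_token(logits, cfg, random.Random(0)))
```

With top_p = 1.0, the filter never removes a token, so the sampler reduces to plain temperature sampling. `max_tokens` would only cap the generation length and has no effect on the per-token distribution.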