Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
ESC: Exploration with Soft Commonsense Constraints for Zero-shot Object Navigation
Authors: Kaiwen Zhou, Kaizhi Zheng, Connor Pryor, Yilin Shen, Hongxia Jin, Lise Getoor, Xin Eric Wang
ICML 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on MP3D (Chang et al., 2017), HM3D (Ramakrishnan et al., 2021), and RoboTHOR (Deitke et al., 2020) benchmarks show that our ESC method improves significantly over baselines, and achieves new state-of-the-art results for zero-shot object navigation (e.g., 288% relative Success Rate improvement over CoW (Gadre et al., 2022) on MP3D). |
| Researcher Affiliation | Collaboration | 1University of California, Santa Cruz 2Samsung Research America. |
| Pseudocode | Yes | Algorithm 1 Navigation algorithm |
| Open Source Code | No | The paper does not include an unambiguous statement indicating the release of the source code for the described methodology, nor does it provide a direct link to a code repository. The text does not contain phrases like 'We release our code', 'The source code for our method is available at', or similar declarations of public code availability. |
| Open Datasets | Yes | MP3D (Chang et al., 2017) is used in Habitat ObjectNav challenges, containing 2195 validation episodes on 11 validation environments with 21 goal object categories. HM3D (Ramakrishnan et al., 2021) is used in the Habitat 2022 ObjectNav challenge, containing 2000 validation episodes on 20 validation environments with 6 goal object categories. RoboTHOR (Deitke et al., 2020) is used in the RoboTHOR 2020, 2021 ObjectNav challenge, containing 1800 validation episodes on 15 validation environments with 12 goal object categories. |
| Dataset Splits | Yes | MP3D (Chang et al., 2017) is used in Habitat ObjectNav challenges, containing 2195 validation episodes on 11 validation environments with 21 goal object categories. HM3D (Ramakrishnan et al., 2021) is used in the Habitat 2022 ObjectNav challenge, containing 2000 validation episodes on 20 validation environments with 6 goal object categories. RoboTHOR (Deitke et al., 2020) is used in the RoboTHOR 2020, 2021 ObjectNav challenge, containing 1800 validation episodes on 15 validation environments with 12 goal object categories. |
| Hardware Specification | No | The paper provides details about the simulated agent's configuration and sensor setup (e.g., 'The agent has a height of 0.88m, with a radius of 0.18m. The agent receives 640 x 480 RGB-D egocentric views from a camera with 79° HFoV placed 0.88m from the ground.'), but it does not specify the actual hardware (e.g., GPU models, CPU models, or cloud computing instances) used to run the experiments or train the models. |
| Software Dependencies | No | The paper mentions using specific pre-trained models like 'GLIP-L (Li* et al., 2022)', 'DeBERTa v3 (He et al., 2021)', and 'ChatGPT (Ouyang et al., 2022)', but it does not list specific software dependencies with their version numbers (e.g., Python 3.x, PyTorch 1.x, CUDA 11.x) that are required to replicate the experiments. |
| Experiment Setup | Yes | There are several hyper-parameters in the ESC method. For the distance threshold df for selecting the closest frontier to explore, we use df = 1.6m in all the experiments. For the threshold do determining whether a frontier is near an object, we fix do = 1.6m. For the threshold dr determining whether a frontier is in a room, we fix dr = 0.6m. We applied a weight of 1.0 for all PSL rules when only one of commonsense reasoning (object or room) was utilized. Moreover, we double the weight for the shortest distance rule in Eq. 6 to 2.0 when both levels of commonsense reasoning are employed. |
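The hyper-parameters quoted in the Experiment Setup cell can be collected into a small configuration sketch. This is an illustrative reconstruction only: the variable names (`ESC_CONFIG`, `frontier_near_object`) are hypothetical and do not come from the authors' code.

```python
# Illustrative configuration gathering the ESC hyper-parameters stated in the paper.
# Names are hypothetical; they are not taken from any released implementation.
ESC_CONFIG = {
    "d_f": 1.6,  # metres: distance threshold for selecting the closest frontier to explore
    "d_o": 1.6,  # metres: threshold for deciding whether a frontier is near an object
    "d_r": 0.6,  # metres: threshold for deciding whether a frontier is in a room
    "psl_rule_weight": 1.0,           # weight applied to all PSL rules (single-level reasoning)
    "shortest_distance_weight": 2.0,  # doubled weight for the shortest-distance rule (Eq. 6)
}

def frontier_near_object(distance_m: float, config: dict = ESC_CONFIG) -> bool:
    """Illustrative check: is a frontier within the object-distance threshold d_o?"""
    return distance_m <= config["d_o"]
```

A reproduction attempt would need these five values, since the paper reports them but (per the rows above) releases no code pinning them in a config file.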