Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Generating High-Quality Explanations for Navigation in Partially-Revealed Environments
Authors: Gregory Stein
NeurIPS 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Simulated experiments validate that our explanations are both high-quality and can be used in interventions to directly correct bad behavior; agents trained via our training-by-explaining procedure achieve 9.1% lower average cost than a non-learned baseline (12.7% after interventions) in environments derived from real-world floor plans. |
| Researcher Affiliation | Academia | Gregory J. Stein Department of Computer Science George Mason University, Fairfax, VA EMAIL |
| Pseudocode | Yes | Algorithm 1: Generate Explanation. Data: mt, qt, ac, ah, nn_inputs, θ0. Result: Subgoal property changes, σ. (1) σ0 ← NN(nn_inputs, θ0); (2) Q0 ← Q({mt, qt, σ0}, {ah, ac}); (3) θ ← θ0, Q ← Q0; (4) while Q > 0 do: (5) σ ← NN(nn_inputs, θ); (6) Q ← Q({mt, qt, σ}, {ah, ac}); (7) α ← α(Q0, σ0, θ0); (9) θ ← θ η Lcomp σ [θσ 1α>α(M)]; (10) σf ← NN(nn_inputs, θ); (11) return σf σ0. Algorithm 2: Train Subgoal Property Estimator. Data: dataset, θ0. Result: Neural Network Weights, θf. (1) θ ← θ0; (2) foreach datum in dataset do: (3) mt, qt, ao, nn_inputs ← datum; (4) ac ← arg mina A(mt) ao Q({mt, qt, σ}, a); (5) σ ← NN(nn_inputs, θ); (6) Q ← Q({mt, qt, σ}, {ao, ac}) (Eq. 2); (7) α ← α(Q, σ, θ) (Eq. 3); 1 logsigmoid(Q); (9) θ ← θ η Lcomp σ [θσ 1α>α(M)] (Eq. 4) η θLsupervised η θLbounds; (10) return θ |
| Open Source Code | Yes | Code available at https://github.com/RAIL-group/xai-nav-under-uncertainty-neurips2021 |
| Open Datasets | Yes | We conduct experiments in two different simulated environments: (1) the Guided Maze, procedurally generated mazes in which a green path on the ground indicates the (only) route to the goal and (2) the University Buildings environment, topologically complex maps which are extruded from over one-hundred floor plans of university buildings augmented to include obstacles to simulate clutter or furniture and in which long passages (that a human might identify as hallways) connect faraway regions of space. Our University Buildings environment, generated with aid of data from Whiting et al. [40], is quite large compared to many existing navigation benchmarks (e.g., [27]) and its size and complexity are well-suited for studying long-horizon navigation under uncertainty. |
| Dataset Splits | No | The paper describes the data collection for training and evaluation on test environments, but it does not explicitly state a validation split for hyperparameter tuning or model selection. |
| Hardware Specification | Yes | training for each learned planner takes roughly 12 hours on a desktop Nvidia 2060 SUPER GPU. |
| Software Dependencies | No | The paper mentions software like "PyTorch [20]" and "Unity Game Engine [34]" but does not provide specific version numbers for these or other key software components in the main text. |
| Experiment Setup | Yes | We train three models for each environment type: All Subgoal Properties, in which all subgoal properties are used during training; 4 Subgoal Properties, where the gradient is limited to select only the four most important subgoal properties; and No Subgoal Properties (No Lcomp) where only the auxiliary losses Lsupervised and Lbounds are used during training. [...] Since each datum can have over a dozen panoramic images, we use a batch size of 1 and training for each learned planner takes roughly 12 hours on a desktop Nvidia 2060 SUPER GPU. There is considerable redundancy in the data since many images appear in multiple datum and so we train for only a single epoch, yet divide the learning rate by half every time one-eighth of the data has been consumed. |
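
The learning-rate schedule quoted in the Experiment Setup row (a single epoch at batch size 1, with the rate halved each time one-eighth of the data has been consumed) can be sketched as below. This is a minimal illustration, not the author's code: the function name `halving_learning_rate`, its parameters, and the base rate and dataset size in the usage lines are all hypothetical, since the excerpt states none of them.

```python
def halving_learning_rate(num_consumed: int, dataset_size: int,
                          base_lr: float, num_segments: int = 8) -> float:
    """Learning rate after `num_consumed` data have been processed.

    Assumes batch size 1 and a single epoch, so `num_consumed` is also the
    number of optimizer steps taken so far. The rate is halved once for
    every completed 1/num_segments of the dataset.
    """
    if dataset_size <= 0:
        raise ValueError("dataset_size must be positive")
    if not 0 <= num_consumed <= dataset_size:
        raise ValueError("num_consumed must lie in [0, dataset_size]")
    # Integer arithmetic avoids floating-point drift at segment boundaries.
    completed_segments = (num_consumed * num_segments) // dataset_size
    return base_lr * 0.5 ** completed_segments


# Hypothetical values for illustration only.
schedule = [halving_learning_rate(n, 800, 1.0) for n in (0, 99, 100, 799)]
print(schedule)
```

Under these assumptions the rate stays at its base value for the first eighth of the data and has been halved seven times by the final datum. In PyTorch, a comparable effect could likely be obtained with `torch.optim.lr_scheduler.StepLR` using `gamma=0.5` and a `step_size` of one-eighth of the dataset, stepping the scheduler once per datum rather than once per epoch.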