Generating High-Quality Explanations for Navigation in Partially-Revealed Environments

Authors: Gregory Stein

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Simulated experiments validate that our explanations are both high-quality and can be used in interventions to directly correct bad behavior; agents trained via our training-by-explaining procedure achieve 9.1% lower average cost than a non-learned baseline (12.7% after interventions) in environments derived from real-world floor plans.
Researcher Affiliation | Academia | Gregory J. Stein, Department of Computer Science, George Mason University, Fairfax, VA (gjstein@gmu.edu)
Pseudocode | Yes | The paper provides two algorithms (reconstructed here from the garbled extraction; equation references are to the paper):

Algorithm 1: Generate Explanation
  Data: m_t, q_t, a_c, a_h, nn_inputs, θ0
  Result: subgoal property changes, Δσ
  σ0 ← NN(nn_inputs, θ0)
  ΔQ0 ← ΔQ({m_t, q_t, σ0}, {a_h, a_c})
  θ ← θ0; ΔQ ← ΔQ0
  while ΔQ > 0 do
      σ ← NN(nn_inputs, θ)
      ΔQ ← ΔQ({m_t, q_t, σ}, {a_h, a_c})
      α ← α(ΔQ0, σ0, θ0)
      θ ← θ − η ∇σ L_comp [∇θ σ · 1_{α > α(M)}]
  σ_f ← NN(nn_inputs, θ)
  return σ_f − σ0

Algorithm 2: Train Subgoal Property Estimator
  Data: dataset, θ0
  Result: neural network weights, θ_f
  θ ← θ0
  foreach datum ∈ dataset do
      m_t, q_t, a_o, nn_inputs ← datum
      a_c ← argmin_{a ∈ A(m_t) \ {a_o}} Q({m_t, q_t, σ}, a)
      σ ← NN(nn_inputs, θ)
      ΔQ ← ΔQ({m_t, q_t, σ}, {a_o, a_c})                          [Eq. (2)]
      α ← α(ΔQ, σ, θ)                                             [Eq. (3)]
      L_comp ← 1 − logsigmoid(ΔQ)
      θ ← θ − η ∇σ L_comp [∇θ σ · 1_{α > α(M)}] − η ∇θ L_supervised − η ∇θ L_bounds   [Eq. (4)]
  return θ
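Stripped of notation, the core of Algorithm 1 is gradient descent on the network weights until the counterfactual action is preferred (ΔQ ≤ 0), after which the change in predicted subgoal properties is returned as the explanation. A minimal Python sketch, with all names hypothetical, a toy linear "network" standing in for the paper's estimator, numeric differencing standing in for backprop, and the importance mask 1_{α > α(M)} omitted:

```python
def numeric_grad(f, x, eps=1e-5):
    """Central-difference gradient of scalar f at x (toy stand-in for backprop)."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        g.append((f(xp) - f(xm)) / (2 * eps))
    return g

def generate_explanation(nn, delta_q, nn_inputs, theta0, lr=0.1, max_iters=100):
    """Sketch of Algorithm 1: descend on theta until delta_q <= 0 (the
    counterfactual action a_c is preferred over a_h), then return the
    resulting change in predicted subgoal properties, sigma_f - sigma0."""
    sigma0 = nn(nn_inputs, theta0)
    theta = list(theta0)
    for _ in range(max_iters):
        if delta_q(nn(nn_inputs, theta)) <= 0:
            break  # counterfactual action now preferred; stop perturbing
        g = numeric_grad(lambda th: delta_q(nn(nn_inputs, th)), theta)
        theta = [t - lr * gi for t, gi in zip(theta, g)]
    sigma_f = nn(nn_inputs, theta)
    return [sf - s0 for sf, s0 in zip(sigma_f, sigma0)]  # the explanation, Δσ

# Toy instantiation: sigma = inputs * theta (elementwise); dQ = sum(sigma) - 1.05
nn = lambda inputs, theta: [a * b for a, b in zip(inputs, theta)]
delta_q = lambda sigma: sum(sigma) - 1.05
d_sigma = generate_explanation(nn, delta_q, [1.0, 1.0], [1.0, 1.0])
```

Here the loop lowers both predicted properties until the toy ΔQ crosses zero, so d_sigma reports roughly how much each property estimate had to change to flip the decision.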
Open Source Code | Yes | Code available at https://github.com/RAIL-group/xai-nav-under-uncertainty-neurips2021
Open Datasets | Yes | We conduct experiments in two different simulated environments: (1) the Guided Maze, procedurally generated mazes in which a green path on the ground indicates the (only) route to the goal, and (2) the University Buildings environment, topologically complex maps extruded from over one hundred floor plans of university buildings, augmented with obstacles to simulate clutter or furniture, and in which long passages (that a human might identify as hallways) connect faraway regions of space. Our University Buildings environment, generated with the aid of data from Whiting et al. [40], is quite large compared to many existing navigation benchmarks (e.g., [27]), and its size and complexity are well-suited to studying long-horizon navigation under uncertainty.
Dataset Splits | No | The paper describes the data collection for training and evaluation on test environments, but it does not explicitly state a validation split for hyperparameter tuning or model selection.
Hardware Specification | Yes | Training for each learned planner takes roughly 12 hours on a desktop Nvidia 2060 SUPER GPU.
Software Dependencies | No | The paper mentions software such as "PyTorch [20]" and the "Unity Game Engine [34]" but does not provide specific version numbers for these or other key software components in the main text.
Experiment Setup | Yes | We train three models for each environment type: All Subgoal Properties, in which all subgoal properties are used during training; 4 Subgoal Properties, where the gradient is limited to select only the four most important subgoal properties; and No Subgoal Properties (No L_comp), where only the auxiliary losses L_supervised and L_bounds are used during training. [...] Since each datum can have over a dozen panoramic images, we use a batch size of 1, and training for each learned planner takes roughly 12 hours on a desktop Nvidia 2060 SUPER GPU. There is considerable redundancy in the data, since many images appear in multiple data, so we train for only a single epoch, yet halve the learning rate every time one-eighth of the data has been consumed.
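The stepwise learning-rate schedule described above (batch size 1, a single epoch, rate halved each time one-eighth of the data is consumed) can be sketched as follows; the base rate of 1e-3 is a hypothetical placeholder, not a value from the paper:

```python
def lr_at_step(step, n_data, base_lr=1e-3):
    """Learning rate at a given training step: halved every time
    one-eighth of the dataset has been consumed (batch size 1, one
    epoch), matching the schedule described in the experiment setup.
    base_lr is an assumed starting value."""
    eighth = max(1, n_data // 8)  # steps per halving interval
    return base_lr * 0.5 ** (step // eighth)

# Example: with 800 training data, the rate halves every 100 steps.
rates = [lr_at_step(s, 800) for s in (0, 99, 100, 799)]
```

By the last eighth of the epoch the rate has been halved seven times, i.e. it sits at base_lr / 128.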