NBDT: Neural-Backed Decision Trees

Authors: Alvin Wan, Lisa Dunlap, Daniel Ho, Jihan Yin, Scott Lee, Suzanne Petryk, Sarah Adel Bargal, Joseph E. Gonzalez

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | NBDTs obtain state-of-the-art results for interpretable models and match or outperform modern neural networks on image classification. We report results on different models (ResNet, WideResNet, EfficientNet) and datasets (CIFAR10, CIFAR100, TinyImageNet, ImageNet). We additionally conduct ablation studies to verify the hierarchy and loss designs, find that our training procedure improves the original neural network's accuracy by up to 2%, and show that NBDTs improve generalization to unseen classes by up to 16%.
Researcher Affiliation | Academia | Alvin Wan1, Lisa Dunlap1, Daniel Ho1, Jihan Yin1, Scott Lee1, Suzanne Petryk1, Sarah Adel Bargal2, Joseph E. Gonzalez1; UC Berkeley1, Boston University2; {alvinwan, ldunlap, danielho, jihan yin, scott.lee.3898, spetryk, jegonzal}@berkeley.edu, sbargal@bu.edu
Pseudocode | No | The paper describes procedures for building induced hierarchies and inference (e.g., Figure 2), but it does not contain a formally labeled 'Pseudocode' or 'Algorithm' block. (An illustrative sketch of these procedures follows the table.)
Open Source Code | Yes | Code and pretrained NBDTs are at github.com/alvinwan/neural-backed-decision-trees.
Open Datasets | Yes | We report results on different models (ResNet, WideResNet, EfficientNet) and datasets (CIFAR10, CIFAR100, TinyImageNet, ImageNet).
Dataset Splits | Yes | We report results on different models (ResNet, WideResNet, EfficientNet) and datasets (CIFAR10, CIFAR100, TinyImageNet, ImageNet). (See the data-loading sketch after the table.)
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | Our tree supervision loss L_soft requires a pre-defined hierarchy. We find that (a) tree supervision loss damages learning speed early in training, when leaf weights are nonsensical. Thus, our tree supervision weight ω_t grows linearly from ω_0 = 0 to ω_T = 0.5 for CIFAR10, CIFAR100, and to ω_T = 5 for TinyImageNet, ImageNet; β_t ∈ [0, 1] decays linearly over time. (b) We re-train where possible, fine-tuning with L_soft only when the original model accuracy is not reproducible. (A sketch of this schedule follows the table.)
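
Illustrative sketch for the 'Pseudocode' row. The paper describes the induced hierarchy and the soft decision rule only in prose and figures, so the following is a minimal reconstruction under that description (agglomerative clustering of the final fully-connected layer's class weight vectors, node vectors taken as the mean of their leaves' weights, and soft inference that multiplies per-node child probabilities along each root-to-leaf path). Function names and the choice of Ward linkage are our assumptions; the authors' actual implementation lives in the repository linked above.

```python
# Sketch only: an illustrative reconstruction of the procedures the paper
# describes in prose, not the authors' released code.
import numpy as np
from scipy.cluster.hierarchy import linkage


def induce_hierarchy(fc_weights: np.ndarray):
    """Cluster the C x d class weight vectors into a binary hierarchy.

    Returns the root id plus, for every node, its children, the leaf
    classes beneath it, and a representative vector (mean of leaf weights).
    """
    num_classes = fc_weights.shape[0]
    merges = linkage(fc_weights, method="ward")     # (C - 1) merge steps
    leaves = {i: {i} for i in range(num_classes)}
    reps = {i: fc_weights[i] for i in range(num_classes)}
    children = {}
    for step, (a, b, _dist, _count) in enumerate(merges):
        node = num_classes + step                   # scipy's id for the new cluster
        a, b = int(a), int(b)
        children[node] = (a, b)
        leaves[node] = leaves[a] | leaves[b]
        reps[node] = fc_weights[sorted(leaves[node])].mean(axis=0)
    root = num_classes + len(merges) - 1
    return root, children, leaves, reps


def soft_inference(x: np.ndarray, root, children, reps, num_classes: int):
    """Per-class probabilities: product of child probabilities along each path."""
    class_probs = np.zeros(num_classes)

    def descend(node, path_prob):
        if node not in children:                    # leaf node = original class
            class_probs[node] = path_prob
            return
        kids = children[node]
        logits = np.array([reps[k] @ x for k in kids])
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()                        # softmax over this node's children
        for kid, p in zip(kids, probs):
            descend(kid, path_prob * p)

    descend(root, 1.0)
    return class_probs                              # prediction = class_probs.argmax()
```

For a torchvision-style ResNet, fc_weights would correspond to model.fc.weight and x to the penultimate-layer features of one image.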
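For the 'Open Datasets' and 'Dataset Splits' rows: CIFAR10 and CIFAR100 ship with fixed, published splits (50,000 training and 10,000 test images each), so reproducing the splits reduces to using the standard loaders. A minimal torchvision sketch follows, with the paper's augmentation pipeline omitted; TinyImageNet and ImageNet follow their own standard distributions and are not shown.

```python
# Sketch only: load CIFAR10 with its standard published train/test split.
import torchvision
import torchvision.transforms as T

transform = T.ToTensor()  # augmentation omitted for brevity

train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transform)   # 50,000 images
test_set = torchvision.datasets.CIFAR10(
    root="./data", train=False, download=True, transform=transform)  # 10,000 images
```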
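For the 'Experiment Setup' row: the quoted schedule grows the tree supervision weight ω_t linearly from 0 to a dataset-dependent maximum (0.5 for CIFAR10/CIFAR100, 5 for TinyImageNet/ImageNet) while β_t decays linearly. Below is a minimal sketch of one such schedule; the endpoints of the β_t decay (1 to 0) and the additive combination of the two loss terms are our assumptions, so consult the released code for the authors' exact formulation.

```python
def tree_supervision_schedule(epoch: int, total_epochs: int, omega_max: float = 0.5):
    """Linear schedule: omega_t grows 0 -> omega_max; beta_t decays 1 -> 0 (assumed)."""
    progress = epoch / max(total_epochs - 1, 1)
    omega_t = omega_max * progress
    beta_t = 1.0 - progress
    return beta_t, omega_t


def combined_loss(loss_original, loss_soft, epoch, total_epochs, omega_max=0.5):
    """Assumed combination: beta_t * cross-entropy + omega_t * tree supervision loss."""
    beta_t, omega_t = tree_supervision_schedule(epoch, total_epochs, omega_max)
    return beta_t * loss_original + omega_t * loss_soft


# Per the quoted setup: omega_max = 0.5 for CIFAR10/CIFAR100,
# omega_max = 5.0 for TinyImageNet/ImageNet.
```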