MD tree: a model-diagnostic tree grown on loss landscape

Authors: Yefan Zhou, Jianlong Chen, Qinxue Cao, Konstantin Schürholt, Yaoqing Yang

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Therefore, we propose a diagnosis method called MD tree based on loss landscape metrics and experimentally demonstrate its advantage over classical validation-based approaches. We verify the effectiveness of MD tree in multiple practical scenarios: (1) use several models trained on one dataset to diagnose a model trained on another dataset, essentially a few-shot dataset transfer problem; (2) use small models (or models trained with small data) to diagnose big models (or models trained with big data), essentially a scale transfer problem. In a dataset transfer task, MD tree achieves an accuracy of 87.7%, outperforming validation-based approaches by 14.88%. Our empirical analysis uses datasets of pre-trained models with 1690 different configurations, including different model sizes, data amounts, and optimization hyperparameters. Our analysis shows that MD tree, which uses loss landscape metrics, can effectively diagnose the sources of model failures and significantly outperform the validation-based method.
Researcher Affiliation | Academia | ¹Department of Computer Science, Dartmouth College, NH, USA; ²Zhejiang University, Zhejiang, China; ³University of Illinois Urbana-Champaign, IL, USA; ⁴University of St. Gallen, Switzerland.
Pseudocode | No | The paper describes the tree structure and decision process using text and diagrams (e.g., Figure 2, Figure 4(a)), but it does not provide a formal pseudocode block or algorithm listing.
Open Source Code | Yes | Our code is available at https://github.com/YefanZhou/ModelDiagnosis.
Open Datasets | Yes | We release these collections for future research on model diagnosis. The collection of models, denoted as F, includes various ResNet models trained on CIFAR-10 with differing numbers of parameters (p), data amounts (n), and optimizer hyperparameters (t, batch size).
Dataset Splits | No | The paper mentions 'training and validation errors' for the pre-trained models being diagnosed, but it does not specify the train/validation/test splits used for its own experiments (e.g., how the collection F of pre-trained models was split to train, validate, and test the MD tree classifier itself). It states 'The training set consists of models randomly sampled from F for a fixed parameter count and data amount (p, n)' and 'For both cases, the test set is F′', but no explicit validation split for training the MD tree is provided.
Hardware Specification | Yes | The testing platform used was a Quadro RTX 6000 GPU paired with an Intel Xeon Gold 6248 CPU.
Software Dependencies | No | The paper mentions software such as PyTorch and TensorFlow in general contexts but does not provide specific version numbers for any software dependencies or libraries used in the experimental setup.
Experiment Setup | Yes | Our empirical analysis uses datasets of pre-trained models with 1690 different configurations, including different model sizes, data amounts, and optimization hyperparameters. The collection of models, denoted as F, includes various ResNet models trained on CIFAR-10 with differing numbers of parameters (p), data amounts (n), and optimizer hyperparameters (t, batch size). MD tree only optimizes the thresholds of the metrics at each internal node, sequentially from top to bottom. Initial values and search ranges are provided for each threshold, and the bounded Brent method (Brent, 1973) is used to optimize these thresholds to maximize training accuracy. The hyperparameters are provided in Appendix D. We set the maximum depth of the tree to 4 and the minimum number of samples required to split an internal node to 2.
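Since the paper provides no formal algorithm listing (see the Pseudocode row), the following is a minimal illustrative sketch of the kind of fixed-structure decision walk the assessment describes. The metric names (`connectivity`, `sharpness`), thresholds, and failure labels here are hypothetical placeholders, not the paper's actual tree:

```python
# Illustrative sketch of a fixed-structure diagnostic tree over
# loss-landscape metrics. Metric names, thresholds, and failure
# labels are hypothetical placeholders, not the paper's tree.

def md_tree_diagnose(metrics, thresholds):
    """Walk a fixed tree of loss-landscape metrics to name a failure source.

    metrics: dict of precomputed loss-landscape measurements for one model.
    thresholds: dict of per-node split thresholds (tuned separately).
    """
    if metrics["connectivity"] < thresholds["connectivity"]:
        # Poorly connected minima: attribute failure to optimizer settings.
        return "optimizer_hyperparameters"
    if metrics["sharpness"] > thresholds["sharpness"]:
        # Well connected but sharp: attribute failure to data amount.
        return "insufficient_data"
    return "insufficient_model_size"

example_metrics = {"connectivity": 0.2, "sharpness": 1.5}
example_thresholds = {"connectivity": 0.5, "sharpness": 2.0}
print(md_tree_diagnose(example_metrics, example_thresholds))
```

Unlike a learned CART-style tree, the structure here is fixed and only the thresholds would be tuned, which matches the threshold-only optimization described in the Experiment Setup row.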
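The per-node threshold tuning described in the Experiment Setup row (bounded Brent optimization of each threshold to maximize training accuracy) can be sketched as below. The synthetic metric values and two-class labels are assumptions for illustration, and SciPy's `minimize_scalar(method="bounded")` is used as a stand-in for the bounded Brent method:

```python
# Sketch of tuning a single node's threshold with the bounded Brent
# method. The metric values and labels are synthetic; in the paper's
# setting they would be one loss-landscape metric and the ground-truth
# failure source for each pre-trained model in the training set.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
# Two well-separated groups of models along one metric axis.
metric = np.concatenate([rng.normal(0.0, 0.3, 50), rng.normal(2.0, 0.3, 50)])
label = np.array([0] * 50 + [1] * 50)  # failure-source label at this node

def neg_accuracy(threshold):
    # Negate accuracy so that minimizing maximizes training accuracy.
    pred = (metric > threshold).astype(int)
    return -(pred == label).mean()

res = minimize_scalar(neg_accuracy,
                      bounds=(metric.min(), metric.max()),
                      method="bounded")
print("tuned threshold:", round(res.x, 2),
      "training accuracy:", round(-res.fun, 2))
```

In the full method this step would be repeated node by node from the root down, each time with the initial value and search range given in the paper's Appendix D.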