MD tree: a model-diagnostic tree grown on loss landscape
Authors: Yefan Zhou, Jianlong Chen, Qinxue Cao, Konstantin Schürholt, Yaoqing Yang
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Therefore, we propose a diagnosis method called MD tree based on loss landscape metrics and experimentally demonstrate its advantage over classical validation-based approaches. We verify the effectiveness of MD tree in multiple practical scenarios: (1) use several models trained on one dataset to diagnose a model trained on another dataset, essentially a few-shot dataset transfer problem; (2) use small models (or models trained with small data) to diagnose big models (or models trained with big data), essentially a scale transfer problem. In a dataset transfer task, MD tree achieves an accuracy of 87.7%, outperforming validation-based approaches by 14.88%. Our empirical analysis uses datasets of pre-trained models with 1690 different configurations, including different model sizes, data amounts, and optimization hyperparameters. Our analysis shows that MD tree, which uses loss landscape metrics, can effectively diagnose the sources of model failures and significantly outperform the validation-based method. |
| Researcher Affiliation | Academia | ¹Department of Computer Science, Dartmouth College, NH, USA; ²Zhejiang University, Zhejiang, China; ³University of Illinois Urbana-Champaign, IL, USA; ⁴University of St. Gallen, Switzerland. |
| Pseudocode | No | The paper describes the tree structure and decision process using text and diagrams (e.g., Figure 2, Figure 4(a)), but it does not provide a formal pseudocode block or algorithm listing. |
| Open Source Code | Yes | Our code is available at https://github.com/YefanZhou/ModelDiagnosis. |
| Open Datasets | Yes | We release these collections for future research on model diagnosis. The collection of models, denoted as F, includes various ResNet models trained on CIFAR-10 with differing numbers of parameters (p), data amounts (n), and optimizer hyperparameters (t, batch size). |
| Dataset Splits | No | The paper mentions 'training and validation errors' for the pre-trained models being diagnosed, but it does not specify how its own experiments split the collection F of pre-trained models into train/validation/test sets for fitting the MD tree classifier. It states that 'The training set consists of models randomly sampled from F for a fixed parameter count and data amount (p, n)' and that the test set is F′, but no explicit validation split for MD tree training is provided. |
| Hardware Specification | Yes | The testing platform used was a Quadro RTX 6000 GPU paired with an Intel Xeon Gold 6248 CPU. |
| Software Dependencies | No | The paper mentions software like PyTorch and TensorFlow in general contexts but does not provide specific version numbers for any software dependencies or libraries used in the experimental setup. |
| Experiment Setup | Yes | Our empirical analysis uses datasets of pre-trained models with 1690 different configurations, including different model sizes, data amounts, and optimization hyperparameters. The collection of models, denoted as F, includes various ResNet models trained on CIFAR-10 with differing numbers of parameters (p), data amounts (n), and optimizer hyperparameters (t, batch size). MD tree only optimizes the thresholds of the metrics at each internal node sequentially from top to bottom. Initial values and search ranges are provided for each threshold, and the bounded Brent method (Brent, 1973) is used to optimize these thresholds to maximize training accuracy. The hyperparameters are provided in Appendix D. We set the maximum depth of the tree to be 4 and the minimum number of samples required to split an internal node to be 2. |
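The threshold-tuning step quoted in the Experiment Setup row (one metric per internal node, tuned sequentially over a bounded range to maximize training accuracy) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`golden_section_max`, `stump_accuracy`), the toy diagnosis data, and the use of a dependency-free golden-section search in place of the bounded Brent method are all assumptions made here for clarity.

```python
# Sketch of MD-tree-style node thresholding (NOT the authors' code).
# Each internal node splits on one loss-landscape metric; its threshold
# is tuned over a bounded range to maximize training accuracy. The paper
# uses the bounded Brent method; a golden-section search stands in here
# to keep the example free of external dependencies.

def golden_section_max(f, lo, hi, tol=1e-4):
    """Maximize a 1-D function on [lo, hi] (stand-in for bounded Brent)."""
    gr = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    c, d = b - gr * (b - a), a + gr * (b - a)
    while abs(b - a) > tol:
        if f(c) > f(d):
            b, d = d, c
            c = b - gr * (b - a)
        else:
            a, c = c, d
            d = a + gr * (b - a)
    return (a + b) / 2

def stump_accuracy(threshold, metric_values, labels):
    """Training accuracy when predicting class 1 iff the metric exceeds threshold."""
    preds = [1 if m > threshold else 0 for m in metric_values]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

# Toy data: one loss-landscape metric separating two failure sources (0 vs 1).
metrics = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
labels  = [0,   0,   0,   1,   1,   1]

# Tune this node's threshold within its search range [0, 1].
best_t = golden_section_max(
    lambda t: stump_accuracy(t, metrics, labels), lo=0.0, hi=1.0)
print(best_t, stump_accuracy(best_t, metrics, labels))
```

In the full method this scalar search would be repeated at each internal node from the root downward (depth at most 4, with at least 2 samples required to split a node), with each child node tuned on the subset of models routed to it.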