Provable Multi-instance Deep AUC Maximization with Stochastic Pooling

Authors: Dixian Zhu, Bokun Wang, Zhi Chen, Yaxing Wang, Milan Sonka, Xiaodong Wu, Tianbao Yang

ICML 2023

Below, each reproducibility variable is listed with its result and the supporting LLM response.

Research Type: Experimental
LLM Response: "Our extensive experiments on conventional MIL datasets and medical datasets demonstrate the superiority of our MIDAM algorithm. In this section, we present some experimental results. We choose datasets from three categories, namely traditional tabular datasets, histopathological image datasets, and MRI/CT datasets."

Researcher Affiliation: Academia
LLM Response: (1) Department of Computer Science, University of Iowa, IA, USA; (2) Department of Computer Science and Engineering, Texas A&M University, College Station, TX, USA; (3) Department of Electrical and Computer Engineering, University of Iowa, IA, USA; (4) Beijing Institute of Ophthalmology, Beijing Tongren Hospital, Capital Medical University, Beijing, China.

Pseudocode: Yes
LLM Response: The paper provides Algorithm 1, "The Unified MIDAM Algorithm".

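Since the paper's pseudocode is not reproduced here, the following minimal NumPy sketch of the stochastic smoothed-max pooling variant may help. It is reconstructed from the paper's description, not taken from the released code; the linear scoring model, toy data, temperature tau, and step sizes are all illustrative assumptions. Each sampled bag keeps a moving-average estimator u[i] of its pooled score, and the min-max margin AUC objective is updated through those estimators:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MIL data: bags of instance feature vectors with bag labels in {+1, -1}.
d, n_bags = 10, 60
labels = rng.choice([1, -1], size=n_bags)
bags = [rng.normal(size=(rng.integers(3, 9), d)) + 0.5 * (y > 0) for y in labels]

W = np.zeros(d)            # linear instance scorer (stand-in for a deep network)
u = np.ones(n_bags)        # per-bag moving-average estimators of the pooled score
a = b = alpha = 0.0        # auxiliary variables of the min-max margin AUC loss
gamma, lr, margin, tau = 0.1, 0.05, 1.0, 1.0

for it in range(300):
    pos = rng.choice(np.where(labels == 1)[0], size=8)   # S+ = 8 positive bags
    neg = rng.choice(np.where(labels == -1)[0], size=8)  # S- = 8 negative bags
    grad_W, h_pos, h_neg = np.zeros(d), [], []
    for i, y in [(i, 1) for i in pos] + [(i, -1) for i in neg]:
        k = min(4, len(bags[i]))                          # at most 4 instances/bag
        sub = bags[i][rng.choice(len(bags[i]), size=k, replace=False)]
        e = np.exp(sub @ W / tau)
        u[i] = (1 - gamma) * u[i] + gamma * e.mean()      # stochastic-pooling estimate
        h = tau * np.log(u[i])                            # smoothed-max bag prediction
        dh_dW = (e[:, None] * sub).mean(axis=0) / u[i]    # chain rule through u[i]
        if y > 0:
            g = 2 * (h - a - alpha) / len(pos)            # d(objective)/dh, positive bag
            h_pos.append(h)
        else:
            g = 2 * (h - b + alpha) / len(neg)            # d(objective)/dh, negative bag
            h_neg.append(h)
        grad_W += g * dh_dW
    W -= lr * grad_W                                      # descent on model weights
    a -= lr * 2 * (a - np.mean(h_pos))                    # descent on a and b
    b -= lr * 2 * (b - np.mean(h_neg))
    alpha += lr * 2 * (margin + np.mean(h_neg) - np.mean(h_pos) - alpha)
    alpha = max(alpha, 0.0)                               # projected ascent on alpha
```

The actual Algorithm 1 applies these updates with deep encoders, also covers attention-based pooling, and comes with convergence guarantees; this sketch only mirrors its per-iteration structure.
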
Open Source Code: Yes
LLM Response: The method is open-sourced at https://libauc.org/, and the code is available at https://github.com/DixianZhu/MIDAM.

Open Datasets: Yes
LLM Response: "We choose datasets from three categories, namely traditional tabular datasets, histopathological image datasets, and MRI/CT datasets. Statistics of these datasets are described in Table 1. Details of these datasets will be presented later. Five benchmark datasets, namely MUSK1, MUSK2, Fox, Tiger, and Elephant (Dietterich et al., 1997a; Andrews et al., 2002), are commonly used for evaluating MIL methods. The MUSK1 and MUSK2 datasets contain drug molecules that will (or will not) bind strongly to a target protein. ... We choose two histopathological image datasets, namely Breast Cancer and Colon Adenocarcinoma (Gelasca et al., 2008; Borkowski et al., 2019a). ... The first dataset is from the University of California San Francisco Preoperative Diffuse Glioma MRI (UCSF-PDGM) (Calabrese et al., 2022), abbreviated as PDGM in this work. The second dataset contains multiple OCT images for a large number of patients (Xie et al., 2022)."

Dataset Splits: Yes
LLM Response: "We uniformly randomly split the data with a 0.9/0.1 train/test ratio and run 5-fold cross-validation experiments with 3 different random seeds (15 trials in total). The initial learning rate is tuned in {1e-1, 1e-2, 1e-3} and is decreased 10-fold at the end of the 50th and 75th epochs over the 100-epoch training period. ... We report the testing AUC of the model with the largest validation AUC. ... For the Breast Cancer dataset, we generate the train/test (0.9/0.1) split 2 times with different random seeds and conduct five-fold cross-validation (10 trials). For the other datasets, we do a single random train/test (0.9/0.1) split and conduct five-fold cross-validation (5 trials)."

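For the tabular benchmarks this protocol amounts to 15 trials (3 seeds x 5 folds). A short sketch of the loop, where run_trial is a hypothetical stand-in that trains the model, selects the epoch with the highest validation AUC, and returns the test AUC of that checkpoint:

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

def evaluate(n_bags, run_trial):
    """0.9/0.1 train/test split + 5-fold CV, repeated for 3 seeds (15 trials)."""
    test_aucs = []
    for seed in range(3):
        train_idx, test_idx = train_test_split(
            np.arange(n_bags), test_size=0.1, random_state=seed)
        folds = KFold(n_splits=5, shuffle=True, random_state=seed)
        for tr, va in folds.split(train_idx):
            # run_trial is hypothetical: train on `train`, pick the checkpoint
            # with the best AUC on `val`, and report its AUC on `test`.
            test_aucs.append(run_trial(train=train_idx[tr],
                                       val=train_idx[va],
                                       test=test_idx))
    return np.mean(test_aucs), np.std(test_aucs)
```
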
Hardware Specification: No
LLM Response: The paper mentions hardware only in general terms ("GPU memory", "GPU constraint") and does not specify exact GPU or CPU models or other hardware details used for the experiments.

Software Dependencies: No
LLM Response: The paper names optimizers and backbone models (e.g., the Adam optimizer, ResNet20) but does not specify any software with version numbers, such as programming languages, libraries, or frameworks.

Experiment Setup: Yes
LLM Response: "The initial learning rate is tuned in {1e-1, 1e-2, 1e-3} and is decreased 10-fold at the end of the 50th and 75th epochs over the 100-epoch training period. For all experiments in this work, the weight decay is fixed at 1e-4, and we fix η = 1 and (1 − β1) = 0.9 in our proposed algorithm, decreased 2-fold at the same time as the learning rate. We report the testing AUC of the model with the largest validation AUC. For each iteration, we sample 8 positive bags and 8 negative bags (S+ = S− = 8); for each bag we sample at most 4 instances for our methods but use all instances for the baselines, given that the dataset is small and the bag size is not identical across bags. The mean and standard deviation of the testing AUC are presented in Table 2. ... We fix the margin parameter at 0.1 for DAM and MIDAM. For attention-based pooling, we use the one defined in (2) with attentional factor exp(w^T tanh(V e(w_e; x))), following (Ilse et al., 2018). ... For the AUC loss function, we apply a sigmoid to normalize the output. The weight decay is fixed at 1e-4. For all experiments, we run 100 epochs per trial and decrease the learning rate 10-fold at the end of the 50th and 75th epochs. For the Breast Cancer dataset, we generate the train/test (0.9/0.1) split 2 times with different random seeds and conduct five-fold cross-validation (10 trials). For the other datasets, we do a single random train/test (0.9/0.1) split and conduct five-fold cross-validation (5 trials). The margin parameter is tuned in {0.1, 0.5, 1.0} for the AUC loss function. The initial learning rate is tuned in {1e-1, 1e-2, 1e-3} for the histopathological image datasets, and is fixed at 1e-2 for PDGM and 1e-1 for OCT."

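Putting the quoted hyperparameters together, the following sketch shows the learning-rate schedule, the per-iteration bag sampling, and the attention factor from (Ilse et al., 2018); all function and variable names are illustrative, not from the authors' code:

```python
import numpy as np

def lr_at(epoch, base_lr):
    """Step schedule: 10-fold decay at the end of epochs 50 and 75 (100 total)."""
    if epoch < 50:
        return base_lr                  # base_lr tuned in {1e-1, 1e-2, 1e-3}
    return base_lr / 10 if epoch < 75 else base_lr / 100

def sample_batch(rng, pos_bags, neg_bags, S=8, max_inst=4):
    """Draw S positive and S negative bags (assumes each pool has >= S bags),
    then at most max_inst instances from each sampled bag."""
    batch = []
    for pool, y in ((pos_bags, +1), (neg_bags, -1)):
        for i in rng.choice(len(pool), size=S, replace=False):
            keep = rng.choice(len(pool[i]), size=min(max_inst, len(pool[i])),
                              replace=False)
            batch.append((pool[i][keep], y))
    return batch

def attention_pool(E, w, V):
    """Attention pooling with factor exp(w^T tanh(V e)) per instance embedding."""
    logits = np.tanh(E @ V.T) @ w       # E: (n, d) embeddings; V: (k, d); w: (k,)
    a = np.exp(logits - logits.max())   # numerically stable softmax weights
    a /= a.sum()
    return a @ E                        # weighted bag-level embedding
```

Together with the fixed weight decay of 1e-4 and the margin tuned in {0.1, 0.5, 1.0}, these pieces cover the quoted setup; the sigmoid normalization of the output for the AUC loss would be applied on top of the pooled prediction.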