Who Said What: Modeling Individual Labelers Improves Classification

Authors: Melody Guan, Varun Gulshan, Andrew Dai, Geoffrey Hinton

AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Here we show that our approach leads to improvements in computer-aided diagnosis of diabetic retinopathy. We also show that our method performs better than competing algorithms by Welinder and Perona (2010); Mnih and Hinton (2012).
Researcher Affiliation | Collaboration | Melody Y. Guan, Stanford University, 450 Serra Mall, Stanford, California 94305, mguan@stanford.edu; Varun Gulshan, Andrew M. Dai, Geoffrey E. Hinton, Google Brain, 1600 Amphitheatre Pkwy, Mountain View, California 94043, {varungulshan, adai, geoffhinton}@google.com
Pseudocode | No | No structured pseudocode or algorithm blocks were found. (A hedged sketch of the per-labeler architecture, reconstructed from the paper's description, follows the table.)
Open Source Code | No | The paper makes no statement about open-source code availability and provides no link to a code repository for the described methodology.
Open Datasets | Yes | The training dataset consists of 126,522 images sourced from patients presenting for diabetic retinopathy screening at sites managed by 4 different clinical partners: EyePACS, Aravind Eye Care, Sankara Nethralaya, and Narayana Nethralaya. Our test dataset consists of 3,547 images from the EyePACS-1 and Messidor-2 datasets.
Dataset Splits | Yes | MNIST has 60k training images and 10k test images and the task is to classify each... The training dataset consists of 126,522 images... The validation dataset consists of 7,804 images obtained from EyePACS clinics. Our test dataset consists of 3,547 images from the EyePACS-1 and Messidor-2 datasets.
Hardware Specification | No | The paper mentions training with 'one GPU per replica' and '32 replicas and 17 parameter servers' but does not specify the exact models of the GPUs, CPUs, or other hardware components used for the experiments.
Software Dependencies | No | The paper mentions using 'TensorFlow' but does not specify its version number or the versions of any other software dependencies.
Experiment Setup | Yes | We train the network weights using distributed stochastic gradient descent (Abadi et al. 2016) with the Adam optimizer on mini-batches of size 8. Table 5 displays the optimal hyperparameters used in DR classification. We tuned using grid search on the following hyperparameter spaces: dropout for Inception backbone {0.5, 0.55, 0.6, ..., 1.0}, dropout for doctor models {0.5, 0.55, 0.6, ..., 1.0}, learning rate {1×10⁻⁷, 3×10⁻⁷, 1×10⁻⁶, ..., 0.03}, entropy weight {0.0, 0.0025, 0.005, ..., 0.03} ∪ {0.1}, weight decay for Inception {0.000004, 0.00001, 0.00004, ..., 0.1}, L1 weight decay for doctor models {0.000004, 0.00001, 0.00004, ..., 0.04}, L2 weight decay for doctor models {0.00001, 0.00004, ..., 0.04}, L1 weight decay for averaging logits {0.001, 0.01, 0.02, 0.03, ..., 0.1, 0.2, 0.3, ..., 1, 2, 3, ..., 10, 100, 1000}, L2 weight decay for averaging logits {0.001, 0.01, 0.1, 0.2, 0.3, ..., 1, 5, 10, 15, 20, 30, ..., 150, 200, 300, 400, 500, 1000}, and bottleneck size (for BIWDN) {2, 3, 4, 5, 6, 7}. (A Python transcription of this grid follows the table.)
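
For concreteness, the quoted search spaces can be transcribed into code. The sketch below is our own Python illustration, not the authors' tooling: the dictionary keys and the `train_and_evaluate` entry point are hypothetical names, only a subset of axes is encoded, and a full Cartesian product over all ten axes would be far larger than what a single sweep would cover in practice.

```python
import itertools
import math

# Search spaces transcribed from the quoted grid (subset of axes;
# the weight-decay axes follow the same pattern and are omitted).
grid = {
    "inception_dropout": [round(0.5 + 0.05 * i, 2) for i in range(11)],  # 0.5 .. 1.0
    "doctor_dropout":    [round(0.5 + 0.05 * i, 2) for i in range(11)],  # 0.5 .. 1.0
    "learning_rate":     [1e-7, 3e-7, 1e-6, 3e-6, 1e-5, 3e-5,
                          1e-4, 3e-4, 1e-3, 3e-3, 1e-2, 3e-2],           # ~geometric, up to 0.03
    "entropy_weight":    [round(0.0025 * i, 4) for i in range(13)] + [0.1],
    "bottleneck_size":   [2, 3, 4, 5, 6, 7],                             # BIWDN variant only
}

print(math.prod(len(v) for v in grid.values()), "configurations in this sub-grid")

for values in itertools.product(*grid.values()):
    config = dict(zip(grid, values))
    # score = train_and_evaluate(config)  # hypothetical training entry point
```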
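
Several of the tuned axes above ("doctor models", "averaging logits", the BIWDN bottleneck) refer to the paper's per-labeler output heads on a shared Inception feature extractor. Since the paper includes no pseudocode, the NumPy sketch below is only our reading of that arrangement; the labeler count, feature dimension, and function names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_DOCTORS = 30    # illustrative labeler count (assumption)
FEAT_DIM    = 2048  # typical Inception-v3 feature width (assumption)
NUM_CLASSES = 5     # five-grade diabetic retinopathy scale

# One linear "doctor" head per labeler on top of shared backbone features.
W = rng.normal(0.0, 0.01, size=(NUM_DOCTORS, FEAT_DIM, NUM_CLASSES))
b = np.zeros((NUM_DOCTORS, NUM_CLASSES))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def predict(features, head_weights=None):
    """Combine per-doctor logits into one prediction.

    Uniform weights give the 'averaging logits' scheme named in the
    hyperparameter list; learned weights would correspond to the
    weighted variants (our reading, not the authors' code).
    """
    logits = np.einsum("f,dfc->dc", features, W) + b  # (NUM_DOCTORS, NUM_CLASSES)
    if head_weights is None:
        head_weights = np.full(NUM_DOCTORS, 1.0 / NUM_DOCTORS)
    return softmax(head_weights @ logits)

probs = predict(rng.normal(size=FEAT_DIM))
print(probs.round(3), "sums to", probs.sum())  # distribution over the 5 grades
```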