Who Said What: Modeling Individual Labelers Improves Classification
Authors: Melody Guan, Varun Gulshan, Andrew Dai, Geoffrey Hinton
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here we show that our approach leads to improvements in computer-aided diagnosis of diabetic retinopathy. We also show that our method performs better than competing algorithms by Welinder and Perona (2010); Mnih and Hinton (2012). |
| Researcher Affiliation | Collaboration | Melody Y. Guan, Stanford University, 450 Serra Mall, Stanford, California 94305, mguan@stanford.edu; Varun Gulshan, Andrew M. Dai, Geoffrey E. Hinton, Google Brain, 1600 Amphitheatre Pwky, Mountain View, California 94043, {varungulshan, adai, geoffhinton}@google.com |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found. |
| Open Source Code | No | The paper does not provide any concrete statement about open-source code availability or a link to a code repository for the described methodology. |
| Open Datasets | Yes | The training dataset consists of 126,522 images sourced from patients presenting for diabetic retinopathy screening at sites managed by 4 different clinical partners: EyePACS, Aravind Eye Care, Sankara Nethralaya, and Narayana Nethralaya. Our test dataset consists of 3,547 images from the EyePACS-1 and Messidor-2 datasets. |
| Dataset Splits | Yes | MNIST has 60k training images and 10k test images, and the task is to classify each image into one of 10 digit classes. The training dataset consists of 126,522 images... The validation dataset consists of 7,804 images obtained from EyePACS clinics. Our test dataset consists of 3,547 images from the EyePACS-1 and Messidor-2 datasets. |
| Hardware Specification | No | The paper mentions training with 'one GPU per replica' and '32 replicas and 17 parameter servers' but does not specify the exact models of the GPUs, CPUs, or other hardware components used for the experiments. |
| Software Dependencies | No | The paper mentions using 'TensorFlow' but does not specify its version number or the versions of any other software dependencies. |
| Experiment Setup | Yes | We train the network weights using distributed stochastic gradient descent (Abadi et al. 2016) with the Adam optimizer on mini-batches of size 8. Table 5 displays the optimal hyperparameters used in DR classification. We tuned using grid search on the following hyperparameter spaces: dropout for Inception backbone {0.5, 0.55, 0.6, ..., 1.0}, dropout for doctor models {0.5, 0.55, 0.6, ..., 1.0}, learning rate {1×10^-7, 3×10^-7, 1×10^-6, ..., 0.03}, entropy weight {0.0, 0.0025, 0.005, ..., 0.03} ∪ {0.1}, weight decay for Inception {0.000004, 0.00001, 0.00004, ..., 0.1}, L1 weight decay for doctor models {0.000004, 0.00001, 0.00004, ..., 0.04}, L2 weight decay for doctor models {0.00001, 0.00004, ..., 0.04}, L1 weight decay for averaging logits {0.001, 0.01, 0.02, 0.03, ..., 0.1, 0.2, 0.3, ..., 1, 2, 3, ..., 10, 100, 1000}, L2 weight decay for averaging logits {0.001, 0.01, 0.1, 0.2, 0.3, ..., 1, 5, 10, 15, 20, 30, ..., 150, 200, 300, 400, 500, 1000}, and bottleneck size (for BIWDN) {2, 3, 4, 5, 6, 7}. Illustrative sketches of this grid search and of the per-labeler ("doctor") heads it tunes follow the table. |
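
The experiment-setup row describes a plain grid search over the listed hyperparameter spaces. The sketch below is a minimal illustration of that procedure, not the authors' code: only a subset of the spaces is included, the elided "..." values are expanded under an assumed step pattern (dropout in steps of 0.05, learning rates on a 1-3-10 ladder), and `train_and_evaluate` is a hypothetical stand-in for the actual distributed training run.

```python
import itertools
import random

# Partial hyperparameter grid transcribed from the row above. The elided "..."
# portions are expanded only as an assumption; see the paper's Table 5 and
# search-space list for the authoritative values.
grid = {
    "backbone_dropout": [round(0.5 + 0.05 * i, 2) for i in range(11)],  # 0.5 .. 1.0
    "doctor_dropout": [round(0.5 + 0.05 * i, 2) for i in range(11)],    # 0.5 .. 1.0
    "learning_rate": [1e-7, 3e-7, 1e-6, 3e-6, 1e-5, 3e-5,
                      1e-4, 3e-4, 1e-3, 3e-3, 1e-2, 3e-2],              # up to 0.03
    "bottleneck_size": [2, 3, 4, 5, 6, 7],                              # BIWDN only
}

def train_and_evaluate(config):
    """Hypothetical stand-in for training one model with Adam on mini-batches
    of size 8 and returning a validation metric; replace with real training."""
    return random.random()

best_score, best_config = -1.0, None
for values in itertools.product(*grid.values()):
    config = dict(zip(grid.keys(), values))
    score = train_and_evaluate(config)
    if score > best_score:
        best_score, best_config = score, config

print(best_score, best_config)
```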
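The "doctor models" and "averaging logits" terms in the search space refer to the paper's core idea of giving each labeler its own output head on a shared Inception backbone and combining their opinions at prediction time. The NumPy sketch below shows that combination in outline only; the number of doctors, feature dimension, and random weights are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_DOCTORS = 30    # illustrative count of individual labelers
NUM_CLASSES = 5     # 5-point diabetic retinopathy grading scale
FEATURE_DIM = 2048  # assumed size of the shared backbone feature vector

# One linear "doctor head" per labeler on top of the shared image features.
doctor_weights = rng.normal(scale=0.01, size=(NUM_DOCTORS, FEATURE_DIM, NUM_CLASSES))
doctor_biases = np.zeros((NUM_DOCTORS, NUM_CLASSES))

def doctor_logits(features):
    """Per-doctor class logits for one image's shared feature vector."""
    return np.einsum("f,dfc->dc", features, doctor_weights) + doctor_biases

def averaged_prediction(features):
    """Average the per-doctor logits (the 'averaging logits' variant in the
    search space above) and return class probabilities for the image."""
    logits = doctor_logits(features).mean(axis=0)
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

probs = averaged_prediction(rng.normal(size=FEATURE_DIM))
print(probs)
```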