Decoupling "when to update" from "how to update"
Authors: Eran Malach, Shai Shalev-Shwartz
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We now demonstrate the merit of our suggested meta-algorithm using empirical evaluation. Our main experiment applies our algorithm with deep networks in a real-world scenario of noisy labels. |
| Researcher Affiliation | Academia | Eran Malach School of Computer Science The Hebrew University, Israel eran.malach@mail.huji.ac.il Shai Shalev-Shwartz School of Computer Science The Hebrew University, Israel shais@cs.huji.ac.il |
| Pseudocode | Yes | A pseudo-code is given in Algorithm 1. Algorithm 1 (Update by Disagreement) — input: an update rule U, batch size b, two initial predictors h1, h2 ∈ H. For t = 1, 2, …, N: draw a mini-batch (x1, y1), …, (xb, yb) ∼ D^b; let S = {(xi, yi) : h1(xi) ≠ h2(xi)}; h1 ← U(h1, S); h2 ← U(h2, S). |
| Open Source Code | Yes | Code is available online on https://github.com/emalach/UpdateByDisagreement. |
| Open Datasets | Yes | Instead, we relied on the Labeled Faces in the Wild (LFW) dataset, which contains images of different people along with their names, but with no information about their gender. To find the gender for each image, we use an online service to match a gender to a given name (as is suggested by [25]), a method which is naturally prone to noisy labels (due to unisex names). Applying our algorithm to an existing neural network architecture reduces the effect of the noisy labels, achieving better results than similar available approaches, when tested on a clean subset of the data. [18] Gary B Huang, Manu Ramesh, Tamara Berg, and Erik Learned-Miller. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Technical report, Technical Report 07-49, University of Massachusetts, Amherst, 2007. |
| Dataset Splits | No | The paper describes how training and test sets were constructed and used (N1, N2, N3 as test sets, and combinations of N2, N3, N4, N5 for training), but does not explicitly mention or detail a validation set or its specific split for hyperparameter tuning. |
| Hardware Specification | No | The paper discusses training deep networks and using SGD, but does not provide specific details on the hardware used, such as GPU/CPU models or cloud instance types. |
| Software Dependencies | No | The paper mentions using a 'tensorflow implementation' but does not specify its version or the versions of any other software dependencies, such as Python or specific libraries. |
| Experiment Setup | Yes | Training is done for 30,000 iterations with mini-batches of 128 examples. To make the networks' disagreement meaningful, we initialize the two networks by training both of them normally (updating on all the examples) until iteration #5000, where we switch to training with the Update by Disagreement rule. Because we are not updating on all examples, we decrease the weight of batches that retain less than 10% of the examples in the original batch, to stabilize gradients. |
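The update rule quoted in the Pseudocode row can be sketched in a few lines: keep only the mini-batch examples on which the two predictors disagree, then apply the update rule U to both on that subset. The sketch below is a toy illustration, not the paper's code: it uses linear classifiers with a perceptron-style step as a stand-in for the deep networks and SGD step described in the paper, and all function names are hypothetical.

```python
import numpy as np

def predict(w, X):
    # Sign of a linear score; a stand-in for any classifier h(x).
    return np.sign(X @ w)

def perceptron_update(w, S_x, S_y, lr=0.1):
    # A simple update rule U: one perceptron-style correction per
    # mistaken example (stand-in for an SGD step on a deep net).
    for x, y in zip(S_x, S_y):
        if np.sign(x @ w) != y:
            w = w + lr * y * x
    return w

def update_by_disagreement(w1, w2, X, y):
    # Keep only examples where the two predictors disagree ...
    mask = predict(w1, X) != predict(w2, X)
    S_x, S_y = X[mask], y[mask]
    # ... and apply the update rule U to both predictors on that subset.
    return perceptron_update(w1, S_x, S_y), perceptron_update(w2, S_x, S_y)

# Toy run: linearly separable data, two differently initialized predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 5))
y = np.sign(X @ rng.normal(size=5))
w1, w2 = rng.normal(size=5), rng.normal(size=5)
for _ in range(100):
    w1, w2 = update_by_disagreement(w1, w2, X, y)
```

Note the key property the paper relies on: once the two predictors agree on every example, the disagreement set S is empty and updates stop, which is what suppresses memorization of noisy labels.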