Decoupling "when to update" from "how to update"
Authors: Eran Malach, Shai Shalev-Shwartz
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We now demonstrate the merit of our suggested meta-algorithm using empirical evaluation. Our main experiment applies our algorithm with deep networks in a real-world scenario of noisy labels. |
| Researcher Affiliation | Academia | Eran Malach School of Computer Science The Hebrew University, Israel eran.malach@mail.huji.ac.il Shai Shalev-Shwartz School of Computer Science The Hebrew University, Israel shais@cs.huji.ac.il |
| Pseudocode | Yes | A pseudo-code is given in Algorithm 1. Algorithm 1 (Update by Disagreement) — input: an update rule U, batch size b, two initial predictors h1, h2 ∈ H. For t = 1, 2, …, N: draw a mini-batch (x1, y1), …, (xb, yb) ∼ D^b; let S = {(xi, yi) : h1(xi) ≠ h2(xi)}; h1 ← U(h1, S); h2 ← U(h2, S). |
| Open Source Code | Yes | Code is available online on https://github.com/emalach/UpdateByDisagreement. |
| Open Datasets | Yes | Instead, we relied on the Labeled Faces in the Wild (LFW) dataset, which contains images of different people along with their names, but with no information about their gender. To find the gender for each image, we use an online service to match a gender to a given name (as is suggested by [25]), a method which is naturally prone to noisy labels (due to unisex names). Applying our algorithm to an existing neural network architecture reduces the effect of the noisy labels, achieving better results than similar available approaches, when tested on a clean subset of the data. [18] Gary B Huang, Manu Ramesh, Tamara Berg, and Erik Learned-Miller. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Technical report, Technical Report 07-49, University of Massachusetts, Amherst, 2007. |
| Dataset Splits | No | The paper describes how training and test sets were constructed and used (N1, N2, N3 as test sets, and combinations of N2, N3, N4, N5 for training), but does not explicitly mention or detail a validation set or its specific split for hyperparameter tuning. |
| Hardware Specification | No | The paper discusses training deep networks and using SGD, but does not provide specific details on the hardware used, such as GPU/CPU models or cloud instance types. |
| Software Dependencies | No | The paper mentions using a 'tensorflow implementation' but does not specify its version or the versions of any other software dependencies, such as Python or specific libraries. |
| Experiment Setup | Yes | Training is done for 30,000 iterations with mini-batches of 128 examples. To make the networks' disagreement meaningful, we initialize the two networks by training both of them normally (updating on all the examples) until iteration #5000, where we switch to training with the Update by Disagreement rule. Because we are not updating on all examples, we decrease the weight of batches that retain less than 10% of the examples in the original batch, to stabilize gradients. |
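The update rule quoted in the Pseudocode row can be sketched in a few lines: keep only the mini-batch examples on which the two predictors disagree, then apply the update rule U to both on that subset. The sketch below is a toy illustration, not the paper's code: it uses linear classifiers with a perceptron-style step as a stand-in for the deep networks and SGD step described in the paper, and all function names are hypothetical.

```python
import numpy as np

def predict(w, X):
    # Sign of a linear score; a stand-in for any classifier h(x).
    return np.sign(X @ w)

def perceptron_update(w, S_x, S_y, lr=0.1):
    # A simple update rule U: one perceptron-style correction per
    # mistaken example (stand-in for an SGD step on a deep net).
    for x, y in zip(S_x, S_y):
        if np.sign(x @ w) != y:
            w = w + lr * y * x
    return w

def update_by_disagreement(w1, w2, X, y):
    # Keep only examples where the two predictors disagree ...
    mask = predict(w1, X) != predict(w2, X)
    S_x, S_y = X[mask], y[mask]
    # ... and apply the update rule U to both predictors on that subset.
    return perceptron_update(w1, S_x, S_y), perceptron_update(w2, S_x, S_y)

# Toy run: linearly separable data, two differently initialized predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 5))
y = np.sign(X @ rng.normal(size=5))
w1, w2 = rng.normal(size=5), rng.normal(size=5)
for _ in range(100):
    w1, w2 = update_by_disagreement(w1, w2, X, y)
```

Note the key property the paper relies on: once the two predictors agree on every example, the disagreement set S is empty and updates stop, which is what suppresses memorization of noisy labels.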