Loss Decomposition for Fast Learning in Large Output Spaces

Authors: Ian En-Hsu Yen, Satyen Kale, Felix Yu, Daniel Holtmann-Rice, Sanjiv Kumar, Pradeep Ravikumar

ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experiments on multiclass and multilabel classification with hundreds of thousands of classes, as well as training skip-gram word embeddings with a vocabulary size of half a million, our technique consistently improves the accuracy of search-based gradient approximation methods and outperforms sampling-based gradient approximation methods by a large margin.
Researcher Affiliation | Collaboration | Carnegie Mellon University, Pittsburgh, USA; Google, New York, USA. Correspondence to: Ian E.H. Yen <eyan@cs.cmu.edu>, Satyen Kale <satyenkale@google.com>.
Pseudocode | Yes | Algorithm 1: Loss and Gradient Approximation via Search (a hedged sketch of the search-based gradient idea appears after this table).
Open Source Code | No | The paper does not provide a specific link or an explicit statement about releasing the source code for its proposed method.
Open Datasets | Yes | For multiclass classification, we conduct experiments on the largest publicly available facial recognition dataset MegaFace (Challenge 2), where each identity is considered a class and each sample is an image cropped by a face detector. The dataset statistics are shown in Table 1.
Dataset Splits | No | The paper uses "Test Accuracy" and "Train Accuracy" in its figures, implying the use of splits, but it does not explicitly provide the specific percentages, sample counts, or methodology for splitting the datasets into training, validation, or test sets.
Hardware Specification | No | The paper states that methods are 'parallelized with 10 CPU cores in a shared-memory architecture, running on a dedicated machine' and 'parallelized with 24 CPU cores', but it does not provide specific details such as CPU model, GPU model, or memory specifications.
Software Dependencies | No | The paper states 'All the implementation are in C++' and refers to several external methods and packages like 'Spherical Clustering' and the 'word2vec package', but it does not provide specific version numbers for any software dependencies required to replicate the experiments.
Experiment Setup | Yes | For multiclass and multilabel classification, we employ a Stochastic Gradient Descent (SGD) optimization algorithm, with an initial step size chosen from {1, 0.1, 0.01} for the best performance of each method, with a 1/(1 + t) cooling scheme where t is the iteration counter. The minibatch size is 10. (A sketch of this step-size schedule appears after this table.)
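
The Pseudocode row above refers to Algorithm 1 (Loss and Gradient Approximation via Search). The paper's algorithm is not reproduced on this page; the snippet below is only a minimal sketch of the general search-based gradient-approximation idea it relies on: rather than computing logits for every class, the gradient is formed from a small candidate set returned by an inner-product search (here a brute-force placeholder) plus the true class. The names `top_k_candidates` and `approx_softmax_grad` are illustrative and do not come from the paper.

```python
import numpy as np

def top_k_candidates(W, h, k):
    # Placeholder maximum-inner-product search: brute force over all classes.
    # A real implementation would use a search structure (clustering, trees, etc.).
    scores = W @ h
    return np.argpartition(-scores, k)[:k]

def approx_softmax_grad(W, h, y, k=50):
    # Approximate the softmax-loss gradient w.r.t. the class weight matrix W
    # using only the retrieved candidate classes plus the true class y.
    cand = np.union1d(top_k_candidates(W, h, k), [y]).astype(int)
    logits = W[cand] @ h
    logits = logits - logits.max()                 # numerical stability
    p = np.exp(logits) / np.exp(logits).sum()      # softmax over the candidate set only
    grad_W = np.zeros_like(W)
    grad_W[cand] = np.outer(p, h)                  # dL/dW_j ~ p_j * h for candidates
    grad_W[y] -= h                                 # extra -h term for the true class
    return grad_W

# Illustrative usage with random data (shapes are arbitrary):
# W = np.random.randn(100000, 128); h = np.random.randn(128)
# g = approx_softmax_grad(W, h, y=42, k=50)
```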
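
The Experiment Setup row quotes an SGD configuration with an initial step size tuned over {1, 0.1, 0.01}, a 1/(1 + t) cooling scheme, and minibatches of size 10. The sketch below implements only that step-size schedule; the update rule and `grad_fn` are generic placeholders, not the paper's implementation.

```python
import numpy as np

def sgd_with_cooling(params, grad_fn, batches, eta0=0.1):
    # SGD with the 1/(1 + t) cooling scheme quoted above:
    # at iteration t the step size is eta0 / (1 + t), with eta0 tuned over {1, 0.1, 0.01}.
    for t, batch in enumerate(batches):
        eta_t = eta0 / (1.0 + t)
        params = params - eta_t * grad_fn(params, batch)
    return params

# Illustrative usage: least-squares regression on random minibatches of size 10.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_true = rng.normal(size=5)
    batches = [(X := rng.normal(size=(10, 5)), X @ w_true) for _ in range(100)]
    grad_fn = lambda w, b: 2 * b[0].T @ (b[0] @ w - b[1]) / len(b[1])
    w = sgd_with_cooling(np.zeros(5), grad_fn, batches, eta0=0.1)
```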