Candidates vs. Noises Estimation for Large Multi-Class Classification Problem

Authors: Lei Han, Yiheng Huang, Tong Zhang

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results show that CANE achieves better prediction accuracy than Noise-Contrastive Estimation (NCE), its variants, and a number of state-of-the-art tree classifiers, while gaining a significant speedup over standard O(K) methods. We evaluate the CANE method in various applications in this section, including both multi-class classification problems and neural language modeling.
Researcher Affiliation | Industry | Tencent AI Lab, Shenzhen, China. Correspondence to: Lei Han <lxhan@tencent.com>, Yiheng Huang <arnoldhuang@tencent.com>, Tong Zhang <tongzhang@tongzhangml.org>.
Pseudocode | Yes | Algorithm 1: A general optimization procedure for CANE. Algorithm 2: The Beam Tree Algorithm.
Open Source Code | No | The paper does not provide an explicit statement or link for open-sourcing its own code. It only references an external platform (Vowpal Wabbit) used for comparison, with a link to that platform's repository.
Open Datasets | Yes | We consider four multi-class classification problems, including the Sector dataset with 105 classes (Chang & Lin, 2011), the ALOI dataset with 1000 classes (Geusebroek et al., 2005), the ImageNet-2010 dataset with 1000 classes, and the ImageNet-10K dataset with 10K classes (ImageNet Fall 2009 release). We test the methods on two benchmark corpora: the Penn Treebank (PTB) (Mikolov et al., 2010) and Gutenberg corpora.
Dataset Splits | Yes | The data from Sector and ALOI is split into 90% training and 10% testing. In ImageNet-2010, the training set contains 1.3M images and we use the validation set containing 50K images as the test set. The ImageNet-10K data contains 9M images and we randomly split the data into two halves for training and testing, following the protocols in (Deng et al., 2010; Sánchez & Perronnin, 2011; Le, 2013).
Hardware Specification | Yes | All the methods are implemented on a standard CPU machine with a quad-core Intel Core i5 processor. The experiments in this section are run on a machine with NVIDIA Tesla M40 GPUs.
Software Dependencies | No | The paper mentions software such as Vowpal Wabbit and the VGG-16 net but does not specify version numbers for these or any other software dependencies.
Experiment Setup | Yes | We use a b-nary tree for CANE and set b = 10 for all classification problems. We set k = 10 for Sector and ALOI and k = 20 for ImageNet-2010 and ImageNet-10K. All the methods use SGD with the learning rate selected from {0.0001, 0.001, 0.01, 0.05, 0.1, 0.5, 1.0}. We run all the methods for 50 epochs on the Sector, ALOI, and ImageNet-2010 datasets and for 20 epochs on ImageNet-10K. For language modeling, we set the embedding size to 256 and use an LSTM model with 512 hidden states and a 256-dimensional projection. The sequence length is fixed at 20 and the learning rate is selected from {0.025, 0.05, 0.1, 0.2}.
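The speedup claim above rests on replacing the O(K) softmax normalization with a sum over a small subset of classes. The following is only an illustrative sketch of that "candidates vs. noises" idea, not the paper's exact CANE objective: the candidate-selection rule, the uniform noise sampling, and all names here are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def subset_softmax_loss(logits, target, num_candidates=10, num_noise=5):
    """Sketch of a candidates-vs-noises style loss: normalize over a small
    candidate set (the target plus the highest-scoring classes) and a few
    sampled 'noise' classes, instead of over all K classes."""
    K = logits.shape[0]
    # Candidate set: the true class plus the top-scoring classes (assumed rule).
    top = np.argsort(-logits)[:num_candidates]
    cand = np.unique(np.append(top, target))
    # Noise set: uniform samples from the remaining classes (assumed scheme).
    rest = np.setdiff1d(np.arange(K), cand)
    noise = rng.choice(rest, size=num_noise, replace=False)
    subset = np.concatenate([cand, noise])
    # Softmax restricted to O(num_candidates + num_noise) classes,
    # replacing the O(K) full normalization.
    sub_logits = logits[subset]
    sub_logits = sub_logits - sub_logits.max()
    probs = np.exp(sub_logits) / np.exp(sub_logits).sum()
    return -np.log(probs[np.where(subset == target)[0][0]])

K = 1000  # e.g. the ALOI or ImageNet-2010 class count
logits = rng.normal(size=K)
loss = subset_softmax_loss(logits, target=3)
```

Each gradient step then touches only the subset's rows of the output layer, which is where the speedup over standard O(K) methods would come from.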
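The split and tuning protocol quoted above (a 90%/10% random split, with the SGD learning rate selected from a fixed grid) can be sketched as follows. The `evaluate` function is a hypothetical stand-in for training at a given learning rate and measuring validation accuracy; it is not from the paper.

```python
import numpy as np

rng = np.random.default_rng(42)

def evaluate(lr):
    # Hypothetical placeholder for "train with SGD at this learning rate
    # and return validation accuracy"; here a dummy deterministic score
    # peaked near lr = 10**-1.5, purely for illustration.
    return 1.0 / (1.0 + abs(np.log10(lr) + 1.5))

# 90% training / 10% held-out split by random permutation of indices.
n = 10000
idx = rng.permutation(n)
n_train = int(0.9 * n)
train_idx, val_idx = idx[:n_train], idx[n_train:]

# Learning-rate grid from the classification experiments above.
grid = [0.0001, 0.001, 0.01, 0.05, 0.1, 0.5, 1.0]
best_lr = max(grid, key=evaluate)
```

With the dummy score above, the selection picks the grid point closest to its peak; in the real protocol each grid point would require a full training run.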