One-vs-Each Approximation to Softmax for Scalable Estimation of Probabilities

Authors: Michalis K. Titsias (Athens University of Economics and Business)

NeurIPS 2016

Reproducibility Variable Result LLM Response
Research Type | Experimental | We show that the new bound has interesting theoretical properties and we demonstrate its use in classification problems. Figure 1 shows some estimated softmax probabilities, using a dataset of 200 points each taking one out of ten values... Here, we consider AMAZONCAT-13K... which is a large scale classification dataset. (A sketch of the one-vs-each bound itself is given after the table.)
Researcher Affiliation | Academia | Michalis K. Titsias, Department of Informatics, Athens University of Economics and Business, mtitsias@aueb.gr
Pseudocode | No | The paper provides mathematical derivations and explanations but does not include pseudocode or an algorithm block.
Open Source Code | No | The paper does not mention providing access to source code for the described methodology.
Open Datasets | Yes | "MNIST, 20NEWS and BIBTEX [12]; see Table 1 for details." Footnotes 2, 3 and 4 provide URLs: http://yann.lecun.com/exdb/mnist, http://qwone.com/~jason/20Newsgroups/, and http://research.microsoft.com/en-us/um/people/manik/downloads/XC/XMLRepository.html. [12] Ioannis Katakis, Grigorios Tsoumakas, and Ioannis Vlahavas. Multilabel text classification for automated tag suggestion. In Proceedings of the ECML/PKDD-08 Workshop on Discovery Challenge, 2008.
Dataset Splits | No | Table 1 provides 'Training examples' and 'Test examples' for the datasets, but it does not explicitly mention or quantify a separate validation split.
Hardware Specification | No | The paper mentions that 'full training is completed in just 26 minutes in a stand-alone PC' but does not provide specific hardware details such as CPU/GPU models, memory, or cloud instance types.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, frameworks).
Experiment Setup | Yes | "We consider minibatches of size ten to approximate the sum ∑_n and subsets of remaining classes of size one to approximate ∑_{m≠y_n}. We used a learning rate initialized to 0.5/b (and then decrease it by a factor of 0.9 after each epoch) and performed 2 × 10^5 iterations. We applied OVE-SGD where at each stochastic gradient update we consider a single training instance (i.e. the minibatch size was one) and for that instance we randomly select five remaining classes. We used a very small learning rate having value 10^−8 and we performed five epochs across the full dataset, that is we performed in total 5 × 1186239 stochastic gradient updates. After each epoch we halve the value of the learning rate before the next epoch starts." (A sketch of such an update rule follows below.)
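
The Research Type row above quotes the paper's claim about the new bound. The one-vs-each idea is to lower-bound the softmax probability of a class by a product of pairwise sigmoids, p(y = k) ≥ ∏_{m ≠ k} σ(f_k − f_m). Below is a minimal NumPy sketch of that bound next to the exact softmax; the function names and toy logits are illustrative, not code from the paper.

```python
import numpy as np

def softmax(logits):
    # Exact softmax probabilities (numerically stabilised).
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def ove_bound(logits, k):
    # One-vs-each lower bound on the softmax probability of class k:
    # p(y = k) >= prod_{m != k} sigmoid(f_k - f_m).
    diffs = logits[k] - np.delete(logits, k)
    return np.prod(1.0 / (1.0 + np.exp(-diffs)))

logits = np.array([2.0, 0.5, -1.0, 0.3])
print(softmax(logits)[0])    # exact probability of class 0
print(ove_bound(logits, 0))  # the bound never exceeds the exact probability
```

Because every factor σ(f_k − f_m) only involves one pair of classes, the log of this bound decomposes into pairwise terms, which is what makes the subsampled stochastic training in the Experiment Setup row possible.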
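
For the Experiment Setup row, the OVE-SGD procedure it quotes (one training instance per update, a few randomly sampled remaining classes, a learning rate halved after each epoch) can be sketched for a linear model as follows. The linear parameterisation, the (K − 1)/num_neg rescaling of the subsampled sum, and all sizes and learning rates here are illustrative assumptions, not values or code from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def ove_sgd_step(W, x, y, lr, num_neg=5):
    # One stochastic ascent step on the one-vs-each objective for a single
    # example (x, y): sample num_neg of the remaining classes and increase
    # log sigmoid(w_y . x - w_m . x) for each sampled class m.
    K = W.shape[0]
    neg = rng.choice([m for m in range(K) if m != y], size=num_neg, replace=False)
    scale = (K - 1) / num_neg          # rescale the subsampled sum over classes
    f = W @ x                          # class scores f_m = w_m . x
    for m in neg:
        s = 1.0 / (1.0 + np.exp(-(f[y] - f[m])))  # sigmoid of the score gap
        g = scale * (1.0 - s)                      # gradient of log sigmoid w.r.t. the gap
        W[y] += lr * g * x
        W[m] -= lr * g * x
    return W

# Toy usage: 10 classes, 20 features, learning rate halved after each epoch.
W = np.zeros((10, 20))
lr = 1e-2
for epoch in range(5):
    for _ in range(100):
        x = rng.normal(size=20)
        y = int(rng.integers(10))
        W = ove_sgd_step(W, x, y, lr)
    lr *= 0.5
```

With the settings quoted in the table this would correspond to five sampled remaining classes, a 10^−8 learning rate, and five passes over the 1186239 AMAZONCAT-13K training instances.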