Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
One-vs-Each Approximation to Softmax for Scalable Estimation of Probabilities
Authors: Michalis Titsias RC AUEB
NeurIPS 2016 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that the new bound has interesting theoretical properties and we demonstrate its use in classification problems. Figure 1 shows some estimated softmax probabilities, using a dataset of 200 points each taking one out of ten values... Here, we consider AMAZONCAT-13K... which is a large scale classification dataset. |
| Researcher Affiliation | Academia | Michalis K. Titsias Department of Informatics Athens University of Economics and Business EMAIL |
| Pseudocode | No | The paper provides mathematical derivations and explanations but does not include pseudocode or an algorithm block. |
| Open Source Code | No | The paper does not mention providing access to source code for the described methodology. |
| Open Datasets | Yes | MNIST², 20NEWS³ and BIBTEX [12]; see Table 1 for details. (Footnotes 2, 3, 4 provide URLs: ² http://yann.lecun.com/exdb/mnist, ³ http://qwone.com/~jason/20Newsgroups/, ⁴ http://research.microsoft.com/en-us/um/people/manik/downloads/XC/XMLRepository.html). [12] Ioannis Katakis, Grigorios Tsoumakas, and Ioannis Vlahavas. Multilabel text classification for automated tag suggestion. In: Proceedings of the ECML/PKDD-08 Workshop on Discovery Challenge, 2008. |
| Dataset Splits | No | Table 1 provides 'Training examples' and 'Test examples' for the datasets, but it does not explicitly mention or quantify a separate 'validation' split. |
| Hardware Specification | No | The paper mentions that 'full training is completed in just 26 minutes in a stand-alone PC' but does not provide specific hardware details such as CPU/GPU models, memory, or cloud instance types. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, frameworks). |
| Experiment Setup | Yes | We consider minibatches of size ten to approximate the sum $\sum_n$ and subsets of remaining classes of size one to approximate $\sum_{m \neq y_n}$. We used a learning rate initialized to 0.5/b (and then decrease it by a factor of 0.9 after each epoch) and performed $2 \times 10^5$ iterations. We applied OVE-SGD where at each stochastic gradient update we consider a single training instance (i.e. the minibatch size was one) and for that instance we randomly select five remaining classes. We used a very small learning rate having value $10^{-8}$ and we performed five epochs across the full dataset, that is we performed in total 5 × 1186239 stochastic gradient updates. After each epoch we halve the value of the learning rate before next epoch starts. |
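
The Experiment Setup row describes a stochastic gradient procedure that subsamples both training instances and the remaining classes. Since the paper provides no pseudocode or source code, the following is only a minimal sketch of how such an update could look, assuming a linear model with scores $f_k = w_k^\top x$ and the one-vs-each log-bound $\sum_{m \neq y} \log \sigma(f_y - f_m)$. The function names (`ove_log_bound`, `ove_sgd_step`, `train`), the toy data and the learning rate are hypothetical and are not taken from the paper.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable logistic function: sigma(z) = exp(-log(1 + exp(-z))).
    return np.exp(-np.logaddexp(0.0, -z))


def ove_log_bound(W, X, Y):
    """Full one-vs-each log-bound: sum_n sum_{m != y_n} log sigma(f_{y_n} - f_m)."""
    F = X @ W.T                                   # scores, shape (N, K)
    rows = np.arange(len(Y))
    diff = F[rows, Y][:, None] - F                # f_y - f_m for every class m
    log_sig = -np.logaddexp(0.0, -diff)           # log sigma(diff)
    log_sig[rows, Y] = 0.0                        # drop the m == y term
    return log_sig.sum()


def ove_sgd_step(W, x, y, num_neg, lr, rng):
    """One stochastic ascent step on the bound for a single instance (x, y)."""
    K = W.shape[0]
    # Sample `num_neg` of the K - 1 remaining classes, without replacement.
    others = np.delete(np.arange(K), y)
    neg = rng.choice(others, size=num_neg, replace=False)
    diff = (W[y] - W[neg]) @ x                    # f_y - f_m for sampled m
    # d/dz log sigma(z) = sigma(-z); rescale so the estimate of the
    # sum over all remaining classes is unbiased.
    coef = sigmoid(-diff) * (K - 1) / num_neg
    W[y] += lr * coef.sum() * x
    W[neg] -= lr * coef[:, None] * x[None, :]
    return W


def train(X, Y, num_classes, num_neg, lr0, epochs, seed=0):
    """Minibatch size one, with the learning rate halved after each epoch."""
    rng = np.random.default_rng(seed)
    W = np.zeros((num_classes, X.shape[1]))
    lr = lr0
    for _ in range(epochs):
        for n in rng.permutation(len(X)):
            ove_sgd_step(W, X[n], Y[n], num_neg, lr, rng)
        lr *= 0.5
    return W


if __name__ == "__main__":
    # Toy data (an assumption for illustration): one one-hot input per class.
    X, Y = np.eye(4), np.arange(4)
    W = train(X, Y, num_classes=4, num_neg=2, lr0=0.5, epochs=5)
    print(ove_log_bound(np.zeros((4, 4)), X, Y), ove_log_bound(W, X, Y))
```

In this sketch, `num_neg=5` with a minibatch size of one would mirror the class subsampling quoted for the large-scale run, where five epochs over 1,186,239 instances amount to 5,931,195 updates. The quoted small-scale setting (minibatches of ten, one sampled class, decay by a factor of 0.9) would need the step to be averaged over a minibatch, which is omitted here.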