Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Latent Support Measure Machines for Bag-of-Words Data Classification
Authors: Yuya Yoshikawa, Tomoharu Iwata, Hiroshi Sawada
NeurIPS 2014 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the experiments, we show that the latent SMM achieves state-of-the-art accuracy for BoW text classification, is robust with respect to its own hyper-parameters, and is useful to visualize words. |
| Researcher Affiliation | Collaboration | Yuya Yoshikawa Nara Institute of Science and Technology Nara, 630-0192, Japan EMAIL Tomoharu Iwata NTT Communication Science Laboratories Kyoto, 619-0237, Japan EMAIL Hiroshi Sawada NTT Service Evolution Laboratories Kanagawa, 239-0847, Japan EMAIL |
| Pseudocode | No | The paper provides mathematical formulations and descriptions of the proposed method but does not include any pseudocode or explicitly labeled algorithm blocks. |
| Open Source Code | No | The paper refers to existing implementations for baseline methods (MedLDA, word2vec) and dataset sources, but does not state that the code for the proposed latent SMM is open-source or publicly available. |
| Open Datasets | Yes | For the evaluation, we used the following three standard multi-class text classification datasets: WebKB, Reuters-21578 and 20 Newsgroups. These datasets, which have already been preprocessed by removing short and stop words, are found in [19] and can be downloaded from the author's website. |
| Dataset Splits | No | The paper states: 'Here we randomly chose five sets of training samples, and used the remaining samples for each of the training sets as the test set.' It describes a training and test split but does not explicitly mention a separate validation set or detail a cross-validation setup for hyperparameter tuning. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'LIBSVM', 'the author's implementation of MedLDA', and 'word2vec', but does not specify version numbers for any of these software dependencies. |
| Experiment Setup | Yes | In our experiments, we choose the optimal parameters for these methods from the following variations: $\gamma \in \{10^{-3}, 10^{-2}, \dots, 10^{3}\}$ in the latent SMM, SVD+SMM, word2vec+SMM and SVM with a Gaussian RBF kernel, $C \in \{2^{-3}, 2^{-1}, \dots, 2^{5}, 2^{7}\}$ in all the methods, regularizer parameter $\rho \in \{10^{-2}, 10^{-1}, 10^{0}\}$, latent dimensionality $q \in \{2, 3, 4\}$ in the latent SMM, and the latent dimensionality of MedLDA and SVD+SMM ranges $\{10, 20, \dots, 50\}$. |
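
To give a sense of the size of the search space quoted in the Experiment Setup row, the following is a minimal sketch that enumerates the latent SMM grid. It is not the authors' code. The names `GAMMAS`, `CS`, `RHOS`, `QS`, and `latent_smm_grid` are hypothetical. The sketch reads the quoted sets with negative exponents at their lower ends and assumes the elided middle values continue the visible pattern (exponent steps of 1 for γ and 2 for C).

```python
from itertools import product

# Candidate values read off the quoted setup. The elided middle entries are
# ASSUMED to follow the visible pattern; the paper excerpt does not list them.
GAMMAS = [10.0 ** e for e in range(-3, 4)]    # 10^-3, 10^-2, ..., 10^3 (step of 1 assumed)
CS = [2.0 ** e for e in range(-3, 8, 2)]      # 2^-3, 2^-1, ..., 2^5, 2^7 (step of 2 assumed)
RHOS = [10.0 ** e for e in range(-2, 1)]      # 10^-2, 10^-1, 10^0
QS = [2, 3, 4]                                # latent dimensionality q


def latent_smm_grid():
    """Yield every (gamma, C, rho, q) combination as a dict (hypothetical helper)."""
    for gamma, c, rho, q in product(GAMMAS, CS, RHOS, QS):
        yield {"gamma": gamma, "C": c, "rho": rho, "q": q}


if __name__ == "__main__":
    grid = list(latent_smm_grid())
    print(f"{len(grid)} candidate settings for the latent SMM")
```

Under these assumptions the grid has 7 × 6 × 3 × 3 = 378 settings. How the best setting is selected (for example, on held-out data or by cross-validation) is not stated in the excerpt above, which is consistent with the "Dataset Splits" result.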