Multilabel Classification by Hierarchical Partitioning and Data-dependent Grouping
Authors: Shashanka Ubaru, Sanjeeb Dash, Arya Mazumdar, Oktay Gunluk
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical results on many benchmark datasets illustrate that, compared to other popular methods, our proposed methods achieve competitive accuracy with significantly lower computational costs. ... We now present numerical results to illustrate the performance of the proposed approaches (the data-dependent construction NMF-GT and with hierarchical partitioning He-NMFGT) on MLC problems. |
| Researcher Affiliation | Collaboration | Shashanka Ubaru, IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA; Sanjeeb Dash, IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA; Arya Mazumdar, Department of Computer Science, University of Massachusetts, Amherst, MA; Oktay Gunluk, Operations Research and Information Engineering, Cornell University, Ithaca, NY |
| Pseudocode | Yes | Algorithm 1 (MLGT: Training Algorithm). Input: training data {(x_i, y_i)}_{i=1}^n, group testing matrix A ∈ ℝ^{m×d}, binary classifier C. Output: m classifiers {w_j}_{j=1}^m. For i = 1, …, n: z_i = A ∨ y_i. For j = 1, …, m: w_j = C({(x_i, (z_i)_j)}_{i=1}^n). Algorithm 2 (MLGT: Prediction Algorithm). Input: test data x ∈ ℝ^p, group testing matrix A ∈ ℝ^{m×d}, m classifiers {w_j}_{j=1}^m, sparsity k. Output: predicted label ŷ. For j = 1, …, m: ẑ(j) = w_j(x). Then ŷ = fast-decode(A, ẑ, k). (A Python sketch of these loops appears below the table.) |
| Open Source Code | Yes | The code for our method is publicly available at https://github.com/Shashankaubaru/He-NMFGT. |
| Open Datasets | Yes | Datasets: For our experiments, we consider some of the popular publicly available multilabel datasets put together in The Extreme Classification Repository [5] (http://manikvarma.org/downloads/XC/XMLRepository.html). |
| Dataset Splits | No | The paper provides training and test set sizes in Table 1 but does not explicitly mention a separate validation set or describe how data was split into train/test/validation for reproducibility. It implicitly uses a validation set for choosing parameter 'c' (Remark 1), but doesn't detail the split. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory) used for running the experiments. It only mentions that the implementation was in Matlab and suggests potential speed improvements. |
| Software Dependencies | No | The paper mentions that "He-NMFGT was implemented in Matlab" but does not provide specific version numbers for Matlab or any other libraries or software used. |
| Experiment Setup | Yes | Remark 1 (Choosing c). In these constructions, we choose the parameter c (the column sparsity, i.e., the number of ones per column) using a simple procedure. For a range of values of c we form the matrix A, reduce and recover (a random subset of) training label vectors, and choose the c which yields the smallest Hamming loss. ... The no. of groups m used in NMFGT and the no. of blocks ℓ used in He-NMFGT are also given. ... The precision results and the runtimes for the four additional methods were obtained from [28, 40]. (A Python sketch of this selection procedure follows the table.) |
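
To make the quoted Algorithms 1 and 2 concrete, here is a minimal Python sketch of the MLGT training and prediction loops. It is not the authors' implementation (theirs is in Matlab): `fit_binary` is a placeholder for any binary-classifier trainer, and the top-k column-agreement scoring at the end stands in for the paper's `fast-decode` routine, which the sketch does not reproduce.

```python
import numpy as np

def mlgt_train(X, Y, A, fit_binary):
    """Algorithm 1 sketch. Y is the n x d binary label matrix, A the m x d
    binary group testing matrix, fit_binary any binary-classifier trainer."""
    # z_i = A OR y_i: group j is positive iff some label assigned to group j is on.
    Z = (Y @ A.T > 0).astype(int)                      # n x m reduced label matrix
    # One binary classifier per group (row of A).
    return [fit_binary(X, Z[:, j]) for j in range(A.shape[0])]

def mlgt_predict(x, A, classifiers, k):
    """Algorithm 2 sketch. Predicts each group bit, then decodes back to a
    k-sparse label vector. The scoring below is a simplification, not the
    paper's fast-decode."""
    z_hat = np.array([clf(x) for clf in classifiers])  # m predicted group bits
    scores = A.T @ z_hat                               # agreement of each label's column with z_hat
    y_hat = np.zeros(A.shape[1], dtype=int)
    y_hat[np.argsort(scores)[-k:]] = 1                 # keep the k best-supported labels
    return y_hat
```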
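
Remark 1's parameter sweep translates directly into code. The sketch below assumes hypothetical helpers `build_A(c)` (standing in for the paper's data-dependent NMF-GT construction at column sparsity `c`) and `decode(A, z, k)` (standing in for the paper's fast decoder); only the sweep-and-score logic itself comes from the paper.

```python
import numpy as np

def choose_c(Y_sub, build_A, decode, c_grid, k):
    """Pick the column sparsity c with the smallest Hamming loss when
    reducing and recovering a random subset Y_sub (s x d) of training
    label vectors. build_A and decode are placeholder callables."""
    best_c, best_loss = None, np.inf
    for c in c_grid:
        A = build_A(c)                                    # m x d matrix with c ones per column
        Z = (Y_sub @ A.T > 0).astype(int)                 # reduce: z = A OR y
        Y_rec = np.vstack([decode(A, z, k) for z in Z])   # recover each label vector
        loss = np.mean(Y_rec != Y_sub)                    # Hamming loss over the subset
        if loss < best_loss:
            best_c, best_loss = c, loss
    return best_c
```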