Multilabel Classification by Hierarchical Partitioning and Data-dependent Grouping
Authors: Shashanka Ubaru, Sanjeeb Dash, Arya Mazumdar, Oktay Gunluk
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical results on many benchmark datasets illustrate that, compared to other popular methods, our proposed methods achieve competitive accuracy with significantly lower computational costs. ... We now present numerical results to illustrate the performance of the proposed approaches (the data-dependent construction NMF-GT and with hierarchical partitioning He-NMFGT) on MLC problems. |
| Researcher Affiliation | Collaboration | Shashanka Ubaru, IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA; Sanjeeb Dash, IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA; Arya Mazumdar, Department of Computer Science, University of Massachusetts, Amherst, MA; Oktay Gunluk, Operations Research and Information Engineering, Cornell University, Ithaca, NY |
| Pseudocode | Yes | Algorithm 1 (MLGT: Training Algorithm). Input: training data {(x_i, y_i)}_{i=1}^n, group testing matrix A ∈ ℝ^{m×d}, binary classifier C. Output: m classifiers {w_j}_{j=1}^m. For i = 1, …, n: z_i = A ∨ y_i. For j = 1, …, m: w_j = C({(x_i, (z_i)_j)}_{i=1}^n). Algorithm 2 (MLGT: Prediction Algorithm). Input: test data x ∈ ℝ^p, group testing matrix A ∈ ℝ^{m×d}, m classifiers {w_j}_{j=1}^m, sparsity k. Output: predicted label ŷ. For j = 1, …, m: ẑ(j) = w_j(x). Then ŷ = fast-decode(A, ẑ, k). (A Python sketch of these loops appears below the table.) |
| Open Source Code | Yes | The code for our method is publicly available at https://github.com/Shashankaubaru/He-NMFGT. |
| Open Datasets | Yes | Datasets: For our experiments, we consider some of the popular publicly available multilabel datasets put together in The Extreme Classification Repository [5] (http://manikvarma.org/downloads/XC/XMLRepository.html). |
| Dataset Splits | No | The paper provides training and test set sizes in Table 1 but does not explicitly mention a separate validation set or describe how data was split into train/test/validation for reproducibility. It implicitly uses a validation set for choosing parameter 'c' (Remark 1), but doesn't detail the split. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory) used for running the experiments. It only mentions that the implementation was in Matlab and suggests potential speed improvements. |
| Software Dependencies | No | The paper mentions that "He-NMFGT was implemented in Matlab" but does not provide specific version numbers for Matlab or any other libraries or software used. |
| Experiment Setup | Yes | Remark 1 (Choosing c). In these constructions, we choose the parameter c (the column sparsity, i.e., the number of ones per column) using a simple procedure. For a range of values of c we form the matrix A, reduce and recover (a random subset of) training label vectors, and choose the c which yields the smallest Hamming loss. ... The no. of groups m used in NMFGT and the no. of blocks ℓ used in He-NMFGT are also given. ... The precision results and the runtimes for the four additional methods were obtained from [28, 40]. (A Python sketch of this selection procedure follows the table.) |
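
To make the quoted Algorithms 1 and 2 concrete, here is a minimal Python sketch of the MLGT training and prediction loops. It is not the authors' implementation (theirs is in Matlab): `fit_binary` is a placeholder for any binary-classifier trainer, and the top-k column-agreement scoring at the end stands in for the paper's `fast-decode` routine, which the sketch does not reproduce.

```python
import numpy as np

def mlgt_train(X, Y, A, fit_binary):
    """Algorithm 1 sketch. Y is the n x d binary label matrix, A the m x d
    binary group testing matrix, fit_binary any binary-classifier trainer."""
    # z_i = A OR y_i: group j is positive iff some label assigned to group j is on.
    Z = (Y @ A.T > 0).astype(int)                      # n x m reduced label matrix
    # One binary classifier per group (row of A).
    return [fit_binary(X, Z[:, j]) for j in range(A.shape[0])]

def mlgt_predict(x, A, classifiers, k):
    """Algorithm 2 sketch. Predicts each group bit, then decodes back to a
    k-sparse label vector. The scoring below is a simplification, not the
    paper's fast-decode."""
    z_hat = np.array([clf(x) for clf in classifiers])  # m predicted group bits
    scores = A.T @ z_hat                               # agreement of each label's column with z_hat
    y_hat = np.zeros(A.shape[1], dtype=int)
    y_hat[np.argsort(scores)[-k:]] = 1                 # keep the k best-supported labels
    return y_hat
```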
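
Remark 1's parameter sweep translates directly into code. The sketch below assumes hypothetical helpers `build_A(c)` (standing in for the paper's data-dependent NMF-GT construction at column sparsity `c`) and `decode(A, z, k)` (standing in for the paper's fast decoder); only the sweep-and-score logic itself comes from the paper.

```python
import numpy as np

def choose_c(Y_sub, build_A, decode, c_grid, k):
    """Pick the column sparsity c with the smallest Hamming loss when
    reducing and recovering a random subset Y_sub (s x d) of training
    label vectors. build_A and decode are placeholder callables."""
    best_c, best_loss = None, np.inf
    for c in c_grid:
        A = build_A(c)                                    # m x d matrix with c ones per column
        Z = (Y_sub @ A.T > 0).astype(int)                 # reduce: z = A OR y
        Y_rec = np.vstack([decode(A, z, k) for z in Z])   # recover each label vector
        loss = np.mean(Y_rec != Y_sub)                    # Hamming loss over the subset
        if loss < best_loss:
            best_c, best_loss = c, loss
    return best_c
```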