Distributional Semantics Meets Multi-Label Learning

Authors: Vivek Gupta, Rahul Wadbude, Nagarajan Natarajan, Harish Karnick, Prateek Jain, Piyush Rai (pp. 3747-3754)

AAAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We demonstrate the effectiveness of our approach through an extensive set of experiments on a variety of benchmark datasets, and show that the proposed models perform favorably as compared to state-of-the-art methods for large-scale multi-label learning.
Researcher Affiliation Collaboration (1) School of Computing, University of Utah; (2) Computer Science Department, IIT Kanpur; (3) Microsoft Research Lab, Bangalore
Pseudocode Yes Our algorithm for predicting the labels of a new instance is identical to that of SLEEC and is presented for convenience in Algorithm 1. ... Algorithm 2 Learning embeddings via SPPMI factorization (EXMLDS1). ... Algorithm 3 Learning joint label and instance embeddings via SPPMI factorization (EXMLDS3). ... Algorithm 4 Prediction Algorithm with Label Correlations (EXMLDS3 prediction). ... Algorithm 5 Learning joint instance embeddings and regression via gradient descent (EXMLDS4). (Hedged sketches of the SPPMI-embedding and nearest-neighbour-prediction steps appear after this table.)
Open Source Code No Source code will be made available to public later.
Open Datasets Yes We conduct experiments on commonly used benchmark datasets from the extreme multi-label classification repository provided by the authors of (Prabhu and Varma 2014; Bhatia et al. 2015); these datasets are pre-processed, and have prescribed train-test splits. ... Datasets and Benchmark: https://bit.ly/2IDtQbS
Dataset Splits Yes We conduct experiments on commonly used benchmark datasets from the extreme multi-label classification repository provided by the authors of (Prabhu and Varma 2014; Bhatia et al. 2015) 2; these datasets are pre-processed, and have prescribed train-test splits. ... For small datasets, we fix negative sample size to 15 and number of iterations to 35 during neural network training, tuned based on a separate validation set. For large datasets, we fix negative sample size to 2 and number of iterations to 5, tuned on a validation set.
Hardware Specification No The paper mentions 'a Linux machine with 40 cores and 128 GB RAM' but does not specify the exact CPU model or other detailed hardware components required for replication.
Software Dependencies No The paper states 'Learning Algorithms 2 and 3 are implemented partly in Python and partly in MATLAB' but does not provide specific version numbers for these software packages or any other dependencies.
Experiment Setup Yes For small datasets, we fix negative sample size to 15 and number of iterations to 35 during neural network training, tuned based on a separate validation set. For large datasets, we fix negative sample size to 2 and number of iterations to 5, tuned on a validation set. ... We use the same embedding dimensionality, preserve the same number of nearest neighbors for learning embeddings as well as at prediction time, and the same number of data partitions used in SLEEC (Bhatia et al. 2015) for our methods EXMLDS1 and EXMLDS2. ... embedding size as 50, number of learners for each cluster as 15, number of nearest neighbors as 10, number of embedding and partitioning iterations both 100, gamma as 1, label normalization as true, number of threads as 32.
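
The Pseudocode row above cites Algorithm 2, learning embeddings via SPPMI factorization (EXMLDS1). Below is a minimal sketch of the general SPPMI-factorization idea only, assuming a dense 0/1 label matrix Y of shape (n instances x L labels) and NumPy; the function name sppmi_embeddings and its defaults are hypothetical, and this is not the authors' released implementation.

    import numpy as np

    def sppmi_embeddings(Y, dim=50, neg_samples=15):
        """Hypothetical sketch: embed labels by factorizing the shifted
        positive PMI (SPPMI) matrix of label-label co-occurrences."""
        Y = np.asarray(Y, dtype=float)
        C = Y.T @ Y                                    # (L x L) co-occurrence counts
        total = C.sum()
        row = C.sum(axis=1, keepdims=True)             # (L x 1) marginal counts
        col = C.sum(axis=0, keepdims=True)             # (1 x L) marginal counts
        with np.errstate(divide="ignore", invalid="ignore"):
            pmi = np.where(C > 0, np.log((C * total) / (row @ col)), 0.0)
        sppmi = np.maximum(pmi - np.log(neg_samples), 0.0)   # shift by log(#negatives), clip at zero
        U, S, _ = np.linalg.svd(sppmi, full_matrices=False)  # rank-dim factorization via truncated SVD
        return U[:, :dim] * np.sqrt(S[:dim])                  # (L x dim) label embeddings

The shift by log(neg_samples) corresponds to the negative-sample sizes quoted in the Dataset Splits and Experiment Setup rows (15 for small datasets, 2 for large ones).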
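
The same Pseudocode row notes that prediction (Algorithm 1) is identical to SLEEC: a test point is mapped into the embedding space and labels are scored from its nearest training embeddings. The following is a minimal k-nearest-neighbour scoring sketch under that assumption; the names, distance, and aggregation choices are illustrative rather than the paper's exact procedure.

    import numpy as np

    def knn_label_scores(z_test, Z_train, Y_train, k=10):
        """Hypothetical sketch of SLEEC-style prediction: score labels by
        summing the label vectors of the k nearest training embeddings."""
        d = np.linalg.norm(Z_train - z_test, axis=1)   # Euclidean distance to every training embedding
        nn = np.argsort(d)[:k]                          # indices of the k nearest neighbours
        return np.asarray(Y_train)[nn].sum(axis=0)      # per-label scores; top entries are the predicted labels

The default k=10 mirrors the "number of nearest neighbors as 10" setting quoted in the Experiment Setup row.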