Large-scale Multi-label Learning with Missing Labels
Authors: Hsiang-Fu Yu, Prateek Jain, Purushottam Kar, Inderjit Dhillon
ICML 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we present extensive empirical results on a variety of benchmark datasets and show that our methods perform significantly better than existing label compression based methods and can scale up to very large datasets such as a Wikipedia dataset that has more than 200,000 labels. |
| Researcher Affiliation | Collaboration | Hsiang-Fu Yu (ROFUYU@CS.UTEXAS.EDU), Department of Computer Science, University of Texas at Austin; Prateek Jain (PRAJAIN@MICROSOFT.COM) and Purushottam Kar (T-PURKAR@MICROSOFT.COM), Microsoft Research India, Bangalore; Inderjit S. Dhillon (INDERJIT@CS.UTEXAS.EDU), Department of Computer Science, University of Texas at Austin |
| Pseudocode | Yes | Algorithm 1 General Loss with Missing Labels... Algorithm 2 Squared Loss with Full Labels |
| Open Source Code | No | The paper does not include an unambiguous statement or a direct link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We considered a variety of benchmark datasets including four standard datasets (bibtex, delicious, eurlex, and nus-wide), two datasets with d ≫ L (autofood and compphys), and a very large scale Wikipedia based dataset, which contains about 1M wikipages and 200K labels. See Table 1 for more information about the datasets. [...] Finally, we show the scalability of our method by applying it to a recently curated Wikipedia dataset (Agrawal et al., 2013), that has 881,805 training samples and 213,707 labels. |
| Dataset Splits | No | The paper lists 'Training set' and 'Test set' sizes in Table 1, but it does not specify the split methodology or percentages (e.g., random vs. standard splits, validation protocol) needed to reproduce the partitions. |
| Hardware Specification | Yes | We conducted all experiments on an Intel machine with 32 cores. |
| Software Dependencies | No | The paper mentions software tools like LIBLINEAR and TRON, but it does not provide specific version numbers for these or any other ancillary software components required for replication. |
| Experiment Setup | No | While the paper describes algorithmic choices and reports the number of alternating iterations (five), it does not explicitly provide specific hyperparameter values (e.g., learning rates, batch sizes, regularization strengths) or detailed training configurations for its own methods to enable reproduction. |
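The alternating scheme behind the algorithms cited in the Pseudocode row (low-rank parameter Z = W Hᵀ, fit by alternately solving regularized least-squares subproblems over the observed label entries) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the toy data, problem sizes, regularizer `lam`, and the dense normal-equation solves are all assumptions made for readability.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instance: n samples, d features, L labels, rank-k parameter Z = W @ H.T.
# All sizes and the synthetic data are illustrative, not the paper's settings.
n, d, L, k, lam = 120, 12, 8, 3, 0.1
X = rng.normal(size=(n, d))
Y = (X @ rng.normal(size=(d, L)) > 0).astype(float)  # synthetic binary labels
Omega = rng.random(size=(n, L)) < 0.7                # mask of observed label entries

W = rng.normal(scale=0.1, size=(d, k))
H = rng.normal(scale=0.1, size=(L, k))

def objective():
    R = (X @ W @ H.T - Y) * Omega                    # residual on observed entries only
    return (R ** 2).sum() + lam * ((W ** 2).sum() + (H ** 2).sum())

losses = [objective()]
for _ in range(5):                                   # five alternating iterations, as reported
    # Fix W: each h_j is an independent ridge regression over the rows
    # where label j is observed.
    A = X @ W                                        # (n, k) latent representation
    for j in range(L):
        rows = Omega[:, j]
        Aj = A[rows]
        H[j] = np.linalg.solve(Aj.T @ Aj + lam * np.eye(k), Aj.T @ Y[rows, j])
    # Fix H: solve the (d*k)-dimensional normal equations over observed entries,
    # using x_i^T W h_j = kron(h_j, x_i)^T vec(W) with column-major vec.
    M = lam * np.eye(d * k)
    b = np.zeros(d * k)
    for i, j in zip(*np.nonzero(Omega)):
        phi = np.kron(H[j], X[i])
        M += np.outer(phi, phi)
        b += Y[i, j] * phi
    W = np.linalg.solve(M, b).reshape(d, k, order="F")
    losses.append(objective())
```

Each block update exactly minimizes a convex ridge subproblem, so the objective decreases monotonically across iterations. For scalability the paper solves these subproblems iteratively (e.g., with conjugate-gradient solvers such as TRON) rather than forming dense normal equations as this sketch does.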