Modeling Label Space Interactions in Multi-label Classification using Box Embeddings
Authors: Dhruvesh Patel, Pavitra Dangati, Jay-Yoon Lee, Michael Boratko, Andrew McCallum
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive empirical evaluations on twelve multi-label classification datasets, we show that MBM can significantly improve taxonomic consistency while maintaining the state-of-the-art predictive performance. |
| Researcher Affiliation | Academia | Dhruvesh Patel, Pavitra Dangati, Jay-Yoon Lee, Michael Boratko, Andrew McCallum — Manning College of Information & Computer Sciences, University of Massachusetts Amherst {dhruveshpate, sdangati, jaylee, mboratko, mccallum}@cs.umass.edu |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code and implementation details are available at https://github.com/iesl/box-mlc-iclr-2022 |
| Open Datasets | Yes | The description of the datasets with various statistics, links to download them, and instructions to pre-process them are provided in Appendix B. [...] Table 7: The table provides the links to download the data from original source. |
| Dataset Splits | Yes | Table 6: Summary of the datasets used in experiments. The feature based multi-label datasets span across 3 domains: functional genomics, image and text. ... (columns: Train, Val, Test with specific instance counts for each dataset) |
| Hardware Specification | Yes | For datasets with number of labels less than 500, i.e., the 4 FUNCAT datasets, Imclef07a, Imclef07d, Diatoms and Enron, all the models were trained on Titan X GPU (memory=12GB). For the 4 GO datasets that have number of labels greater than 4000, all the models are trained on M40 GPU (memory=24GB). |
| Software Dependencies | No | The paper mentions using PyTorch, AllenNLP, and the Box Embedding library, but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | The input encoder Fθ uses a common architecture for all models consisting of an MLP with a maximum of 3 layers. We perform a grid search over number of MLP layers, activation function, hidden dimensions, dropout, learning rate and use the best parameters for each model. ... Table 4 presents the final hyper-parameters obtained. (A minimal illustrative sketch of such an encoder follows the table.) |
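
The sketch below is not the authors' implementation; it only illustrates, under stated assumptions, the kind of shared MLP input encoder Fθ described in the Experiment Setup row: at most 3 layers, with layer count, hidden dimension, activation, and dropout treated as grid-searched hyper-parameters. All names (`MLPEncoder`, `num_layers`, `hidden_dim`, `activation`, `dropout`) and the default values are hypothetical stand-ins, not values taken from the paper's Table 4.

```python
# Minimal sketch (assumed, not the authors' code) of an MLP input encoder F_theta
# with up to 3 layers, matching the description in the Experiment Setup row.
import torch
import torch.nn as nn


class MLPEncoder(nn.Module):
    """Maps an input feature vector to a representation for the label-scoring head.

    Hyper-parameter names and defaults here are illustrative placeholders for the
    values the paper reports tuning via grid search.
    """

    def __init__(self, input_dim: int, hidden_dim: int = 256, num_layers: int = 2,
                 activation: str = "relu", dropout: float = 0.1):
        super().__init__()
        assert 1 <= num_layers <= 3, "the paper caps the encoder at 3 MLP layers"
        act = {"relu": nn.ReLU, "tanh": nn.Tanh, "gelu": nn.GELU}[activation]
        layers, in_dim = [], input_dim
        for _ in range(num_layers):
            layers += [nn.Linear(in_dim, hidden_dim), act(), nn.Dropout(dropout)]
            in_dim = hidden_dim
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


# Usage example: encode a batch of 8 feature vectors of dimension 100.
encoder = MLPEncoder(input_dim=100, hidden_dim=256, num_layers=3)
z = encoder(torch.randn(8, 100))  # -> shape (8, 256)
```

In the paper this encoder is shared across the compared models, and only the scoring head (e.g., the box-embedding-based MBM head versus vector baselines) differs, which keeps the comparison focused on the label-space representation rather than the input encoder.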