Modeling Label Space Interactions in Multi-label Classification using Box Embeddings

Authors: Dhruvesh Patel, Pavitra Dangati, Jay-Yoon Lee, Michael Boratko, Andrew McCallum

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive empirical evaluations on twelve multi-label classification datasets, we show that MBM can significantly improve taxonomic consistency while maintaining the state-of-the-art predictive performance.
Researcher Affiliation | Academia | Dhruvesh Patel, Pavitra Dangati, Jay-Yoon Lee, Michael Boratko, Andrew McCallum, Manning College of Information & Computer Sciences, University of Massachusetts Amherst. {dhruveshpate, sdangati, jaylee, mboratko, mccallum}@cs.umass.edu
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code and implementation details are available at https://github.com/iesl/box-mlc-iclr-2022
Open Datasets | Yes | The description of the datasets with various statistics, links to download them, and instructions to pre-process them are provided in Appendix B. [...] Table 7: The table provides the links to download the data from the original source.
Dataset Splits | Yes | Table 6: Summary of the datasets used in experiments. The feature-based multi-label datasets span 3 domains: functional genomics, image, and text. (The Train, Val, and Test columns give specific instance counts for each dataset.)
Hardware Specification | Yes | For datasets with fewer than 500 labels, i.e., the 4 FUNCAT datasets, Imclef07a, Imclef07d, Diatoms, and Enron, all models were trained on a Titan X GPU (memory = 12 GB). For the 4 GO datasets with more than 4000 labels, all models were trained on an M40 GPU (memory = 24 GB).
Software Dependencies | No | The paper mentions using PyTorch, AllenNLP, and the box-embeddings library, but does not provide specific version numbers for these software dependencies. (A hedged environment sketch follows the table.)
Experiment Setup | Yes | The input encoder Fθ uses a common architecture for all models, consisting of an MLP with a maximum of 3 layers. We perform a grid search over the number of MLP layers, activation function, hidden dimensions, dropout, and learning rate, and use the best parameters for each model. [...] Table 4 presents the final hyper-parameters obtained. (A hedged encoder and grid-search sketch follows the table.)
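
Because the paper names PyTorch, AllenNLP, and the box-embeddings library without pinning versions, a minimal sketch for recording the versions actually installed in a reproduction attempt is shown below. The distribution names (torch, allennlp, box-embeddings) are assumptions about how those libraries are packaged; no versions are reported in the paper.

```python
# Minimal sketch: record the installed versions of the dependencies named in the paper.
# The distribution names below are assumptions; the paper itself reports no versions.
import importlib.metadata as metadata

for dist in ["torch", "allennlp", "box-embeddings"]:
    try:
        print(f"{dist}=={metadata.version(dist)}")
    except metadata.PackageNotFoundError:
        print(f"{dist}: not installed")
```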
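
The experiment-setup row describes a shared MLP input encoder of at most 3 layers, tuned by grid search over depth, activation, hidden dimension, dropout, and learning rate. Below is a minimal PyTorch sketch of such an encoder and grid, assuming an input dimension and search ranges that are illustrative only and are not the final hyper-parameters reported in the paper's Table 4.

```python
# Sketch of a shared MLP encoder (up to 3 layers) tuned by grid search, as described
# in the paper's experiment setup. All concrete values are illustrative assumptions.
import itertools
import torch.nn as nn

def build_encoder(in_dim, hidden_dim, num_layers, activation, dropout):
    """Stack up to `num_layers` [Linear -> activation -> Dropout] blocks."""
    layers, dim = [], in_dim
    for _ in range(num_layers):
        layers += [nn.Linear(dim, hidden_dim), activation(), nn.Dropout(dropout)]
        dim = hidden_dim
    return nn.Sequential(*layers)

# Hypothetical grid; the paper searches over these dimensions, but its ranges may differ.
grid = {
    "num_layers": [1, 2, 3],
    "activation": [nn.ReLU, nn.Tanh],
    "hidden_dim": [256, 512],
    "dropout": [0.0, 0.2, 0.5],
    "lr": [1e-3, 1e-4],
}

for values in itertools.product(*grid.values()):
    cfg = dict(zip(grid.keys(), values))
    encoder = build_encoder(
        in_dim=1000,  # assumed feature dimension, not taken from the paper
        hidden_dim=cfg["hidden_dim"],
        num_layers=cfg["num_layers"],
        activation=cfg["activation"],
        dropout=cfg["dropout"],
    )
    # ... train with cfg["lr"], evaluate on the validation split, keep the best cfg
```

As in the paper, the best configuration found on the validation split would then be retained per model (the final values are those reported in Table 4).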