Order-Free Learning Alleviating Exposure Bias in Multi-Label Classification

Authors: Che-Ping Tsai, Hung-Yi Lee

AAAI 2020, pp. 6038-6045 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that the proposed model outperforms competitive baselines by a large margin on three multi-label classification benchmark datasets, including two text classification datasets and one sound event classification dataset. We validate our proposed model on two multi-label text classification datasets, which are AAPD (Yang et al. 2018b) and Reuters-21578, and a sound event classification dataset, which is AudioSet (Gemmeke et al. 2017) proposed by Google. Multi-label classification can be evaluated with multiple metrics, which capture different aspects of the problem. We follow Nam et al. (2017) in using five different metrics: subset accuracy (ACC), Hamming accuracy (HA), example-based F1 (eb-F1), macro-averaged F1 (ma-F1), and micro-averaged F1 (mi-F1). (A sketch of these metrics appears after the table.)
Researcher Affiliation | Academia | Che-Ping Tsai, Hung-Yi Lee; Speech Processing and Machine Learning Laboratory, National Taiwan University; {r06922039, hungyilee}@ntu.edu.tw
Pseudocode | No | No structured pseudocode or algorithm blocks are present in the paper; the methods are described through text and mathematical equations.
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the methodology is openly available.
Open Datasets | Yes | We validate our proposed model on two multi-label text classification datasets, which are AAPD (Yang et al. 2018b) and Reuters-21578, and a sound event classification dataset, which is AudioSet (Gemmeke et al. 2017) proposed by Google.
Dataset Splits | Yes | In this experiment, since there are only 43 samples with unseen label combinations in the original test set of AAPD, we re-split the AAPD dataset: 47,840 samples in the training set and 4,000 samples each in the validation and test sets. Both the validation and test sets contain 2,000 samples whose label sets occur in the training set and 2,000 whose label sets do not. (A sketch of this re-split appears after the table.)
Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU types, memory specifications) used for running the experiments are provided in the paper.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x) are mentioned in the paper.
Experiment Setup | No | The paper mentions architectural components (e.g., bidirectional LSTM, LSTMs with attention, DNN with sigmoid activation) but does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed training configurations in the main text. (An illustrative sketch of these components appears below.)
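
The Evaluation Metrics passage quoted under Research Type lists five metrics taken from Nam et al. (2017). As a reference point, here is a minimal sketch of how those metrics are commonly computed for binary indicator matrices; scikit-learn is an assumption on our part (the paper names no software), and the toy arrays are purely illustrative.

    # Sketch: the five metrics from Nam et al. (2017) via scikit-learn (assumed
    # tooling; the paper specifies none). y_true/y_pred are (n_samples, n_labels)
    # binary indicator matrices; the values below are toy data.
    import numpy as np
    from sklearn.metrics import accuracy_score, hamming_loss, f1_score

    y_true = np.array([[1, 0, 1], [0, 1, 0]])
    y_pred = np.array([[1, 0, 0], [0, 1, 0]])

    acc   = accuracy_score(y_true, y_pred)               # subset accuracy (ACC): exact label-set match
    ha    = 1.0 - hamming_loss(y_true, y_pred)           # Hamming accuracy (HA): per-label accuracy
    eb_f1 = f1_score(y_true, y_pred, average="samples")  # example-based F1 (eb-F1)
    ma_f1 = f1_score(y_true, y_pred, average="macro")    # macro-averaged F1 (ma-F1)
    mi_f1 = f1_score(y_true, y_pred, average="micro")    # micro-averaged F1 (mi-F1)

    print(acc, ha, eb_f1, ma_f1, mi_f1)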
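
The AAPD re-split described under Dataset Splits keeps 47,840 training samples and builds validation and test sets of 4,000 samples each, half with label combinations seen in training and half unseen. The sketch below shows one way such a split could be implemented; the function name, seed, and shuffling strategy are our assumptions, not the authors' procedure.

    # Sketch of the AAPD re-split described above (assumed implementation, not
    # the authors' code). `samples` is a list of (text, labels) pairs, where
    # `labels` is an iterable of label ids.
    import random

    def resplit_aapd(samples, n_train=47840, n_half=2000, seed=0):
        rng = random.Random(seed)
        rng.shuffle(samples)
        train, rest = samples[:n_train], samples[n_train:]

        # Label combinations observed in the training set.
        seen_combos = {frozenset(labels) for _, labels in train}
        seen   = [s for s in rest if frozenset(s[1]) in seen_combos]
        unseen = [s for s in rest if frozenset(s[1]) not in seen_combos]

        # Validation and test each take n_half seen + n_half unseen samples.
        valid = seen[:n_half] + unseen[:n_half]
        test  = seen[n_half:2 * n_half] + unseen[n_half:2 * n_half]
        return train, valid, test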
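
The Experiment Setup row names a bidirectional LSTM encoder, LSTMs with attention, and a DNN with sigmoid activation, but no hyperparameters. The sketch below only illustrates how the first and last of these components are typically wired in PyTorch; every dimension, the mean pooling, and the layer count are assumptions, and the attention-equipped LSTM decoder of the full model is omitted for brevity.

    # Illustrative wiring of two components named in the paper (BiLSTM encoder,
    # sigmoid DNN head). All dimensions are assumed; this is not the authors' model.
    import torch
    import torch.nn as nn

    class MultiLabelBaseline(nn.Module):
        def __init__(self, vocab_size, n_labels, emb_dim=256, hidden=512):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True,
                                   bidirectional=True)
            # DNN head with sigmoid: one independent probability per label.
            self.head = nn.Sequential(
                nn.Linear(2 * hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, n_labels), nn.Sigmoid())

        def forward(self, tokens):               # tokens: (batch, seq_len) int64
            states, _ = self.encoder(self.embed(tokens))
            pooled = states.mean(dim=1)          # mean-pool BiLSTM states (assumed)
            return self.head(pooled)             # (batch, n_labels) in [0, 1]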