Order-Free Learning Alleviating Exposure Bias in Multi-Label Classification
Authors: Che-Ping Tsai, Hung-Yi Lee
AAAI 2020, pp. 6038-6045
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that the proposed model outperforms competitive baselines by a large margin on three multi-label classification benchmark datasets, including two text classification datasets and one sound event classification dataset. Experimental Setup: We validate our proposed model on two multi-label text classification datasets, which are AAPD (Yang et al. 2018b) and Reuters-21578, and a sound event classification dataset, which is AudioSet (Gemmeke et al. 2017) proposed by Google. Evaluation Metrics: Multi-label classification can be evaluated with multiple metrics, which capture different aspects of the problem. We follow Nam et al. (2017) in using five different metrics: subset accuracy (ACC), Hamming accuracy (HA), example-based F1 (ebF1), macro-averaged F1 (maF1), and micro-averaged F1 (miF1); a minimal sketch of these metrics follows the table. Results and Discussion: In the following, we show results of the baseline models and the proposed method on the three datasets. |
| Researcher Affiliation | Academia | Che-Ping Tsai, Hung-Yi Lee Speech Processing and Machine Learning Laboratory, National Taiwan University {r06922039, hungyilee}@ntu.edu.tw |
| Pseudocode | No | No structured pseudocode or algorithm blocks are present in the paper. The methods are described through text and mathematical equations. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the methodology is openly available. |
| Open Datasets | Yes | We validate our proposed model on two multi-label text classification datasets, which are AAPD (Yang et al. 2018b) and Reuters-21578, and a sound event classification dataset, which is AudioSet (Gemmeke et al. 2017) proposed by Google. |
| Dataset Splits | Yes | In this experiment, since there are only 43 samples with unseen label combinations in the original test set of AAPD, we re-split the AAPD dataset: 47,840 samples in the training set and 4,000 samples each in the validation and test sets. Both the validation and test sets contain 2,000 samples whose label sets occur in the training set and 2,000 whose label sets do not (an illustrative re-split sketch follows the table). |
| Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU types, memory specifications) used for running the experiments are provided in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x) are mentioned in the paper. |
| Experiment Setup | No | The paper mentions architectural components (e.g., bidirectional LSTM, LSTMs with attention, DNN with sigmoid activation) but does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed training configurations in the main text (a hedged sketch of these components follows the table). |
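
The five metrics quoted in the Research Type row can be computed with scikit-learn. The sketch below follows standard multi-label conventions and assumes binary indicator matrices; the paper does not specify its exact implementation, so averaging details are assumptions.

```python
# Minimal sketch of the five multi-label metrics (Nam et al. 2017).
import numpy as np
from sklearn.metrics import f1_score

def multilabel_metrics(y_true, y_pred):
    """y_true, y_pred: binary indicator arrays of shape (n_samples, n_labels)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return {
        # Subset accuracy (ACC): fraction of samples whose full label set is exactly correct.
        "ACC": np.mean(np.all(y_true == y_pred, axis=1)),
        # Hamming accuracy (HA): fraction of individual label decisions that are correct.
        "HA": np.mean(y_true == y_pred),
        # Example-based F1 (ebF1): F1 computed per sample, then averaged over samples.
        "ebF1": f1_score(y_true, y_pred, average="samples", zero_division=0),
        # Macro-averaged F1 (maF1): F1 computed per label, then averaged over labels.
        "maF1": f1_score(y_true, y_pred, average="macro", zero_division=0),
        # Micro-averaged F1 (miF1): F1 over all pooled label decisions.
        "miF1": f1_score(y_true, y_pred, average="micro", zero_division=0),
    }
```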
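The Dataset Splits row describes a re-split of AAPD in which the validation and test sets are each balanced between samples whose exact label set appears in the training set ("seen") and samples whose label set does not ("unseen"). A hypothetical sketch of that procedure is below; function and variable names are illustrative, not from the paper.

```python
# Illustrative re-split of a multi-label dataset into train/valid/test,
# balancing seen vs. unseen label combinations in the evaluation sets.
import random

def resplit(samples, n_train=47840, n_eval=4000, seed=0):
    """samples: list of (text, labels) pairs; labels is an iterable of label ids."""
    rng = random.Random(seed)
    samples = samples[:]
    rng.shuffle(samples)
    train, rest = samples[:n_train], samples[n_train:]
    # Label combinations observed in training, compared as frozensets.
    train_label_sets = {frozenset(labels) for _, labels in train}
    seen = [s for s in rest if frozenset(s[1]) in train_label_sets]
    unseen = [s for s in rest if frozenset(s[1]) not in train_label_sets]
    # 2,000 seen + 2,000 unseen per evaluation split (assumes `rest` has enough of each).
    half = n_eval // 2
    valid = seen[:half] + unseen[:half]
    test = seen[half:2 * half] + unseen[half:2 * half]
    return train, valid, test
```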
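Since the Experiment Setup row names only architectural components without hyperparameters, the following is a hedged PyTorch sketch of one such component combination: a bidirectional LSTM encoder feeding a DNN with sigmoid outputs for multi-label prediction. All dimensions are placeholder assumptions, the attention mechanism mentioned in the row is omitted for brevity, and this is not presented as the authors' model.

```python
# Minimal sketch (assumed hyperparameters) of a BiLSTM encoder with a
# sigmoid multi-label head, matching the components named in the table.
import torch
import torch.nn as nn

class BiLSTMSigmoidClassifier(nn.Module):
    def __init__(self, vocab_size, n_labels, emb_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                               bidirectional=True)
        self.head = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, n_labels),
        )

    def forward(self, tokens):
        # tokens: (batch, seq_len) integer token ids
        h, _ = self.encoder(self.embed(tokens))   # (batch, seq_len, 2*hidden_dim)
        pooled = h.mean(dim=1)                    # mean pooling (an assumption)
        return torch.sigmoid(self.head(pooled))   # per-label probabilities
```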