Order-Free Learning Alleviating Exposure Bias in Multi-Label Classification

Authors: Che-Ping Tsai, Hung-Yi Lee

AAAI 2020, pp. 6038-6045 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that the proposed model outperforms competitive baselines by a large margin on three multi-label classification benchmark datasets, including two text classification datasets and one sound event classification dataset. We validate our proposed model on two multi-label text classification datasets, which are AAPD (Yang et al. 2018b) and Reuters-21578, and a sound event classification dataset, which is AudioSet (Gemmeke et al. 2017) proposed by Google. Multi-label classification can be evaluated with multiple metrics, which capture different aspects of the problem. We follow Nam et al. (2017) in using five different metrics: subset accuracy (ACC), Hamming accuracy (HA), example-based F1 (eb-F1), macro-averaged F1 (ma-F1), and micro-averaged F1 (mi-F1). (A sketch of these metrics appears after the table.)
Researcher Affiliation | Academia | Che-Ping Tsai, Hung-Yi Lee; Speech Processing and Machine Learning Laboratory, National Taiwan University; {r06922039, hungyilee}@ntu.edu.tw
Pseudocode | No | No structured pseudocode or algorithm blocks are present in the paper; the methods are described through text and mathematical equations.
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the methodology is openly available.
Open Datasets | Yes | We validate our proposed model on two multi-label text classification datasets, which are AAPD (Yang et al. 2018b) and Reuters-21578, and a sound event classification dataset, which is AudioSet (Gemmeke et al. 2017) proposed by Google.
Dataset Splits | Yes | In this experiment, since there are only 43 samples with unseen label combinations in the original test set of AAPD, we re-split the AAPD dataset: 47,840 samples in the training set and 4,000 samples each in the validation and test sets. Both the validation and test sets contain 2,000 samples whose label sets occur in the training set and 2,000 whose label sets do not. (A sketch of this re-split appears after the table.)
Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU types, memory specifications) used for running the experiments are provided in the paper.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x) are mentioned in the paper.
Experiment Setup | No | The paper mentions architectural components (e.g., bidirectional LSTM, LSTMs with attention, DNN with sigmoid activation) but does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed training configurations in the main text. (An illustrative sketch of these components appears below.)
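
The Evaluation Metrics passage quoted under Research Type lists five metrics taken from Nam et al. (2017). As a reference point, here is a minimal sketch of how those metrics are commonly computed for binary indicator matrices; scikit-learn is an assumption on our part (the paper names no software), and the toy arrays are purely illustrative.

    # Sketch: the five metrics from Nam et al. (2017) via scikit-learn (assumed
    # tooling; the paper specifies none). y_true/y_pred are (n_samples, n_labels)
    # binary indicator matrices; the values below are toy data.
    import numpy as np
    from sklearn.metrics import accuracy_score, hamming_loss, f1_score

    y_true = np.array([[1, 0, 1], [0, 1, 0]])
    y_pred = np.array([[1, 0, 0], [0, 1, 0]])

    acc   = accuracy_score(y_true, y_pred)               # subset accuracy (ACC): exact label-set match
    ha    = 1.0 - hamming_loss(y_true, y_pred)           # Hamming accuracy (HA): per-label accuracy
    eb_f1 = f1_score(y_true, y_pred, average="samples")  # example-based F1 (eb-F1)
    ma_f1 = f1_score(y_true, y_pred, average="macro")    # macro-averaged F1 (ma-F1)
    mi_f1 = f1_score(y_true, y_pred, average="micro")    # micro-averaged F1 (mi-F1)

    print(acc, ha, eb_f1, ma_f1, mi_f1)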
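
The AAPD re-split described under Dataset Splits keeps 47,840 training samples and builds validation and test sets of 4,000 samples each, half with label combinations seen in training and half unseen. The sketch below shows one way such a split could be implemented; the function name, seed, and shuffling strategy are our assumptions, not the authors' procedure.

    # Sketch of the AAPD re-split described above (assumed implementation, not
    # the authors' code). `samples` is a list of (text, labels) pairs, where
    # `labels` is an iterable of label ids.
    import random

    def resplit_aapd(samples, n_train=47840, n_half=2000, seed=0):
        rng = random.Random(seed)
        rng.shuffle(samples)
        train, rest = samples[:n_train], samples[n_train:]

        # Label combinations observed in the training set.
        seen_combos = {frozenset(labels) for _, labels in train}
        seen   = [s for s in rest if frozenset(s[1]) in seen_combos]
        unseen = [s for s in rest if frozenset(s[1]) not in seen_combos]

        # Validation and test each take n_half seen + n_half unseen samples.
        valid = seen[:n_half] + unseen[:n_half]
        test  = seen[n_half:2 * n_half] + unseen[n_half:2 * n_half]
        return train, valid, test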
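
The Experiment Setup row names a bidirectional LSTM encoder, LSTMs with attention, and a DNN with sigmoid activation, but no hyperparameters. The sketch below only illustrates how the first and last of these components are typically wired in PyTorch; every dimension, the mean pooling, and the layer count are assumptions, and the attention-equipped LSTM decoder of the full model is omitted for brevity.

    # Illustrative wiring of two components named in the paper (BiLSTM encoder,
    # sigmoid DNN head). All dimensions are assumed; this is not the authors' model.
    import torch
    import torch.nn as nn

    class MultiLabelBaseline(nn.Module):
        def __init__(self, vocab_size, n_labels, emb_dim=256, hidden=512):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True,
                                   bidirectional=True)
            # DNN head with sigmoid: one independent probability per label.
            self.head = nn.Sequential(
                nn.Linear(2 * hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, n_labels), nn.Sigmoid())

        def forward(self, tokens):               # tokens: (batch, seq_len) int64
            states, _ = self.encoder(self.embed(tokens))
            pooled = states.mean(dim=1)          # mean-pool BiLSTM states (assumed)
            return self.head(pooled)             # (batch, n_labels) in [0, 1]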