Inference-Masked Loss for Deep Structured Output Learning

Authors: Quan Guo, Hossein Rajaby Faghihi, Yue Zhang, Andrzej Uszok, Parisa Kordjamshidi

IJCAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically show the inference-masked loss combined with the negative log-likelihood loss improves the performance on different tasks, namely entity relation recognition on CoNLL04 and ACE2005 corpora, and spatial role labeling on the CLEF 2017 mSpRL dataset. We evaluate the proposed method with two different tasks on three datasets under various settings to show the effectiveness of the proposed approach, especially with low training data.
Researcher Affiliation | Academia | Quan Guo (1), Hossein Rajaby Faghihi (1), Yue Zhang (1), Andrzej Uszok (2), and Parisa Kordjamshidi (1); (1) Michigan State University, (2) Florida Institute for Human and Machine Cognition
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at https://github.com/HLR/Inference-Masked-Loss
Open Datasets | Yes | CoNLL04 [Roth and Yih, 2004] is a publicly available corpus for ER. ... ACE 2005 Corpus (ACE2005) [Li and Ji, 2014]. ... CLEF 2017 mSpRL dataset (SpRL2017) [Kordjamshidi et al., 2017a; Kordjamshidi et al., 2017b].
Dataset Splits | Yes | We conducted five-fold cross-validation over all samples for the final evaluation. ... The training set contains 10,360 sentences... The test set contains 2,637 sentences... which has 600 sentences in the training set and 613 sentences in the testing set. (A sketch of the five-fold protocol appears after this table.)
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments (e.g., specific GPU/CPU models, memory, or cloud instance types).
Software Dependencies | No | We implemented all the experiments using PyTorch. ... We solve the ILP problems by Gurobi ... For ACE2005... FLAIR [Akbik et al., 2018] + GloVe [Pennington et al., 2014]... For spatial relation extraction, we concatenate the phrase encoding of the three spatial roles. Then, we use Recurrent Neural Networks and Logistic Regression... we first encode the input phrases with different linguistic features generated by SpaCy. (Software is mentioned, but without the specific version numbers needed for reproducibility; an embedding-stacking sketch follows the table.)
Experiment Setup | Yes | IML (0.6) (i.e., with λ = 0.6) is used and optimized by the Adam optimizer [Kingma and Ba, 2014] in batches of 8 examples. We train for 100 epochs with a learning rate of 1e-4, which is decayed by a factor of 10 every ten epochs. We use weight decay 1e-5 and a dropout rate of 0.35 to avoid over-fitting. Focal loss γ = 2 and label smoothing 0.01 are used for imbalanced class labels. ... The IML (0.5) (with λ = 0.5) is optimized by the Adam optimizer. We train for 100 epochs with a learning rate of 0.04. We use focal loss γ = 2 and label smoothing 0.01 to deal with imbalanced class labels. ... We trained IML (0.6) (λ = 0.6) by Adam for 20 epochs with a learning rate of 0.005, weight decay of 0.001, and a dropout rate of 0.5 to avoid over-fitting. (A PyTorch sketch of the CoNLL04 training configuration follows the table.)
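
The five-fold evaluation reported under Dataset Splits can be reproduced with a standard split utility. Below is a minimal sketch, assuming the CoNLL04 examples are available as a Python list; load_conll04_sentences, build_model, train, and evaluate are hypothetical placeholders, not functions from the authors' repository.

```python
# Five-fold cross-validation sketch (loader, model factory, and training loop are hypothetical).
from sklearn.model_selection import KFold

sentences = load_conll04_sentences()          # hypothetical: list of annotated sentences
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

fold_scores = []
for fold, (train_idx, test_idx) in enumerate(kfold.split(sentences)):
    train_set = [sentences[i] for i in train_idx]
    test_set = [sentences[i] for i in test_idx]
    model = build_model()                     # hypothetical model factory
    train(model, train_set)                   # hypothetical training routine
    fold_scores.append(evaluate(model, test_set))

print(sum(fold_scores) / len(fold_scores))    # average score over the five folds
```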
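
For the ACE2005 experiments the paper combines FLAIR contextual embeddings with GloVe vectors. The following sketch shows how such stacked embeddings are typically assembled with the flair library; the specific FLAIR language-model names are illustrative choices, not settings confirmed by the paper.

```python
# Stacking FLAIR contextual embeddings with GloVe word vectors (model names are illustrative).
from flair.data import Sentence
from flair.embeddings import WordEmbeddings, FlairEmbeddings, StackedEmbeddings

embeddings = StackedEmbeddings([
    WordEmbeddings("glove"),             # pre-trained GloVe vectors
    FlairEmbeddings("news-forward"),     # forward contextual character language model
    FlairEmbeddings("news-backward"),    # backward contextual character language model
])

sentence = Sentence("Washington is the capital of the United States .")
embeddings.embed(sentence)               # attaches one concatenated vector to each token
token_vectors = [token.embedding for token in sentence]
```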
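
The CoNLL04 hyperparameters quoted under Experiment Setup (Adam, batches of 8, 100 epochs, learning rate 1e-4 decayed by a factor of 10 every ten epochs, weight decay 1e-5, dropout 0.35) map onto a standard PyTorch optimizer and scheduler configuration. This is a minimal sketch with a placeholder model; it does not include the inference-masked loss itself.

```python
# Optimizer and learning-rate schedule matching the reported CoNLL04 settings (placeholder model).
from torch import nn, optim

model = nn.Sequential(nn.Linear(2048, 256), nn.ReLU(), nn.Dropout(0.35), nn.Linear(256, 5))

optimizer = optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-5)
# Divide the learning rate by 10 every ten epochs.
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

for epoch in range(100):
    # ... iterate over batches of 8 examples, compute NLL + inference-masked loss, and step the optimizer ...
    scheduler.step()
```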