A Hierarchical Multi-Task Approach for Learning Embeddings from Semantic Tasks

Authors: Victor Sanh, Thomas Wolf, Sebastian Ruder

AAAI 2019, pp. 6949–6956 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | The hierarchical model and multi-task learning framework presented in this work achieved state-of-the-art results on three tasks, namely NER (+0.52), EMD (+3.8), and RE (+6.8). |
| Researcher Affiliation | Collaboration | Victor Sanh,¹ Thomas Wolf,¹ Sebastian Ruder²,³ (¹Hugging Face, 20 Jay Street, Brooklyn, New York, United States; ²Insight Research Centre, National University of Ireland, Galway, Ireland; ³Aylien Ltd., 2 Harmony Court, Harmony Row, Dublin, Ireland) |
| Pseudocode | No | The paper describes the model architecture and training process in text and diagrams, but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | "For NER, we use the English portion of OntoNotes 5.0 (Pradhan et al. 2013)... For CR, EMD and RE, we use the Automatic Content Extraction (ACE) program ACE05 corpus (Doddington et al. 2004)." |
| Dataset Splits | Yes | "For CR, we use different splits to be able to compare to previous work (Bansal and Klein 2012; Durrett and Klein 2014). These splits (introduced by Rahman and Ng (2009)) use the whole ACE05 dataset, leaving 117 documents for test while having 482 documents for training (as in Bansal and Klein (2012), we randomly split the training set in a 70/30 ratio to form a validation set)." A minimal sketch of such a split appears below the table. |
| Hardware Specification | No | The paper does not specify the exact hardware (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., library or framework versions) used in the experiments. |
| Experiment Setup | No | The paper describes some training strategies, such as proportional sampling and fine-tuning of embeddings, but does not provide concrete hyperparameter values (e.g., learning rate, batch size, number of epochs) or specific optimizer settings needed for reproducibility. A sketch of proportional sampling appears below the table. |
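
To make the Dataset Splits row concrete, here is a minimal Python sketch of the described 70/30 train/validation division of the 482 ACE05 training documents. It is an illustration under assumptions: the document IDs, the `split_train_val` function name, and the seed are placeholders, not artifacts from the paper.

```python
import random

def split_train_val(doc_ids, val_ratio=0.3, seed=42):
    """Randomly split a list of document IDs into train and validation sets."""
    ids = list(doc_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    n_val = int(len(ids) * val_ratio)
    return ids[n_val:], ids[:n_val]  # (train, validation)

# 482 training documents, as in the Rahman and Ng (2009) splits;
# the IDs themselves are placeholders, not real ACE05 identifiers.
docs = [f"ace05_doc_{i:03d}" for i in range(482)]
train, val = split_train_val(docs)
print(len(train), len(val))  # 338 144
```

With 482 documents this yields 338 training and 144 validation documents, approximately the 70/30 ratio the paper describes.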
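
The proportional sampling strategy mentioned in the Experiment Setup row is commonly implemented by drawing, at each training step, one task with probability proportional to the size of its training set. The sketch below assumes that reading; the task names and dataset sizes are placeholders, not values reported in the paper.

```python
import random
from collections import Counter

def proportional_task_sampler(dataset_sizes, n_steps, seed=0):
    """Yield one task name per training step, sampled with probability
    proportional to that task's training-set size."""
    rng = random.Random(seed)
    tasks = list(dataset_sizes)
    weights = [dataset_sizes[t] for t in tasks]
    for _ in range(n_steps):
        yield rng.choices(tasks, weights=weights, k=1)[0]

# Placeholder example sizes (training examples per task);
# these are NOT values reported in the paper.
sizes = {"ner": 115_000, "emd": 16_000, "cr": 16_000, "re": 16_000}
schedule = Counter(proportional_task_sampler(sizes, n_steps=1_000))
print(schedule)  # the largest dataset (here "ner") is drawn most often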