Jointly Learning to Label Sentences and Tokens
Authors: Marek Rei, Anders Søgaard
AAAI 2019, pp. 6916-6923
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that by learning to perform these tasks jointly on multiple levels, the model achieves substantial improvements for both sentence classification and sequence labeling. |
| Researcher Affiliation | Academia | Marek Rei, The ALTA Institute, Computer Laboratory, University of Cambridge, United Kingdom (marek.rei@cl.cam.ac.uk); Anders Søgaard, CoAStaL, DIKU, Department of Computer Science, University of Copenhagen, Denmark (soegaard@di.ku.dk) |
| Pseudocode | No | The paper describes the model using mathematical equations and textual descriptions but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code for running these experiments will be made publicly available: http://www.marekrei.com/projects/mltagger |
| Open Datasets | Yes | We evaluate the joint labeling framework on three different tasks and datasets. The CoNLL 2010 shared task (Farkas et al. 2010) dataset [...] For error detection on both levels, we use the First Certificate in English (FCE, Yannakoudakis, Briscoe, and Medlock (2011)) dataset [...] Finally, we convert the Stanford Sentiment Treebank (SST, Socher, Perelygin, and Wu (2013)) [...] |
| Dataset Splits | No | The paper mentions using a 'development set' for early stopping and reports 'DEV F1' in its results tables, but it does not give the exact percentages or sample counts of the training, validation, and test splits, nor does it state how the splits were derived (e.g., standard predefined splits or custom proportions). |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU model, CPU type, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions an optimizer (AdaDelta) and pre-trained embeddings (GloVe) but does not provide specific version numbers for any software libraries or frameworks used in the implementation (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | We combine all the different objective functions together using weighting parameters. [...] When using the full system, we use Λ_sent = Λ_tok = 1, Λ_LM = Λ_char = 0.1 and Λ_attn = 0.01. [...] Word embeddings were set to size 300, [...] The word-level LSTMs are size 300 and character-level LSTMs size 100; the hidden combined representation h_i was set to size 200; the attention weight layer e_i was set to size 100. The model was optimized using AdaDelta (Zeiler 2012) with learning rate 1.0. [...] Dropout (Srivastava et al. 2014) with probability 0.5 was applied [...] Training was stopped if performance on the development set had not improved for 7 epochs. |
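For readers who want to reproduce this configuration, below is a minimal sketch of how the quoted hyperparameters could be wired together, assuming PyTorch (the paper does not state its framework). The names `JointTagger` and `combined_loss`, the character-embedding size, and the vocabulary sizes are hypothetical; only the numeric settings (layer sizes, loss weights, learning rate, dropout, patience) come from the quoted setup, and the forward pass that composes the layers is intentionally omitted.

```python
# Sketch of the reported setup; not the authors' implementation.
import torch
import torch.nn as nn

# Loss-weighting parameters for the full system, as quoted above.
L_SENT, L_TOK, L_LM, L_CHAR, L_ATTN = 1.0, 1.0, 0.1, 0.1, 0.01


class JointTagger(nn.Module):
    """Skeleton holding the quoted layer sizes; how the layers are composed
    (and the attention over tokens) is not shown here."""

    def __init__(self, vocab_size, char_vocab_size, num_labels, char_emb_size=50):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, 300)                     # word embeddings, size 300
        self.char_emb = nn.Embedding(char_vocab_size, char_emb_size)      # size 50 is an assumption
        self.char_lstm = nn.LSTM(char_emb_size, 100, bidirectional=True)  # character-level LSTM, size 100
        self.word_lstm = nn.LSTM(300, 300, bidirectional=True)            # word-level LSTM, size 300
        self.hidden = nn.Linear(2 * 300, 200)                             # combined representation h_i, size 200
        self.attention = nn.Linear(200, 100)                              # attention weight layer e_i, size 100
        self.dropout = nn.Dropout(p=0.5)                                  # dropout probability 0.5
        self.token_out = nn.Linear(200, num_labels)                       # token-level predictions
        self.sent_out = nn.Linear(200, num_labels)                        # sentence-level prediction


def combined_loss(sent, tok, lm, char, attn):
    """Weighted sum of the individual objective functions."""
    return L_SENT * sent + L_TOK * tok + L_LM * lm + L_CHAR * char + L_ATTN * attn


model = JointTagger(vocab_size=20000, char_vocab_size=100, num_labels=2)
optimizer = torch.optim.Adadelta(model.parameters(), lr=1.0)  # AdaDelta, learning rate 1.0

# Early stopping: halt training once development-set performance
# has not improved for 7 consecutive epochs.
PATIENCE = 7
```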