Weighted Training for Cross-Task Learning
Authors: Shuxiao Chen, Koby Crammer, Hangfeng He, Dan Roth, Weijie J. Su
ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The effectiveness of TAWT is corroborated through extensive experiments with BERT on four sequence tagging tasks in natural language processing (NLP), including part-of-speech (PoS) tagging, chunking, predicate detection, and named entity recognition (NER). |
| Researcher Affiliation | Academia | Shuxiao Chen University of Pennsylvania shuxiaoc@wharton.upenn.edu Koby Crammer The Technion koby@ee.technion.ac.il Hangfeng He University of Pennsylvania hangfeng@seas.upenn.edu Dan Roth University of Pennsylvania danroth@seas.upenn.edu Weijie J. Su University of Pennsylvania suw@wharton.upenn.edu |
| Pseudocode | Yes | Algorithm 1: Target-Aware Weighted Training (TAWT) (a hedged sketch of the weight update appears below the table) |
| Open Source Code | Yes | Our code is publicly available at http://cogcomp.org/page/publication_view/963. |
| Open Datasets | Yes | In our experiments, we mainly use two widely-used NLP datasets, OntoNotes 5.0 (Hovy et al., 2006) and CoNLL-2000 (Tjong Kim Sang & Buchholz, 2000). |
| Dataset Splits | Yes | There are about 116K sentences, 16K sentences, and 12K sentences in the training, development, and test sets for tasks in OntoNotes 5.0. As for CoNLL-2000, there are about 9K sentences and 2K sentences in the training and test sets. |
| Hardware Specification | Yes | It usually costs about half an hour to run the experiment for each setting (e.g. one number in Table 1) on one GeForce RTX 2080 GPU. |
| Software Dependencies | No | Specifically, we use the pre-trained case-sensitive BERT-base PyTorch implementation (Wolf et al., 2020), and the common hyperparameters for BERT. ... the optimizer is Adam (Kingma & Ba, 2015). No specific version numbers for PyTorch or other libraries are provided. |
| Experiment Setup | Yes | Specifically, the max length is 128, the batch size is 32, the epoch number is 4, and the learning rate is 5e-5. ... In our experiments, we simply set the size of the randomly sampled subset of the training set as 64 ... In our experiments, we choose η_k = 1.0 in the mirror descent update (2.8). (A configuration sketch of this setup appears below the table.) |
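
Based on the report's note that the task weights are updated by mirror descent with step size η_k = 1.0, the snippet below is a minimal sketch of an exponentiated-gradient update over source-task weights on the probability simplex. The weighted-loss form, the task-distance estimates, and the sign convention are assumptions for illustration; this is not a reproduction of the paper's exact Algorithm 1.

```python
import torch

def mirror_descent_update(weights: torch.Tensor, dist: torch.Tensor, eta: float = 1.0) -> torch.Tensor:
    """One multiplicative-weights (mirror descent) step: w_t <- w_t * exp(-eta * d_t), renormalized.

    `dist` is a hypothetical per-task distance estimate (smaller = closer to the target task);
    eta = 1.0 follows the step size reported in the paper's update (2.8).
    """
    new_w = weights * torch.exp(-eta * dist)
    return new_w / new_w.sum()

def weighted_loss(task_losses: torch.Tensor, weights: torch.Tensor) -> torch.Tensor:
    """Weighted sum of per-source-task losses used to train the shared representation (assumed form)."""
    return (weights * task_losses).sum()

# Example: three source tasks, starting from uniform weights.
w = torch.full((3,), 1.0 / 3.0)
estimated_distances = torch.tensor([0.2, 0.5, 0.1])  # hypothetical task-distance estimates
w = mirror_descent_update(w, estimated_distances, eta=1.0)
print(w)  # tasks estimated to be closer to the target receive larger weight
```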
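
The experiment setup row lists the common BERT fine-tuning hyperparameters (max length 128, batch size 32, 4 epochs, learning rate 5e-5, Adam) and the case-sensitive BERT-base PyTorch implementation (Wolf et al., 2020). Below is a minimal configuration sketch using the HuggingFace Transformers API; the model/tokenizer identifiers, the label count, and the token-classification head are assumptions, since the paper's actual training script may differ.

```python
import torch
from transformers import BertTokenizerFast, BertForTokenClassification

# Hyperparameters quoted in the report.
MODEL_NAME = "bert-base-cased"  # assumed identifier for case-sensitive BERT-base
MAX_LENGTH = 128
BATCH_SIZE = 32
NUM_EPOCHS = 4
LEARNING_RATE = 5e-5

tokenizer = BertTokenizerFast.from_pretrained(MODEL_NAME)
# num_labels is an assumed placeholder; it depends on the tag set of the sequence-tagging task.
model = BertForTokenClassification.from_pretrained(MODEL_NAME, num_labels=19)
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)

# Tokenize one pre-split sentence for a sequence-tagging task (illustrative input).
sentences = [["John", "lives", "in", "Philadelphia", "."]]
batch = tokenizer(
    sentences,
    is_split_into_words=True,
    max_length=MAX_LENGTH,
    padding="max_length",
    truncation=True,
    return_tensors="pt",
)
outputs = model(**batch)  # per-token logits for the tagging task
```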