Weighted Training for Cross-Task Learning
Authors: Shuxiao Chen, Koby Crammer, Hangfeng He, Dan Roth, Weijie J. Su
ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The effectiveness of TAWT is corroborated through extensive experiments with BERT on four sequence tagging tasks in natural language processing (NLP), including part-of-speech (PoS) tagging, chunking, predicate detection, and named entity recognition (NER). |
| Researcher Affiliation | Academia | Shuxiao Chen University of Pennsylvania shuxiaoc@wharton.upenn.edu Koby Crammer The Technion koby@ee.technion.ac.il Hangfeng He University of Pennsylvania hangfeng@seas.upenn.edu Dan Roth University of Pennsylvania danroth@seas.upenn.edu Weijie J. Su University of Pennsylvania suw@wharton.upenn.edu |
| Pseudocode | Yes | Algorithm 1: Target-Aware Weighted Training (TAWT) (a hedged sketch of the weight update appears below the table) |
| Open Source Code | Yes | Our code is publicly available at http://cogcomp.org/page/publication_view/963. |
| Open Datasets | Yes | In our experiments, we mainly use two widely-used NLP datasets, OntoNotes 5.0 (Hovy et al., 2006) and CoNLL-2000 (Tjong Kim Sang & Buchholz, 2000). |
| Dataset Splits | Yes | There are about 116K sentences, 16K sentences, and 12K sentences in the training, development, and test sets for tasks in OntoNotes 5.0. As for CoNLL-2000, there are about 9K sentences and 2K sentences in the training and test sets. |
| Hardware Specification | Yes | It usually costs about half an hour to run the experiment for each setting (e.g. one number in Table 1) on one GeForce RTX 2080 GPU. |
| Software Dependencies | No | Specifically, we use the pre-trained case-sensitive BERT-base PyTorch implementation (Wolf et al., 2020), and the common hyperparameters for BERT. ... the optimizer is Adam (Kingma & Ba, 2015). No specific version numbers for PyTorch or other libraries are provided. |
| Experiment Setup | Yes | Specifically, the max length is 128, the batch size is 32, the epoch number is 4, and the learning rate is 5e-5. ... In our experiments, we simply set the size of the randomly sampled subset of the training set as 64 ... In our experiments, we choose η_k = 1.0 in the mirror descent update (2.8). (A configuration sketch of this setup appears below the table.) |
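
Based on the report's note that the task weights are updated by mirror descent with step size η_k = 1.0, the snippet below is a minimal sketch of an exponentiated-gradient update over source-task weights on the probability simplex. The weighted-loss form, the task-distance estimates, and the sign convention are assumptions for illustration; this is not a reproduction of the paper's exact Algorithm 1.

```python
import torch

def mirror_descent_update(weights: torch.Tensor, dist: torch.Tensor, eta: float = 1.0) -> torch.Tensor:
    """One multiplicative-weights (mirror descent) step: w_t <- w_t * exp(-eta * d_t), renormalized.

    `dist` is a hypothetical per-task distance estimate (smaller = closer to the target task);
    eta = 1.0 follows the step size reported in the paper's update (2.8).
    """
    new_w = weights * torch.exp(-eta * dist)
    return new_w / new_w.sum()

def weighted_loss(task_losses: torch.Tensor, weights: torch.Tensor) -> torch.Tensor:
    """Weighted sum of per-source-task losses used to train the shared representation (assumed form)."""
    return (weights * task_losses).sum()

# Example: three source tasks, starting from uniform weights.
w = torch.full((3,), 1.0 / 3.0)
estimated_distances = torch.tensor([0.2, 0.5, 0.1])  # hypothetical task-distance estimates
w = mirror_descent_update(w, estimated_distances, eta=1.0)
print(w)  # tasks estimated to be closer to the target receive larger weight
```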
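
The experiment setup row lists the common BERT fine-tuning hyperparameters (max length 128, batch size 32, 4 epochs, learning rate 5e-5, Adam) and the case-sensitive BERT-base PyTorch implementation (Wolf et al., 2020). Below is a minimal configuration sketch using the HuggingFace Transformers API; the model/tokenizer identifiers, the label count, and the token-classification head are assumptions, since the paper's actual training script may differ.

```python
import torch
from transformers import BertTokenizerFast, BertForTokenClassification

# Hyperparameters quoted in the report.
MODEL_NAME = "bert-base-cased"  # assumed identifier for case-sensitive BERT-base
MAX_LENGTH = 128
BATCH_SIZE = 32
NUM_EPOCHS = 4
LEARNING_RATE = 5e-5

tokenizer = BertTokenizerFast.from_pretrained(MODEL_NAME)
# num_labels is an assumed placeholder; it depends on the tag set of the sequence-tagging task.
model = BertForTokenClassification.from_pretrained(MODEL_NAME, num_labels=19)
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)

# Tokenize one pre-split sentence for a sequence-tagging task (illustrative input).
sentences = [["John", "lives", "in", "Philadelphia", "."]]
batch = tokenizer(
    sentences,
    is_split_into_words=True,
    max_length=MAX_LENGTH,
    padding="max_length",
    truncation=True,
    return_tensors="pt",
)
outputs = model(**batch)  # per-token logits for the tagging task
```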