Deep Active Learning for Named Entity Recognition

Authors: Yanyao Shen, Hyokun Yun, Zachary C. Lipton, Yakov Kronrod, Animashree Anandkumar

ICLR 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We carry out incremental active learning, during the training process, and are able to nearly match state-of-the-art performance with just 25% of the original training data. (Section 5, Experiments)
Researcher Affiliation | Collaboration | Yanyao Shen (UT Austin, Austin, TX 78712, shenyanyao@utexas.edu); Hyokun Yun (Amazon Web Services, Seattle, WA 98101, yunhyoku@amazon.com); Zachary C. Lipton (Amazon Web Services, Seattle, WA 98101, liptoz@amazon.com); Yakov Kronrod (Amazon Web Services, Seattle, WA 98101, kronrod@amazon.com); Animashree Anandkumar (Amazon Web Services, Seattle, WA 98101, anima@amazon.com)
Pseudocode | Yes | Algorithm 1 (Representativeness-based Sampling) and Algorithm 2 (Stream Submod Max); a hedged sketch of this style of selection follows the table.
Open Source Code | No | The paper does not provide any explicit statements about open-sourcing the code for the described methodology, nor does it include a link to a code repository.
Open Datasets | Yes | We use the OntoNotes-5.0 English and Chinese data (Pradhan et al., 2013) for our experiments. We use the CoNLL-2003 English (Tjong Kim Sang & De Meulder, 2003) for our experiments.
Dataset Splits | Yes | We use the standard split of training/validation/test sets, and use the validation set performance to determine hyperparameters such as the learning rate or the number of iterations for early stopping.
Hardware Specification | Yes | In terms of measuring the training speed of our models, we compute the time spent for one iteration of training on the dataset, with eight K80 GPUs in p2.8xlarge on Amazon Web Services.
Software Dependencies | No | The paper mentions software components and methods like 'ReLU nonlinearities', 'dropout', 'word2vec', 'structured skip-gram model', and 'vanilla stochastic gradient descent', but it does not provide specific version numbers for any of the software libraries, frameworks, or programming languages used (e.g., PyTorch 1.9, TensorFlow 2.x, Python 3.x).
Experiment Setup | Yes | For the LSTM word-level encoder, we use a single-layer model with 100 hidden units for CoNLL-2003 English (following Lample et al. (2016)) and a two-layer model with 300 hidden units for the OntoNotes 5.0 datasets (following Chiu & Nichols (2016)). For the character-level LSTM encoder, we use a single-layer LSTM with 25 hidden units (following Lample et al. (2016)). For the CNN word-level encoder, we use two-layer CNNs with 800 filters and kernel width 5, and for the CNN character-level encoder, we use single-layer CNNs with 50 filters and kernel width 3 (following Chiu & Nichols (2016)). Dropout probabilities are all set to 0.5. We use vanilla stochastic gradient descent... We uniformly set the step size as 0.001 and the batch size as 128. (A hedged configuration sketch follows the table.)
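
To make the Pseudocode row concrete, here is a minimal, hedged sketch of representativeness-based subset selection via greedy facility-location submodular maximization over sentence embeddings. It illustrates the general technique named by Algorithms 1 and 2, not the authors' exact procedure: the function names, the cosine-similarity objective, and the batch-mode (non-streaming) greedy loop are illustrative assumptions.

```python
import numpy as np

def facility_location_gain(sim, covered, candidate):
    # Marginal gain of adding `candidate`: total improvement in each pool
    # item's best similarity to the selected set.
    return np.maximum(sim[:, candidate] - covered, 0.0).sum()

def greedy_representative_subset(embeddings, budget):
    # Greedily pick `budget` items whose similarity best "covers" the pool.
    normed = embeddings / (np.linalg.norm(embeddings, axis=1, keepdims=True) + 1e-12)
    sim = normed @ normed.T                      # cosine similarity matrix
    covered = np.zeros(len(embeddings))          # best similarity to the selected set so far
    selected = []
    for _ in range(budget):
        gains = [(-np.inf if j in selected
                  else facility_location_gain(sim, covered, j))
                 for j in range(len(embeddings))]
        best = int(np.argmax(gains))
        selected.append(best)
        covered = np.maximum(covered, sim[:, best])
    return selected

# Example: choose 5 representative items from a random pool of 100 embeddings.
rng = np.random.default_rng(0)
pool = rng.normal(size=(100, 64))
print(greedy_representative_subset(pool, budget=5))
```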
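
Similarly, the Experiment Setup row can be read as a concrete configuration. The sketch below (assuming PyTorch, which the paper does not name) wires up the quoted CoNLL-2003 setting: a single-layer word-level LSTM with 100 hidden units, a single-layer character-level LSTM with 25 hidden units, dropout 0.5, and vanilla SGD with step size 0.001. The vocabulary and embedding sizes are placeholder assumptions, and the batch size of 128 would be set in the data loader.

```python
import torch
import torch.nn as nn

class WordCharEncoder(nn.Module):
    # Hypothetical module: word-level + character-level LSTM encoders with the
    # quoted hyperparameters. Vocabulary/embedding sizes are placeholders.
    def __init__(self, word_vocab=30000, char_vocab=100, word_dim=100, char_dim=25):
        super().__init__()
        self.word_emb = nn.Embedding(word_vocab, word_dim)
        self.char_emb = nn.Embedding(char_vocab, char_dim)
        # Character-level LSTM: single layer, 25 hidden units.
        self.char_lstm = nn.LSTM(char_dim, 25, num_layers=1, batch_first=True)
        # Word-level LSTM: single layer, 100 hidden units (CoNLL-2003 setting).
        self.word_lstm = nn.LSTM(word_dim + 25, 100, num_layers=1, batch_first=True)
        self.dropout = nn.Dropout(p=0.5)

    def forward(self, word_ids, char_ids):
        # char_ids: (batch, seq_len, max_chars); encode each word's characters.
        b, t, c = char_ids.shape
        chars = self.char_emb(char_ids.view(b * t, c))
        _, (h, _) = self.char_lstm(chars)          # final hidden state per word
        char_feat = h[-1].view(b, t, -1)
        words = self.word_emb(word_ids)
        feats = self.dropout(torch.cat([words, char_feat], dim=-1))
        out, _ = self.word_lstm(feats)
        return self.dropout(out)

model = WordCharEncoder()
# Vanilla SGD with step size 0.001, as quoted above.
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)

# Example forward pass with a dummy batch: 2 sentences, 6 tokens, 10 chars per token.
words = torch.randint(0, 30000, (2, 6))
chars = torch.randint(0, 100, (2, 6, 10))
print(model(words, chars).shape)   # -> torch.Size([2, 6, 100])
```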