Deep Active Learning for Named Entity Recognition

Authors: Yanyao Shen, Hyokun Yun, Zachary C. Lipton, Yakov Kronrod, Animashree Anandkumar

ICLR 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We carry out incremental active learning, during the training process, and are able to nearly match state-of-the-art performance with just 25% of the original training data. (Section 5, Experiments)
Researcher Affiliation | Collaboration | Yanyao Shen (UT Austin, Austin, TX 78712, shenyanyao@utexas.edu); Hyokun Yun (Amazon Web Services, Seattle, WA 98101, yunhyoku@amazon.com); Zachary C. Lipton (Amazon Web Services, Seattle, WA 98101, liptoz@amazon.com); Yakov Kronrod (Amazon Web Services, Seattle, WA 98101, kronrod@amazon.com); Animashree Anandkumar (Amazon Web Services, Seattle, WA 98101, anima@amazon.com)
Pseudocode | Yes | Algorithm 1 (Representativeness-based Sampling) and Algorithm 2 (Stream Submod Max); a hedged sketch of this style of selection follows the table.
Open Source Code | No | The paper does not provide any explicit statements about open-sourcing the code for the described methodology, nor does it include a link to a code repository.
Open Datasets | Yes | We use the OntoNotes-5.0 English and Chinese data (Pradhan et al., 2013) for our experiments. We use the CoNLL-2003 English (Tjong Kim Sang & De Meulder, 2003) for our experiments.
Dataset Splits | Yes | We use the standard split of training/validation/test sets, and use the validation set performance to determine hyperparameters such as the learning rate or the number of iterations for early stopping.
Hardware Specification | Yes | In terms of measuring the training speed of our models, we compute the time spent for one iteration of training on the dataset, with eight K80 GPUs in p2.8xlarge on Amazon Web Services.
Software Dependencies | No | The paper mentions software components and methods like 'ReLU nonlinearities', 'dropout', 'word2vec', 'structured skip-gram model', and 'vanilla stochastic gradient descent', but it does not provide specific version numbers for any of the software libraries, frameworks, or programming languages used (e.g., PyTorch 1.9, TensorFlow 2.x, Python 3.x).
Experiment Setup | Yes | For the LSTM word-level encoder, we use a single-layer model with 100 hidden units for CoNLL-2003 English (following Lample et al. (2016)) and a two-layer model with 300 hidden units for the OntoNotes 5.0 datasets (following Chiu & Nichols (2016)). For the character-level LSTM encoder, we use a single-layer LSTM with 25 hidden units (following Lample et al. (2016)). For the CNN word-level encoder, we use two-layer CNNs with 800 filters and kernel width 5, and for the CNN character-level encoder, we use single-layer CNNs with 50 filters and kernel width 3 (following Chiu & Nichols (2016)). Dropout probabilities are all set to 0.5. We use vanilla stochastic gradient descent... We uniformly set the step size as 0.001 and the batch size as 128. (A hedged configuration sketch follows the table.)
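
To make the Pseudocode row concrete, here is a minimal, hedged sketch of representativeness-based subset selection via greedy facility-location submodular maximization over sentence embeddings. It illustrates the general technique named by Algorithms 1 and 2, not the authors' exact procedure: the function names, the cosine-similarity objective, and the batch-mode (non-streaming) greedy loop are illustrative assumptions.

```python
import numpy as np

def facility_location_gain(sim, covered, candidate):
    # Marginal gain of adding `candidate`: total improvement in each pool
    # item's best similarity to the selected set.
    return np.maximum(sim[:, candidate] - covered, 0.0).sum()

def greedy_representative_subset(embeddings, budget):
    # Greedily pick `budget` items whose similarity best "covers" the pool.
    normed = embeddings / (np.linalg.norm(embeddings, axis=1, keepdims=True) + 1e-12)
    sim = normed @ normed.T                      # cosine similarity matrix
    covered = np.zeros(len(embeddings))          # best similarity to the selected set so far
    selected = []
    for _ in range(budget):
        gains = [(-np.inf if j in selected
                  else facility_location_gain(sim, covered, j))
                 for j in range(len(embeddings))]
        best = int(np.argmax(gains))
        selected.append(best)
        covered = np.maximum(covered, sim[:, best])
    return selected

# Example: choose 5 representative items from a random pool of 100 embeddings.
rng = np.random.default_rng(0)
pool = rng.normal(size=(100, 64))
print(greedy_representative_subset(pool, budget=5))
```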
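
Similarly, the Experiment Setup row can be read as a concrete configuration. The sketch below (assuming PyTorch, which the paper does not name) wires up the quoted CoNLL-2003 setting: a single-layer word-level LSTM with 100 hidden units, a single-layer character-level LSTM with 25 hidden units, dropout 0.5, and vanilla SGD with step size 0.001. The vocabulary and embedding sizes are placeholder assumptions, and the batch size of 128 would be set in the data loader.

```python
import torch
import torch.nn as nn

class WordCharEncoder(nn.Module):
    # Hypothetical module: word-level + character-level LSTM encoders with the
    # quoted hyperparameters. Vocabulary/embedding sizes are placeholders.
    def __init__(self, word_vocab=30000, char_vocab=100, word_dim=100, char_dim=25):
        super().__init__()
        self.word_emb = nn.Embedding(word_vocab, word_dim)
        self.char_emb = nn.Embedding(char_vocab, char_dim)
        # Character-level LSTM: single layer, 25 hidden units.
        self.char_lstm = nn.LSTM(char_dim, 25, num_layers=1, batch_first=True)
        # Word-level LSTM: single layer, 100 hidden units (CoNLL-2003 setting).
        self.word_lstm = nn.LSTM(word_dim + 25, 100, num_layers=1, batch_first=True)
        self.dropout = nn.Dropout(p=0.5)

    def forward(self, word_ids, char_ids):
        # char_ids: (batch, seq_len, max_chars); encode each word's characters.
        b, t, c = char_ids.shape
        chars = self.char_emb(char_ids.view(b * t, c))
        _, (h, _) = self.char_lstm(chars)          # final hidden state per word
        char_feat = h[-1].view(b, t, -1)
        words = self.word_emb(word_ids)
        feats = self.dropout(torch.cat([words, char_feat], dim=-1))
        out, _ = self.word_lstm(feats)
        return self.dropout(out)

model = WordCharEncoder()
# Vanilla SGD with step size 0.001, as quoted above.
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)

# Example forward pass with a dummy batch: 2 sentences, 6 tokens, 10 chars per token.
words = torch.randint(0, 30000, (2, 6))
chars = torch.randint(0, 100, (2, 6, 10))
print(model(words, chars).shape)   # -> torch.Size([2, 6, 100])
```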