Diversity Enhanced Active Learning with Strictly Proper Scoring Rules
Authors: Wei Tan, Lan Du, Wray Buntine
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | An extensive experimental evaluation then explores how these different acquisition functions perform. The results show that the use of mean square error and log probability with BEMPS yields robust acquisition functions, which consistently outperform the others tested. (The two strictly proper scoring rules referenced here are written out after the table.) |
| Researcher Affiliation | Academia | Wei Tan Monash University wei.tan2@monash.edu Lan Du Monash University lan.du@monash.edu Wray Buntine Monash University wray.buntine@monash.edu |
| Pseudocode | Yes | Algorithm 1: Estimating point-wise Q(x \| L, x′) with Equation (6); Algorithm 2: Estimate of argmax_{x∈U} Q(x \| L); Algorithm 3: Finding a diverse batch. (An illustrative sketch of this score-then-diversify recipe follows the table.) |
| Open Source Code | Yes | Our implementation of BEMPS can be downloaded from https://github.com/davidtw999/BEMPS. |
| Open Datasets | Yes | We used four benchmark text datasets for three different classification tasks: topic classification, sentence classification, and sentiment analysis, as shown in Table 1. The AG NEWS for topic classification contains 120K texts of four balanced classes [41]. The PUBMED 20k was used for sentence classification [3], which contains about 20K medical abstracts with five categories. For sentiment analysis, we used both the SST-5 and the IMDB datasets. SST-5 contains 11K sentences extracted from movie reviews with five imbalanced sentiment labels [33], and IMDB contains 50K movie reviews with two balanced classes [18]. |
| Dataset Splits | Yes | Meanwhile, the initial training and validation split contain only 20 and 6 samples respectively. |
| Hardware Specification | Yes | All experiments were run on 8 Tesla 16GB V100 GPUs. |
| Software Dependencies | No | The paper mentions software such as DistilBERT and AdamW, but does not specify version numbers or other software dependencies with the version information required for reproducibility. |
| Experiment Setup | Yes | We fine-tuned DistilBERT on each dataset after each AL iteration with a random re-initialization [5]... The maximum sequence length was set to 128, and a maximum of 30 epochs was used in fine-tuning DistilBERT with early stopping [4]. We used AdamW [15] as the optimizer with learning rate 2e-5 and betas 0.9/0.999. Each AL method was run five times with different random number seeds on each dataset. The batch size B was set to {1, 5, 10, 50, 100}. (A hedged sketch of this fine-tuning configuration appears at the end of this section.) |
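
For reference, the two strictly proper scoring rules named in the Research Type row have standard textbook forms; the paper's own Equation (6) may use different sign or normalization conventions, so the block below is only the generic definition. For a predicted class-probability vector p over C classes and a true label y:

```latex
% Textbook forms of the two strictly proper scoring rules mentioned above;
% the paper's own sign/normalization conventions may differ.
S_{\log}(p, y) = \log p_y
\qquad
S_{\mathrm{MSE}}(p, y) = -\sum_{c=1}^{C} \bigl(p_c - \mathbb{1}[c = y]\bigr)^2
```

Both are strictly proper: their expected value is uniquely maximized when p equals the true conditional label distribution, which is what makes them usable as the utility inside an expected-score acquisition function.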
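The Pseudocode row points to an acquisition that scores each unlabeled point via an expected proper score and then selects a diverse batch. The sketch below is a minimal illustration of that general recipe under the MSE (Brier) score, not the authors' Equation (6) or their Algorithms 1–3; the function names `bemps_mse_scores` and `diverse_batch`, the use of the ensemble mean as the reference prediction, and the k-means step for diversity are assumptions made only for this example. The exact algorithms are in the released code at https://github.com/davidtw999/BEMPS.

```python
# Minimal, illustrative sketch (NOT the authors' implementation) of an
# ensemble-based acquisition under the MSE / Brier score plus a diverse
# batch selection. `probs` holds class probabilities from E ensemble or
# posterior samples (e.g. MC dropout or checkpoints -- an assumption)
# for N unlabeled points and C classes.
import numpy as np
from sklearn.cluster import KMeans


def bemps_mse_scores(probs: np.ndarray) -> np.ndarray:
    """probs: (E, N, C). Returns a length-N score: the expected squared
    distance between each ensemble member's prediction and the ensemble
    mean, i.e. disagreement measured with the Brier / MSE scoring rule."""
    mean_probs = probs.mean(axis=0)                           # (N, C) consensus
    sq_gap = ((probs - mean_probs[None]) ** 2).sum(axis=-1)   # (E, N)
    return sq_gap.mean(axis=0)                                # (N,)


def diverse_batch(probs: np.ndarray, scores: np.ndarray,
                  batch_size: int, top_frac: float = 0.1) -> np.ndarray:
    """Cluster the consensus predictions of the top-scoring candidates and
    keep the highest-scoring point per cluster, so the selected batch is
    both informative and diverse."""
    top_k = max(batch_size, int(top_frac * len(scores)))
    cand = np.argsort(-scores)[:top_k]                        # candidate pool
    feats = probs.mean(axis=0)[cand]                          # (top_k, C)
    labels = KMeans(n_clusters=batch_size, n_init=10).fit_predict(feats)
    picked = []
    for c in range(batch_size):
        members = cand[labels == c]
        picked.append(members[np.argmax(scores[members])])
    return np.array(picked)
```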
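The Experiment Setup row fixes most of the fine-tuning hyperparameters. Below is a hedged sketch of that configuration, assuming the Hugging Face `transformers` Trainer as the training loop (the paper does not say which loop was used); only the learning rate, betas, maximum sequence length, and the 30-epoch cap with early stopping come from the quoted setup, while the checkpoint name, patience, and best-model metric are illustrative defaults.

```python
# Hedged sketch of the fine-tuning setup quoted above, assuming the
# Hugging Face `transformers` Trainer (the training loop is not named in
# the paper). lr 2e-5, betas 0.9/0.999, max length 128 and the 30-epoch
# cap with early stopping are from the quoted setup; everything else is
# an illustrative choice.
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          EarlyStoppingCallback, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=4)   # e.g. 4 classes for AG NEWS


def tokenize(batch):
    # Truncate/pad to the maximum sequence length of 128 used in the paper.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)


args = TrainingArguments(
    output_dir="bemps_out",
    learning_rate=2e-5,                 # AdamW learning rate from the paper
    adam_beta1=0.9, adam_beta2=0.999,   # AdamW betas from the paper
    num_train_epochs=30,                # upper bound; early stopping ends sooner
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)

# `train_ds` / `val_ds` stand for the current labeled pool and the small
# validation split (20 and 6 samples initially), already tokenized.
# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_ds, eval_dataset=val_ds,
#                   callbacks=[EarlyStoppingCallback(early_stopping_patience=3)])
# trainer.train()
```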