Learning Uncertainty for Unknown Domains with Zero-Target-Assumption

Authors: Yu Yu, Hassan Sajjad, Jia Xu

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our MERRL using regularized A2C and SAC achieves up to -99.7 perplexity decrease (-43.4% relatively) in language modeling, +25.0 accuracy increase (+40.0% relatively) in sentiment analysis, and +5.0 F1 score increase (+30.8% relatively) in named entity recognition over various domains, demonstrating strong generalization power on unknown test sets. (A generic A2C objective sketch is given after the table.)
Researcher Affiliation | Academia | School of Engineering and Science, Stevens Institute of Technology; Faculty of Computer Science, Dalhousie University; yyu50@stevens.edu, hsajjad@dal.ca, jxu70@stevens.edu
Pseudocode | Yes | Algorithm 1: N-gram set entropy (a sketch of one possible reading follows the table)
Open Source Code | No | The paper does not provide a repository link or an explicit statement about releasing the source code for the described methodology.
Open Datasets | Yes | We use the Amazon product review dataset (Blitzer et al., 2007) for the sentiment analysis task. [...] We use the CoNLL2003 English NER dataset (Sang & Meulder, 2003) as an in-domain training set [...] We experiment with two moderate size datasets WikiText-2 (Merity et al., 2016) and Penn Treebank.
Dataset Splits | No | The paper mentions an 'in-domain validation perplexity score' for language modeling, implying a validation set, but does not provide explicit sizes or ratios for training, validation, and test splits for any of the datasets. For sentiment analysis and NER, it refers to using certain datasets for training and others for testing without detailing internal splits.
Hardware Specification | Yes | In practice, training with all in-domain data in sentiment analysis takes 131 seconds while selecting data with SAC-OE takes 1394 seconds (Tbudget = 2000 and T = 140) on one Tesla V100 GPU, which is roughly ten times faster than the baseline VPG to achieve similar average reward.
Software Dependencies | No | The paper mentions software such as the 'fairseq toolkit' and the 'BERT model' but does not provide specific version numbers for these or other underlying software dependencies.
Experiment Setup | Yes | Hyperparameters: learning rate 7e-4; discount factor 0.99; entropy coefficient 0.001; value function coefficient 0.5; RMSProp epsilon 1e-5; number of steps (Tbudget) 10000; batch size (NER) 100; batch size (sentiment) 500; batch size (language modeling) 500. (A configuration sketch follows the table.)
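
The MERRL implementation itself is not released (see the Open Source Code row). As a point of reference only, the following is a minimal PyTorch sketch of a generic entropy-regularized A2C objective consistent with the coefficients reported in the Experiment Setup row; the function name a2c_loss and its arguments are illustrative assumptions, not the authors' code.

    import torch
    import torch.nn.functional as F

    def a2c_loss(log_probs, values, returns, entropies,
                 value_coef=0.5, entropy_coef=0.001):
        """Generic entropy-regularized A2C loss (sketch, not the authors' MERRL code).

        log_probs: log pi(a_t | s_t) for the chosen actions, shape (T,)
        values:    critic estimates V(s_t), shape (T,)
        returns:   discounted return targets, shape (T,)
        entropies: per-step policy entropies H(pi(.|s_t)), shape (T,)
        """
        advantages = returns - values.detach()          # advantage estimates
        policy_loss = -(log_probs * advantages).mean()  # policy-gradient term
        value_loss = F.mse_loss(values, returns)        # critic regression term
        entropy_bonus = entropies.mean()                # exploration regularizer
        return policy_loss + value_coef * value_loss - entropy_coef * entropy_bonus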
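The Pseudocode row refers to the paper's Algorithm 1, "N-gram set entropy". Below is a minimal sketch of one plausible reading, the Shannon entropy of the empirical n-gram distribution of a token sequence; the function name ngram_set_entropy and the exact formulation are assumptions and may differ from the paper's algorithm.

    import math
    from collections import Counter

    def ngram_set_entropy(tokens, n=2):
        """Shannon entropy of the empirical n-gram distribution of a token list.

        One plausible reading of the paper's Algorithm 1 (N-gram set entropy);
        the exact formulation may differ from the authors'.
        """
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        if not ngrams:
            return 0.0
        counts = Counter(ngrams)
        total = sum(counts.values())
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    # Example: a more varied sentence yields a higher bigram entropy score.
    print(ngram_set_entropy("the cat sat on the mat".split(), n=2))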
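The reported hyperparameters can be collected into a single training configuration. The sketch below assumes a PyTorch RMSprop optimizer and uses illustrative names (CONFIG, make_optimizer); it simply mirrors the values in the Experiment Setup row.

    import torch

    # Hyperparameters as reported in the paper's experiment setup table.
    CONFIG = {
        "learning_rate": 7e-4,
        "discount_factor": 0.99,       # gamma
        "entropy_coef": 0.001,
        "value_func_coef": 0.5,
        "rmsprop_eps": 1e-5,
        "num_steps": 10_000,           # Tbudget
        "batch_size": {"ner": 100, "sentiment": 500, "language_modeling": 500},
    }

    def make_optimizer(model):
        """Illustrative optimizer setup matching the reported RMSProp epsilon."""
        return torch.optim.RMSprop(
            model.parameters(),
            lr=CONFIG["learning_rate"],
            eps=CONFIG["rmsprop_eps"],
        )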