Learning Uncertainty for Unknown Domains with Zero-Target-Assumption
Authors: Yu Yu, Hassan Sajjad, Jia Xu
Venue: ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our MERRL using regularized A2C and SAC achieves up to -99.7 perplexity decrease (-43.4% relatively) in language modeling, +25.0 accuracy increase (+40.0% relatively) in sentiment analysis, and +5.0 F1 score increase (+30.8% relatively) in named entity recognition over various domains, demonstrating strong generalization power on unknown test sets. |
| Researcher Affiliation | Academia | School of Engineering and Science, Stevens Institute of Technology; Faculty of Computer Science, Dalhousie University; yyu50@stevens.edu, hsajjad@dal.ca, jxu70@stevens.edu |
| Pseudocode | Yes | Algorithm 1 N-gram set entropy |
| Open Source Code | No | The paper does not provide any specific repository link or an explicit statement about releasing the source code for the described methodology. |
| Open Datasets | Yes | We use the Amazon product review dataset (Blitzer et al., 2007) for the sentiment analysis task. [...] We use the CoNLL-2003 English NER dataset (Sang & Meulder, 2003) as an in-domain training set [...] We experiment with two moderate size datasets WikiText-2 (Merity et al., 2016) and Penn Treebank. |
| Dataset Splits | No | The paper mentions 'in-domain validation perplexity score' for language modeling, implying a validation set, but does not provide explicit sizes or ratios for training, validation, and test splits across any of the datasets. For sentiment analysis and NER, it refers to using certain datasets for training and others for testing without detailing internal splits. |
| Hardware Specification | Yes | In practice, training with all in-domain data in sentiment analysis takes 131 seconds while selecting data with SAC-OE takes 1394 seconds (Tbudget = 2000 and T = 140) on one Tesla V100 GPU, which is roughly ten times faster than the baseline VPG to achieve similar average reward. |
| Software Dependencies | No | The paper mentions software like 'fairseq toolkit' and 'BERT model' but does not provide specific version numbers for these or other underlying software dependencies. |
| Experiment Setup | Yes | Hyperparameters: learning rate = 7e-4; discount factor = 0.99; entropy coefficient = 0.001; value function coefficient = 0.5; RMSProp epsilon = 1e-5; number of steps (Tbudget) = 10000; batch size (NER) = 100; batch size (sentiment) = 500; batch size (language modeling) = 500 |
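The hyperparameters quoted in the Experiment Setup row can be collected into a single configuration object. The sketch below does exactly that; the key names (e.g. `MERRL_HPARAMS`, `num_steps_T_budget`) are illustrative labels chosen here, not identifiers from the authors' code, while the values are the ones reported in the paper.

```python
# Reported hyperparameters from the Experiment Setup row.
# Key names are illustrative; only the values come from the paper.
MERRL_HPARAMS = {
    "learning_rate": 7e-4,
    "discount_factor": 0.99,
    "entropy_coefficient": 0.001,
    "value_function_coefficient": 0.5,
    "rmsprop_epsilon": 1e-5,
    "num_steps_T_budget": 10_000,
    "batch_size": {"ner": 100, "sentiment": 500, "language_modeling": 500},
}
```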
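The Pseudocode row points to the paper's Algorithm 1, an n-gram set entropy. The exact procedure is not reproduced in this report; as a rough illustration only, the following sketch computes the Shannon entropy of the n-gram distribution over a set of sentences, with the function name, whitespace tokenization, n-gram order, and log base all being assumptions rather than the paper's algorithm.

```python
from collections import Counter
import math


def ngram_set_entropy(sentences, n=2):
    """Shannon entropy of the n-gram distribution over a set of sentences.

    Illustrative sketch only: tokenizer, n-gram order, and log base are
    assumptions, not the paper's Algorithm 1.
    """
    counts = Counter()
    for sent in sentences:
        tokens = sent.split()
        counts.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# A more lexically diverse subset yields a higher n-gram set entropy.
print(ngram_set_entropy(["the cat sat on the mat", "a dog ran in the park"]))
```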