Learning Uncertainty for Unknown Domains with Zero-Target-Assumption
Authors: Yu Yu, Hassan Sajjad, Jia Xu
Venue: ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our MERRL using regularized A2C and SAC achieves up to -99.7 perplexity decrease (-43.4% relatively) in language modeling, +25.0 accuracy increase (+40.0% relatively) in sentiment analysis, and +5.0 F1 score increase (+30.8% relatively) in named entity recognition over various domains, demonstrating strong generalization power on unknown test sets. |
| Researcher Affiliation | Academia | School of Engineering and Science, Stevens Institute of Technology; Faculty of Computer Science, Dalhousie University; yyu50@stevens.edu, hsajjad@dal.ca, jxu70@stevens.edu |
| Pseudocode | Yes | Algorithm 1 N-gram set entropy |
| Open Source Code | No | The paper does not provide any specific repository link or an explicit statement about releasing the source code for the described methodology. |
| Open Datasets | Yes | We use the Amazon product review dataset (Blitzer et al., 2007) for the sentiment analysis task. [...] We use the CoNLL-2003 English NER dataset (Sang & Meulder, 2003) as an in-domain training set [...] We experiment with two moderate size datasets WikiText-2 (Merity et al., 2016) and Penn Treebank. |
| Dataset Splits | No | The paper mentions 'in-domain validation perplexity score' for language modeling, implying a validation set, but does not provide explicit sizes or ratios for training, validation, and test splits across any of the datasets. For sentiment analysis and NER, it refers to using certain datasets for training and others for testing without detailing internal splits. |
| Hardware Specification | Yes | In practice, training with all in-domain data in sentiment analysis takes 131 seconds while selecting data with SAC-OE takes 1394 seconds (Tbudget = 2000 and T = 140) on one Tesla V100 GPU, which is roughly ten times faster than the baseline VPG to achieve similar average reward. |
| Software Dependencies | No | The paper mentions software like 'fairseq toolkit' and 'BERT model' but does not provide specific version numbers for these or other underlying software dependencies. |
| Experiment Setup | Yes | Hyperparameters: learning rate = 7e-4; discount factor = 0.99; entropy coefficient = 0.001; value function coefficient = 0.5; RMSProp epsilon = 1e-5; number of steps (Tbudget) = 10000; batch size (NER) = 100; batch size (sentiment) = 500; batch size (language modeling) = 500 |
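The hyperparameters quoted in the Experiment Setup row can be collected into a single configuration object. The sketch below does exactly that; the key names (e.g. `MERRL_HPARAMS`, `num_steps_T_budget`) are illustrative labels chosen here, not identifiers from the authors' code, while the values are the ones reported in the paper.

```python
# Reported hyperparameters from the Experiment Setup row.
# Key names are illustrative; only the values come from the paper.
MERRL_HPARAMS = {
    "learning_rate": 7e-4,
    "discount_factor": 0.99,
    "entropy_coefficient": 0.001,
    "value_function_coefficient": 0.5,
    "rmsprop_epsilon": 1e-5,
    "num_steps_T_budget": 10_000,
    "batch_size": {"ner": 100, "sentiment": 500, "language_modeling": 500},
}
```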
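The Pseudocode row points to the paper's Algorithm 1, an n-gram set entropy. The exact procedure is not reproduced in this report; as a rough illustration only, the following sketch computes the Shannon entropy of the n-gram distribution over a set of sentences, with the function name, whitespace tokenization, n-gram order, and log base all being assumptions rather than the paper's algorithm.

```python
from collections import Counter
import math


def ngram_set_entropy(sentences, n=2):
    """Shannon entropy of the n-gram distribution over a set of sentences.

    Illustrative sketch only: tokenizer, n-gram order, and log base are
    assumptions, not the paper's Algorithm 1.
    """
    counts = Counter()
    for sent in sentences:
        tokens = sent.split()
        counts.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# A more lexically diverse subset yields a higher n-gram set entropy.
print(ngram_set_entropy(["the cat sat on the mat", "a dog ran in the park"]))
```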