Domain Agnostic Real-Valued Specificity Prediction

Authors: Wei-Jen Ko, Greg Durrett, Junyi Jessy Li (pp. 6610-6617)

AAAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that our system generates more accurate real-valued sentence specificity predictions that correlate well with human judgment, across three domains that are vastly different from the source domain (news): Twitter, Yelp reviews and movie reviews.
Researcher Affiliation | Academia | Wei-Jen Ko, Greg Durrett, Junyi Jessy Li, Department of Computer Science, Department of Linguistics, The University of Texas at Austin, wjko@cs.utexas.edu, gdurrett@cs.utexas.edu, jessy@austin.utexas.edu
Pseudocode | No | The paper includes figures illustrating the model architecture but does not contain explicit pseudocode or algorithm blocks.
Open Source Code | Yes | An unsupervised domain adaptation framework for sentence specificity prediction, available at https://github.com/wjko2/Domain-Agnostic-Sentence-Specificity-Prediction
Open Datasets | Yes | The source domain for sentence specificity is news, for which we use three publicly available labeled datasets: (1) training sentences from Louis and Nenkova (2011a) and Li and Nenkova (2015)... (2) 900 news sentences crowdsourced for binary general/specific labels (Louis and Nenkova 2012); (3) 543 news sentences from Li et al. (2016b).
Dataset Splits | Yes | Hyperparameters are tuned on a validation set of 200 tweets that doesn't overlap with the test set.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions using 'OpenNMT' but does not specify a version number for it or other software dependencies.
Experiment Setup | Yes | The LSTM encoder generates 100-dimensional representations. For the multilayer perceptron, we use 3 fully connected 100-dimensional layers. We use ReLU activation with batch normalization. For the Gaussian noise in data augmentation, we use standard deviation 0.1 for word embeddings and 0.2 for shallow features. The probabilities of deleting a word and replacing a word vector are 0.15. The exponential moving average decay α is 0.999. Dropout rate is 0.5 for all layers. The batch size is 32. c1 = 1000, c2 = 10 for KL loss and 100 for mean and std.dev loss. β = 1. We fix the number of training epochs to be 30 for SE+A and SE+AD, 10 for SE, and 15 for SE+D. We use the Adam optimizer with learning rate 0.0001, β1 = 0.9, β2 = 0.999.
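
The Experiment Setup row above is concrete enough to sketch in code. Below is a minimal PyTorch sketch, assuming 300-dimensional word embeddings and a small set of shallow features (the n_shallow count and class names are placeholders, not taken from the authors' released code); the augmentation is simplified, and the self-ensembling consistency losses (KL plus mean and std.dev terms weighted by c1, c2 and β) are omitted.

    # Minimal sketch of the quoted setup; names and n_shallow are illustrative assumptions.
    import torch
    import torch.nn as nn

    class AugmentConfig:
        """Data-augmentation settings quoted from the paper."""
        emb_noise_std = 0.1       # Gaussian noise std for word embeddings
        shallow_noise_std = 0.2   # Gaussian noise std for shallow features
        word_drop_prob = 0.15     # probability of deleting a word

    class SpecificityModel(nn.Module):
        """LSTM sentence encoder (100-dim) + 3 fully connected 100-dim layers, per the quote."""

        def __init__(self, emb_dim=300, hidden_dim=100, n_shallow=10, dropout=0.5):
            super().__init__()
            self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)  # 100-dim representation
            layers = []
            in_dim = hidden_dim + n_shallow  # encoder output concatenated with shallow features
            for _ in range(3):               # 3 fully connected 100-dimensional layers
                layers += [nn.Linear(in_dim, 100), nn.BatchNorm1d(100),
                           nn.ReLU(), nn.Dropout(dropout)]  # ReLU + batch norm, dropout 0.5
                in_dim = 100
            self.mlp = nn.Sequential(*layers)
            self.out = nn.Linear(100, 1)     # real-valued specificity score

        def forward(self, word_embs, shallow_feats, augment=False):
            if augment:  # noise injection used for the unsupervised adaptation branch
                word_embs = word_embs + AugmentConfig.emb_noise_std * torch.randn_like(word_embs)
                shallow_feats = shallow_feats + AugmentConfig.shallow_noise_std * torch.randn_like(shallow_feats)
                keep = torch.rand(word_embs.shape[:2], device=word_embs.device) > AugmentConfig.word_drop_prob
                word_embs = word_embs * keep.unsqueeze(-1).float()  # crude stand-in for word deletion
            _, (h, _) = self.encoder(word_embs)
            feats = torch.cat([h[-1], shallow_feats], dim=-1)
            return self.out(self.mlp(feats)).squeeze(-1)

    model = SpecificityModel()
    # Adam with lr 0.0001, beta1 0.9, beta2 0.999, as quoted; batch size 32 would be set in the DataLoader.
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))

    # Exponential moving average of parameters with decay alpha = 0.999 (teacher weights).
    ema_params = {k: p.detach().clone() for k, p in model.named_parameters()}
    def ema_update(model, decay=0.999):
        with torch.no_grad():
            for k, p in model.named_parameters():
                ema_params[k].mul_(decay).add_(p.detach(), alpha=1 - decay)

The EMA update corresponds to the teacher weights of a self-ensembling setup; the authors' released repository (linked in the Open Source Code row) is the authoritative reference for the exact architecture and loss implementation.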