Domain Agnostic Real-Valued Specificity Prediction
Authors: Wei-Jen Ko, Greg Durrett, Junyi Jessy Li
AAAI 2019, pp. 6610-6617
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that our system generates more accurate real-valued sentence specificity predictions that correlate well with human judgment, across three domains that are vastly different from the source domain (news): Twitter, Yelp reviews and movie reviews. |
| Researcher Affiliation | Academia | Wei-Jen Ko, Greg Durrett, Junyi Jessy Li, Department of Computer Science, Department of Linguistics, The University of Texas at Austin. wjko@cs.utexas.edu, gdurrett@cs.utexas.edu, jessy@austin.utexas.edu |
| Pseudocode | No | The paper includes figures illustrating the model architecture but does not contain explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | An unsupervised domain adaptation framework for sentence specificity prediction, available at https://github.com/wjko2/Domain-Agnostic-Sentence-Specificity-Prediction |
| Open Datasets | Yes | The source domain for sentence specificity is news, for which we use three publicly available labeled datasets: (1) training sentences from Louis and Nenkova (2011a) and Li and Nenkova (2015)... (2) 900 news sentences crowdsourced for binary general/specific labels (Louis and Nenkova 2012); (3) 543 news sentences from Li et al. (2016b). |
| Dataset Splits | Yes | Hyperparameters are tuned on a validation set of 200 tweets that doesn't overlap with the test set. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'OpenNMT' but does not specify a version number for it or other software dependencies. |
| Experiment Setup | Yes | The LSTM encoder generates 100-dimensional representations. For the multilayer perceptron, we use 3 fully connected 100-dimensional layers. We use ReLU activation with batch normalization. For the Gaussian noise in data augmentation, we use standard deviation 0.1 for word embeddings and 0.2 for shallow features. The probabilities of deleting a word and replacing a word vector are 0.15. The exponential moving average decay α is 0.999. Dropout rate is 0.5 for all layers. The batch size is 32. c1 = 1000; c2 = 10 for the KL loss and 100 for the mean and std. dev. losses. β = 1. We fix the number of training epochs to 30 for SE+A and SE+AD, 10 for SE, and 15 for SE+D. We use the Adam optimizer with learning rate 0.0001, β1 = 0.9, β2 = 0.999. |
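
To make the Experiment Setup row concrete, here is a minimal PyTorch sketch of the reported architecture: a 100-dimensional LSTM sentence encoder feeding three fully connected 100-dimensional layers with ReLU, batch normalization, and dropout 0.5, trained with Adam (learning rate 0.0001, β1 = 0.9, β2 = 0.999). The class name, the 300-dimensional word embeddings, the shallow-feature width, the batch-norm/ReLU ordering, and the point where shallow features are concatenated are assumptions for illustration, not details confirmed by the paper.

```python
import torch
import torch.nn as nn

class SpecificityRegressor(nn.Module):
    """Hypothetical model sized to the reported setup (name and I/O dims assumed)."""
    def __init__(self, embed_dim=300, shallow_dim=10, hidden_dim=100):
        super().__init__()
        # 100-dimensional LSTM sentence encoder
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # 3 fully connected 100-dim layers; ReLU + batch norm, dropout 0.5 on all layers
        layers, in_dim = [], hidden_dim + shallow_dim
        for _ in range(3):
            layers += [nn.Linear(in_dim, hidden_dim),
                       nn.BatchNorm1d(hidden_dim),   # order relative to ReLU is an assumption
                       nn.ReLU(),
                       nn.Dropout(0.5)]
            in_dim = hidden_dim
        self.mlp = nn.Sequential(*layers)
        self.out = nn.Linear(hidden_dim, 1)  # real-valued specificity score

    def forward(self, word_vecs, shallow_feats):
        # word_vecs: (batch, seq_len, embed_dim); shallow_feats: (batch, shallow_dim)
        _, (h_n, _) = self.encoder(word_vecs)          # final hidden state as sentence vector
        features = torch.cat([h_n[-1], shallow_feats], dim=-1)
        return self.out(self.mlp(features)).squeeze(-1)

model = SpecificityRegressor()
# Adam with the quoted hyperparameters; batch size 32 would be set in the data loader
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))
```
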
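The augmentation and self-ensembling settings in the same row can be sketched similarly. The Gaussian noise standard deviations (0.1 for word embeddings, 0.2 for shallow features), the 0.15 deletion/replacement probabilities, and the EMA decay α = 0.999 follow the quoted setup; how deletion and replacement are realized (zeroing a position vs. substituting a random vector) is an assumption, and `model` refers to the sketch above.

```python
import copy
import torch

def augment(word_vecs, shallow_feats, p=0.15):
    """Input perturbation with the reported settings (implementation details assumed)."""
    # Gaussian noise: std 0.1 on word embeddings, 0.2 on shallow features
    word_vecs = word_vecs + 0.1 * torch.randn_like(word_vecs)
    shallow_feats = shallow_feats + 0.2 * torch.randn_like(shallow_feats)
    batch, seq_len, _ = word_vecs.shape
    # Delete a word with probability 0.15 (modeled here as zeroing its vector)
    drop = torch.rand(batch, seq_len, 1) < p
    word_vecs = word_vecs.masked_fill(drop, 0.0)
    # Replace a word vector with probability 0.15 (modeled here as a random vector)
    swap = torch.rand(batch, seq_len, 1) < p
    word_vecs = torch.where(swap, torch.randn_like(word_vecs), word_vecs)
    return word_vecs, shallow_feats

# Self-ensembling teacher: exponential moving average of student weights, decay 0.999
teacher = copy.deepcopy(model)

def ema_update(teacher, student, alpha=0.999):
    with torch.no_grad():
        for t_p, s_p in zip(teacher.parameters(), student.parameters()):
            t_p.mul_(alpha).add_(s_p, alpha=1 - alpha)
```

Calling `ema_update(teacher, model)` once per training step keeps the teacher a smoothed copy of the student, which is the standard mechanics behind the SE variants the row refers to.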