Predicting and Analyzing Language Specificity in Social Media Posts
Authors: Yifan Gao, Yang Zhong, Daniel Preoţiuc-Pietro, Junyi Jessy Li (pp. 6415-6422)
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We collect a dataset of over 7,000 tweets annotated with specificity on a fine-grained scale. Using this dataset, we train a supervised regression model that accurately estimates specificity in social media posts, reaching a mean absolute error of 0.3578 (for ratings on a scale of 1-5) and 0.73 Pearson correlation, significantly improving over baselines and previous sentence specificity prediction systems. |
| Researcher Affiliation | Collaboration | Yifan Gao, Yang Zhong, Daniel Preoţiuc-Pietro, Junyi Jessy Li; Department of Mathematics, Department of Linguistics, The University of Texas at Austin; {yifan233@,yang.zhong@,jessy@austin.}utexas.edu; Bloomberg LP; dpreotiucpie@bloomberg.net |
| Pseudocode | No | The paper describes the model and features but does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our dataset and system are openly available online [footnote 2]: https://github.com/cs329yangzhong/specificity Twitter |
| Open Datasets | Yes | To this end, we first introduce a large dataset of 7,267 tweets annotated with text specificity on a fine-grained scale of 1-5. ... Our dataset and system are openly available online [footnote 2]: https://github.com/cs329yangzhong/specificity Twitter |
| Dataset Splits | Yes | We use the dataset described in Section 3 for training (5,767 examples), validation/development (500 examples) and testing (1,000 examples). |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions software such as 'Scikit-Learn', 'Stanford Named Entity Recognizer', and 'Stanford POS Tagger' but does not specify their version numbers. |
| Experiment Setup | Yes | Specifically, we use Support Vector Regression (SVR) with Radial Basis Function (RBF) kernel [footnote 3] for sentence specificity prediction. ... The dimension for word embeddings and the number of Brown clusters are 100, tuned on validation set. |
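
The split sizes and evaluation metrics quoted in the table can be illustrated with a minimal standard-library sketch. It is not the authors' code: the function names, the random shuffle, and the seed are assumptions made for illustration, and the table does not say how the original split was drawn. Only the sizes (5,767 / 500 / 1,000 out of 7,267 tweets) and the two metrics (mean absolute error and Pearson correlation) come from the rows above.

```python
import math
import random

# Split sizes reported in the table; they sum to the 7,267 annotated tweets.
TRAIN_SIZE, DEV_SIZE, TEST_SIZE = 5767, 500, 1000


def split_indices(n_examples, seed=0):
    """Shuffle example indices and cut them into train/dev/test lists.

    The shuffle and the seed are assumptions for illustration only.
    """
    if n_examples != TRAIN_SIZE + DEV_SIZE + TEST_SIZE:
        raise ValueError("split sizes do not match the dataset size")
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    train = indices[:TRAIN_SIZE]
    dev = indices[TRAIN_SIZE:TRAIN_SIZE + DEV_SIZE]
    test = indices[TRAIN_SIZE + DEV_SIZE:]
    return train, dev, test


def mean_absolute_error(gold, predicted):
    """Average absolute gap between gold ratings and predictions."""
    return sum(abs(g - p) for g, p in zip(gold, predicted)) / len(gold)


def pearson_correlation(gold, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(gold)
    mean_g = sum(gold) / n
    mean_p = sum(predicted) / n
    cov = sum((g - mean_g) * (p - mean_p) for g, p in zip(gold, predicted))
    var_g = sum((g - mean_g) ** 2 for g in gold)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_g * var_p)
```

A regressor such as the SVR with RBF kernel named in the Experiment Setup row would be fit on the `train` indices, tuned on `dev`, and scored on `test` with these two functions.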