Predicting and Analyzing Language Specificity in Social Media Posts

Authors: Yifan Gao, Yang Zhong, Daniel Preoţiuc-Pietro, Junyi Jessy Li
Pages: 6415-6422

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We collect a dataset of over 7,000 tweets annotated with specificity on a fine-grained scale. Using this dataset, we train a supervised regression model that accurately estimates specificity in social media posts, reaching a mean absolute error of 0.3578 (for ratings on a scale of 1-5) and 0.73 Pearson correlation, significantly improving over baselines and previous sentence specificity prediction systems.
Researcher Affiliation | Collaboration | Yifan Gao, Yang Zhong, Daniel Preoţiuc-Pietro, Junyi Jessy Li; Department of Mathematics, Department of Linguistics, The University of Texas at Austin, {yifan233@,yang.zhong@,jessy@austin.}utexas.edu; Bloomberg LP, dpreotiucpie@bloomberg.net
Pseudocode | No | The paper describes the model and features but does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our dataset and system are openly available online. https://github.com/cs329yangzhong/specificityTwitter
Open Datasets | Yes | To this end, we first introduce a large dataset of 7,267 tweets annotated with text specificity on a fine-grained scale of 1-5. ... Our dataset and system are openly available online. https://github.com/cs329yangzhong/specificityTwitter
Dataset Splits | Yes | We use the dataset described in Section 3 for training (5,767 examples), validation/development (500 examples) and testing (1,000 examples).
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions software like 'Scikit-Learn', 'Stanford Named Entity Recognizer', and 'Stanford POS Tagger' but does not specify their version numbers.
Experiment Setup | Yes | Specifically, we use Support Vector Regression (SVR) with Radial Basis Function (RBF) kernel for sentence specificity prediction. ... The dimension for word embeddings and the number of Brown clusters are 100, tuned on validation set.
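
The split sizes and the two reported metrics (mean absolute error and Pearson correlation) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' code: the function names, the shuffled random split, and the seed are assumptions made for illustration, and the toy ratings in the usage lines are invented rather than taken from the dataset.

```python
import math
import random


def split_dataset(examples, n_train=5767, n_dev=500, n_test=1000, seed=0):
    """Shuffle and cut examples into train/dev/test lists of the reported sizes.

    Assumption: a seeded random shuffle; the summary above does not say how
    the authors assigned tweets to each split.
    """
    if len(examples) != n_train + n_dev + n_test:
        raise ValueError("split sizes must sum to the number of examples")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]
    return train, dev, test


def mean_absolute_error(gold, pred):
    """Average absolute difference between gold and predicted ratings."""
    return sum(abs(g - p) for g, p in zip(gold, pred)) / len(gold)


def pearson_r(gold, pred):
    """Pearson correlation coefficient between gold and predicted ratings."""
    n = len(gold)
    mean_g = sum(gold) / n
    mean_p = sum(pred) / n
    cov = sum((g - mean_g) * (p - mean_p) for g, p in zip(gold, pred))
    var_g = sum((g - mean_g) ** 2 for g in gold)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov / math.sqrt(var_g * var_p)


if __name__ == "__main__":
    # Placeholder IDs standing in for the 7,267 annotated tweets.
    train, dev, test = split_dataset(list(range(7267)))
    print(len(train), len(dev), len(test))

    # Invented ratings on the 1-5 scale, only to exercise the metrics.
    gold = [1.0, 2.0, 3.0, 4.0, 5.0]
    pred = [1.5, 2.0, 2.5, 4.5, 5.0]
    print(mean_absolute_error(gold, pred), pearson_r(gold, pred))
```

The three split sizes sum to 7,267, matching the dataset size quoted in the Open Datasets row. In practice the predictions would come from a regressor such as the SVR with RBF kernel named in the Experiment Setup row; that model is omitted here to keep the sketch dependency-free.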