Unsupervised Learning for Lexicon-Based Classification

Authors: Jacob Eisenstein

AAAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | An empirical evaluation is performed on four datasets in two languages. All datasets involve binary classification problems, and performance is quantified by the area under the curve (AUC), a measure of classification performance that is robust to unbalanced class distributions. (A hedged AUC sketch appears after the table.)
Researcher Affiliation | Academia | Jacob Eisenstein, Georgia Institute of Technology
Pseudocode | No | No pseudocode or clearly labeled algorithm blocks were found in the paper.
Open Source Code | Yes | Source code: https://github.com/jacobeisenstein/probabilistic-lexicon-classification
Open Datasets | Yes | Amazon: English-language product reviews across four domains; 8,000 of these reviews are labeled and another 19,677 are unlabeled (Blitzer, Dredze, and Pereira 2007). Cornell: 2,000 English-language film reviews (version 2.0), labeled as positive or negative (Pang and Lee 2004). Corpus Cine: 3,800 Spanish-language movie reviews, rated on a scale of one to five (Vilares, Alonso, and Gómez-Rodríguez 2015). IMDB: 50,000 English-language film reviews (Maas et al. 2011).
Dataset Splits | Yes | This classifier is trained using five-fold cross-validation. (A cross-validation sketch appears after the table.)
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments were mentioned in the paper.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python or particular libraries and frameworks) were mentioned in the paper.
Experiment Setup | Yes | For the PROBLEX-MULT and PROBLEX-DCM methods, lexicon words that co-occur with the opposite lexicon at greater-than-chance frequency are eliminated from the lexicon in a preprocessing step. The penalty parameter ρ is initialized at 1 and then dynamically updated based on the primal and dual residuals. (An ADMM penalty-update sketch appears after the table.)
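
The Research Type row cites AUC as the evaluation metric. The paper does not specify an implementation, so the following is only a minimal sketch, assuming scikit-learn's roc_auc_score and using made-up labels and scores for illustration.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical gold labels and classifier scores; not from the paper.
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.5])

# AUC is the probability that a randomly chosen positive example
# receives a higher score than a randomly chosen negative example,
# which makes it robust to unbalanced class distributions.
auc = roc_auc_score(y_true, y_score)
print(f"AUC = {auc:.3f}")
```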
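The Dataset Splits row mentions five-fold cross-validation for the supervised comparison classifier. A minimal sketch of that protocol, assuming scikit-learn and synthetic stand-in data (the paper does not name a library, and its actual features are bag-of-words counts):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data for illustration only.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Five-fold cross-validation, scored by AUC to match the paper's metric.
clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
print(f"mean AUC over 5 folds: {scores.mean():.3f}")
```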
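The Experiment Setup row says the penalty parameter ρ starts at 1 and is updated dynamically from the primal and dual residuals. The paper's exact rule is not quoted in the report; the sketch below uses the standard ADMM residual-balancing heuristic (Boyd et al. 2011), which is an assumption here, not a confirmed detail of the paper's implementation.

```python
def update_rho(rho, primal_residual, dual_residual, mu=10.0, tau=2.0):
    """Residual-balancing update for the ADMM penalty parameter.

    Assumed heuristic (Boyd et al. 2011): grow rho when the primal
    residual dominates, shrink it when the dual residual dominates.
    """
    if primal_residual > mu * dual_residual:
        return rho * tau
    elif dual_residual > mu * primal_residual:
        return rho / tau
    return rho

# Example: rho initialized at 1, as stated in the paper.
rho = 1.0
rho = update_rho(rho, primal_residual=0.5, dual_residual=0.01)  # -> 2.0
```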