Unsupervised Learning for Lexicon-Based Classification

Authors: Jacob Eisenstein

AAAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | An empirical evaluation is performed on four datasets in two languages. All datasets involve binary classification problems, and performance is quantified by the area under the curve (AUC), a measure of classification performance that is robust to unbalanced class distributions. (A hedged AUC sketch appears after the table.)
Researcher Affiliation | Academia | Jacob Eisenstein, Georgia Institute of Technology
Pseudocode | No | No pseudocode or clearly labeled algorithm blocks were found in the paper.
Open Source Code | Yes | Source code: https://github.com/jacobeisenstein/probabilistic-lexicon-classification
Open Datasets | Yes | Amazon: English-language product reviews across four domains; 8,000 of these reviews are labeled and another 19,677 are unlabeled (Blitzer, Dredze, and Pereira 2007). Cornell: 2,000 English-language film reviews (version 2.0), labeled as positive or negative (Pang and Lee 2004). Corpus Cine: 3,800 Spanish-language movie reviews, rated on a scale of one to five (Vilares, Alonso, and Gómez-Rodríguez 2015). IMDB: 50,000 English-language film reviews (Maas et al. 2011).
Dataset Splits | Yes | This classifier is trained using five-fold cross-validation. (A cross-validation sketch appears after the table.)
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments were mentioned in the paper.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python or particular libraries and frameworks) were mentioned in the paper.
Experiment Setup | Yes | For the PROBLEX-MULT and PROBLEX-DCM methods, lexicon words that co-occur with the opposite lexicon at greater-than-chance frequency are eliminated from the lexicon in a preprocessing step. The penalty parameter ρ is initialized at 1 and then dynamically updated based on the primal and dual residuals. (An ADMM penalty-update sketch appears after the table.)
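
The Research Type row cites AUC as the evaluation metric. The paper does not specify an implementation, so the following is only a minimal sketch, assuming scikit-learn's roc_auc_score and using made-up labels and scores for illustration.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical gold labels and classifier scores; not from the paper.
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.5])

# AUC is the probability that a randomly chosen positive example
# receives a higher score than a randomly chosen negative example,
# which makes it robust to unbalanced class distributions.
auc = roc_auc_score(y_true, y_score)
print(f"AUC = {auc:.3f}")
```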
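The Dataset Splits row mentions five-fold cross-validation for the supervised comparison classifier. A minimal sketch of that protocol, assuming scikit-learn and synthetic stand-in data (the paper does not name a library, and its actual features are bag-of-words counts):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data for illustration only.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Five-fold cross-validation, scored by AUC to match the paper's metric.
clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
print(f"mean AUC over 5 folds: {scores.mean():.3f}")
```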
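The Experiment Setup row says the penalty parameter ρ starts at 1 and is updated dynamically from the primal and dual residuals. The paper's exact rule is not quoted in the report; the sketch below uses the standard ADMM residual-balancing heuristic (Boyd et al. 2011), which is an assumption here, not a confirmed detail of the paper's implementation.

```python
def update_rho(rho, primal_residual, dual_residual, mu=10.0, tau=2.0):
    """Residual-balancing update for the ADMM penalty parameter.

    Assumed heuristic (Boyd et al. 2011): grow rho when the primal
    residual dominates, shrink it when the dual residual dominates.
    """
    if primal_residual > mu * dual_residual:
        return rho * tau
    elif dual_residual > mu * primal_residual:
        return rho / tau
    return rho

# Example: rho initialized at 1, as stated in the paper.
rho = 1.0
rho = update_rho(rho, primal_residual=0.5, dual_residual=0.01)  # -> 2.0
```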