Identifying Sentiment Words Using an Optimization Model with L1 Regularization

Authors: Zhi-Hong Deng, Hongliang Yu, Yunlun Yang

AAAI 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experiments on the real datasets show that ISOMER outperforms the classic approaches, and that the lexicon learned by ISOMER can be effectively adapted to document-level sentiment analysis.
Researcher Affiliation | Academia | Key Laboratory of Machine Perception (Ministry of Education), School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China; Language Technologies Institute, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA
Pseudocode | No | The paper describes the solution process as a sub-gradient method and outlines its iteration steps, but it does not present a formally structured pseudocode block or algorithm figure. (An illustrative sub-gradient sketch follows the table.)
Open Source Code | No | The paper does not provide any links to open-source code or any explicit statement about a public release.
Open Datasets | Yes | The Cornell Movie Review Data, first used in (Pang, Lee, and Vaithyanathan 2002), is a widely used benchmark. This corpus contains 1,000 positive and 1,000 negative processed reviews of movies, extracted from the Internet Movie Database. The other corpus is the Stanford Large Movie Review Dataset (Maas et al. 2011), a collection of 50,000 reviews from IMDB, half of which are positive and half negative. We use the MPQA subjective lexicon to generate the gold standard. (A loader sketch for the Stanford corpus follows the table.)
Dataset Splits | No | The paper mentions 10-fold cross-validation for the document-level sentiment classification task, but it does not give train/validation/test splits for the core sentiment word identification problem that ISOMER addresses. It mentions randomly selecting seed words and candidate words, but no explicit splits for model training and validation. (The cross-validation protocol is sketched after the table.)
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to conduct the experiments.
Software Dependencies | No | The paper mentions using the word segmentation tool ICTCLAS and an SVM classifier but does not specify version numbers for these software dependencies. It only provides a general reference for the SVM classifier (a URL to libsvm).
Experiment Setup | Yes | We adopt the above settings in our experiments. In our model, the tuning parameter β determines the proportion of selected sentiment words in the candidate set, called density. As β increases, the regularizer tends to select fewer and more significant words. For convenience of comparing with other methods, we choose β for each dataset so that the density approximately equals the real value, i.e. β = 2 × 10⁻⁴ for the Stanford and Chinese datasets and β = 9 × 10⁻⁴ for the Cornell dataset. ... TF-IDF is used as the word weighting scheme to compute f_ij in our model. (The β–density relationship is sketched after the table.)
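
The table's Pseudocode row notes that the paper outlines a sub-gradient solution rather than a formal algorithm block. The sketch below shows what such a loop can look like for a generic L1-regularized objective; it is an assumption-laden stand-in, not ISOMER itself: the squared loss, the names `F`, `y`, `s`, and the toy data are all illustrative.

```python
import numpy as np

def l1_subgradient_descent(F, y, beta, lr=0.01, n_iters=1000):
    """Sub-gradient descent for min_s ||F s - y||^2 + beta * ||s||_1.

    A generic stand-in objective, not the paper's exact loss. The
    sub-gradient of |s_i| is sign(s_i), taken as 0 at s_i = 0.
    """
    s = np.zeros(F.shape[1])                  # one sentiment score per candidate word
    for t in range(n_iters):
        grad_smooth = 2 * F.T @ (F @ s - y)   # gradient of the squared loss
        subgrad_l1 = beta * np.sign(s)        # sub-gradient of the L1 penalty
        step = lr / np.sqrt(t + 1)            # diminishing step size
        s -= step * (grad_smooth + subgrad_l1)
    return s

# Toy usage: 5 documents, 8 candidate words, random TF-IDF-like weights.
rng = np.random.default_rng(0)
F = rng.random((5, 8))
y = np.array([1.0, 1.0, -1.0, -1.0, 1.0])    # document polarities
print(l1_subgradient_descent(F, y, beta=2e-4))
```

The diminishing step size is the standard choice for sub-gradient methods, since a fixed step does not converge on non-smooth objectives.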
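For the Stanford Large Movie Review Dataset cited in the Open Datasets row, here is a minimal loader, assuming the archive's standard aclImdb/{train,test}/{pos,neg} directory layout; the function name and the root path are illustrative.

```python
from pathlib import Path

def load_imdb_split(root, split):
    """Read reviews from the extracted aclImdb archive.

    Expects the standard layout: <root>/<split>/pos/*.txt and
    <root>/<split>/neg/*.txt.
    """
    docs, labels = [], []
    for label_name, label in (("pos", 1), ("neg", 0)):
        for path in sorted(Path(root, split, label_name).glob("*.txt")):
            docs.append(path.read_text(encoding="utf-8"))
            labels.append(label)
    return docs, labels

# docs, labels = load_imdb_split("aclImdb", "train")   # 25,000 labeled reviews
```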
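The Dataset Splits and Software Dependencies rows together describe the document-level evaluation: 10-fold cross-validation with an SVM classifier (the paper points to libsvm). Below is a sketch of that protocol, with scikit-learn's SVC swapped in for libsvm and toy documents standing in for the real corpora.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Toy stand-ins for the movie-review corpora.
docs = ["great movie", "terrible plot", "loved it", "boring and slow"] * 5
labels = [1, 0, 1, 0] * 5                            # 1 = positive, 0 = negative

clf = make_pipeline(TfidfVectorizer(), SVC(kernel="linear"))
scores = cross_val_score(clf, docs, labels, cv=10)   # 10-fold cross-validation
print(f"mean accuracy: {scores.mean():.3f}")
```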
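Finally, the β–density relationship quoted in the Experiment Setup row, where raising the L1 weight shrinks the set of selected words, can be reproduced in miniature with any L1-regularized model. Here scikit-learn's Lasso stands in for ISOMER's regularizer; the random data, the alpha grid, and the non-zero threshold are all illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
F = rng.random((50, 200))            # 50 documents, 200 candidate words
y = rng.choice([-1.0, 1.0], 50)      # document polarities

def density(model):
    """Fraction of candidate words kept (non-zero coefficient)."""
    return float(np.mean(model.coef_ != 0))

# As the L1 weight grows, fewer candidate words survive.
for alpha in (1e-4, 1e-3, 1e-2, 1e-1):
    model = Lasso(alpha=alpha, max_iter=10_000).fit(F, y)
    print(f"alpha={alpha:g}  density={density(model):.2f}")
```

In the paper's setting, β is tuned until this density approximately matches the real proportion of sentiment words in the candidate set.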