Identifying Sentiment Words Using an Optimization Model with L1 Regularization
Authors: Zhi-Hong Deng, Hongliang Yu, Yunlun Yang
AAAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experiments on the real datasets show that ISOMER outperforms the classic approaches, and that the lexicon learned by ISOMER can be effectively adapted to document-level sentiment analysis. |
| Researcher Affiliation | Academia | Key Laboratory of Machine Perception (Ministry of Education), School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China; Language Technologies Institute, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA |
| Pseudocode | No | The paper describes the solution process by following a sub-gradient method and outlining the iteration steps, but it does not present a formally structured pseudocode block or algorithm figure. (A generic sketch of such a sub-gradient update is given after this table.) |
| Open Source Code | No | The paper does not provide any specific links to open-source code or explicit statements about its public release. |
| Open Datasets | Yes | The Cornell Movie Review Data, first used in (Pang, Lee, and Vaithyanathan 2002), is a widely used benchmark. This corpus contains 1,000 positive and 1,000 negative processed movie reviews extracted from the Internet Movie Database. The other corpus is the Stanford Large Movie Review Dataset (Maas et al. 2011), a collection of 50,000 reviews from IMDB, half of which are positive and half negative. We use the MPQA subjective lexicon to generate the gold standard. |
| Dataset Splits | No | The paper mentions 10-fold cross-validation for the document-level sentiment classification task, but it does not provide explicit train/validation/test splits for the main sentiment word identification problem that ISOMER addresses. It mentions randomly selecting seed words and candidate words, but gives no explicit dataset splits for model training and validation. (A sketch of the 10-fold protocol is given after this table.) |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to conduct the experiments. |
| Software Dependencies | No | The paper mentions using the word segmentation tool ICTCLAS and an SVM classifier but does not specify version numbers for these software dependencies. It only provides a general reference for the SVM classifier (a URL to libsvm). |
| Experiment Setup | Yes | We adopt the above settings in our experiments. In our model, the tuning parameter β determines the proportion of selected sentiment words in the candidate set, called the density. As β increases, the regularizer tends to select fewer and more significant words. For ease of comparison with other methods, we choose β for each dataset so that the density is approximately equal to the real value, i.e. β = 2 × 10⁻⁴ for the Stanford and Chinese datasets and β = 9 × 10⁻⁴ for the Cornell dataset. ... TF-IDF is used as the word weighting scheme to compute f_ij in our model. (A sketch of a common TF-IDF formulation follows the table.) |
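
Since the paper outlines its sub-gradient iteration only in prose, below is a minimal sketch of one sub-gradient scheme for an L1-regularized objective of the form min_s L(s) + β‖s‖₁. The least-squares loss, the variable names `F`, `y`, and `s`, the step-size schedule, and the selection threshold are all illustrative assumptions, not ISOMER's actual formulation.

```python
import numpy as np

# Hypothetical stand-in: ISOMER's real loss couples document labels with
# candidate-word polarities; here a least-squares loss F @ s ~ y plays
# that role so the L1 sub-gradient mechanics are visible.
rng = np.random.default_rng(0)
F = rng.normal(size=(100, 20))          # word-weight matrix (e.g., TF-IDF)
y = rng.choice([-1.0, 1.0], size=100)   # document polarity labels
beta = 2e-4                             # L1 tuning parameter (paper's range)
s = np.zeros(20)                        # polarity scores of candidate words

for t in range(1, 501):
    grad = F.T @ (F @ s - y) / len(y)   # gradient of the smooth loss term
    subgrad = grad + beta * np.sign(s)  # sign(0) = 0 is a valid sub-gradient
    s -= (0.1 / np.sqrt(t)) * subgrad   # diminishing step size

# Words with (near-)nonzero scores are the ones the regularizer "selects".
selected = np.flatnonzero(np.abs(s) > 1e-6)
```

Plain sub-gradient descent rarely produces exact zeros, hence the threshold; a proximal method with soft-thresholding would yield exact sparsity, but the paper describes a sub-gradient procedure.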
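For the document-level task, the paper reports 10-fold cross-validation with an SVM (libsvm) but no explicit splits. Below is a minimal sketch of that protocol; the scikit-learn linear SVM and the synthetic features standing in for the paper's lexicon-based features are both assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

# Synthetic stand-in for lexicon-derived document features.
X, y = make_classification(n_samples=2000, n_features=50, random_state=0)

clf = LinearSVC()                            # the paper uses libsvm; this is a proxy
scores = cross_val_score(clf, X, y, cv=10)   # 10-fold cross-validation
print(f"mean accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```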
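The paper states that TF-IDF supplies the word weights f_ij without spelling out the exact variant. Below is a minimal sketch of one standard formulation (tf = count / document length, idf = log(N / df)); the function name and normalization choices are assumptions.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return per-document dicts of TF-IDF weights f_ij (word j in doc i)."""
    N = len(docs)
    df = Counter(w for doc in docs for w in set(doc))  # document frequency
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({w: (c / len(doc)) * math.log(N / df[w])
                        for w, c in counts.items()})
    return weights

# Tiny example: two tokenized "reviews".
print(tf_idf([["great", "movie", "great"], ["bad", "movie"]]))
```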