UGSD: User Generated Sentiment Dictionaries from Online Customer Reviews

Authors: Chun-Hsiang Wang, Kang-Chun Fan, Chuan-Ju Wang, Ming-Feng Tsai
Pages: 313-320

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To evaluate the effectiveness of the proposed framework, we conduct extensive experiments on three real-world datasets: Yelp restaurant reviews, TripAdvisor attraction reviews, and Amazon product reviews. Three types of experiments are conducted: 1) We compare the generated Yelp dictionaries with the state-of-the-art Stanford Yelp dictionary (Reschke, Vogel, and Jurafsky 2013); we conduct both 2) the traditional sentiment classification and 3) the entity ranking (Chao et al. 2017) to evaluate the effectiveness of the generated dictionaries. The experimental results show that the framework is effective in constructing high-quality, domain-specific sentiment dictionaries from online reviews.
Researcher Affiliation | Academia | Chun-Hsiang Wang (1), Kang-Chun Fan (2), Chuan-Ju Wang (2), Ming-Feng Tsai (1, 3). (1) Department of Computer Science, National Chengchi University, Taiwan; (2) Research Center for Information Technology Innovation, Academia Sinica, Taiwan; (3) MOST Joint Research Center for AI Technology and All Vista Healthcare, Taiwan. {ch_wang, mftsai}@nccu.edu.tw, {kcfan, cjwang}@citi.sinica.edu.tw
Pseudocode | No | The paper describes the framework's steps (candidate sentiment word selection, review transformation, word representation learning, lexicon construction) but does not present them in pseudocode or algorithm blocks.
Open Source Code | Yes | The three collected datasets and the source codes are available at https://github.com/cnclabs/UGSD.
Open Datasets | Yes | Yelp Restaurant Reviews: The customer reviews of the Yelp dataset were collected from the 9th round of the Yelp Dataset Challenge (https://www.yelp.com/dataset/challenge), from which we extracted the reviews of 215 restaurants located in Las Vegas, as the Vegas area has the most reviews as compared to other areas in the challenge dataset. [...] Amazon Product Reviews: This dataset provided by (Wang, Lu, and Zhai 2010) consists of reviews on six categories of electronic supplies: cameras, televisions, laptops, mobile phones, tablets, and video surveillance equipment.
Dataset Splits | No | The paper does not explicitly state the training/validation/test splits with percentages or counts for reproducing the experiments.
Hardware Specification | No | No specific hardware (GPU/CPU models, memory, etc.) used for running the experiments is mentioned in the paper.
Software Dependencies | No | The paper mentions using 'CoreNLP', 'Snowball stemmer', and 'NLTK' but does not specify their version numbers.
Experiment Setup | Yes | To learn embeddings to presume the cooccurrence proximity, the number of negative samples is set to 5, the representation dimension is set to 200, and the total number of samples is set to 25 million.
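
For anyone attempting a reproduction, the three values quoted in the Experiment Setup row can be recorded in a small configuration object. The following is a minimal sketch, not the authors' code: the class name `EmbeddingSetup`, its field names, and the `pair_updates` helper are hypothetical, and the update count assumes the usual negative-sampling scheme in which each sampled pair triggers one positive update plus one update per negative sample. The paper does not state this scheme explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingSetup:
    """Hyperparameters quoted in the Experiment Setup row (names are hypothetical)."""

    negative_samples: int = 5        # negatives drawn per sampled pair
    dimension: int = 200             # size of each learned representation
    total_samples: int = 25_000_000  # sampled pairs over the whole training run

    def pair_updates(self) -> int:
        # Assumption: one positive update plus `negative_samples` negative
        # updates per sampled pair, as in standard negative sampling.
        return self.total_samples * (1 + self.negative_samples)


setup = EmbeddingSetup()
print(setup.pair_updates())
```

Freezing the dataclass keeps the reported values from being modified by accident during a run; any other setting (learning rate, window size, and so on) is not given in the summary above and would have to be taken from the released source code.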