Causally Denoise Word Embeddings Using Half-Sibling Regression

Authors: Zekun Yang, Tianlin Liu | pp. 9426-9433

AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Evaluated on a battery of standard lexical-level evaluation tasks and downstream sentiment analysis tasks, our method reaches state-of-the-art performance."
Researcher Affiliation | Academia | "Department of Information Systems, College of Business, City University of Hong Kong, Hong Kong SAR, China (zekunyang3-c@my.cityu.edu.hk); Friedrich Miescher Institute for Biomedical Research, Maulbeerstrasse 66, 4058 Basel, Switzerland (tianlin.liu@fmi.ch)"
Pseudocode | Yes | "Algorithm 1: HSR algorithm for word vector postprocessing"
Open Source Code | Yes | "Our codes are available at https://github.com/KunkunYang/denoiseHSR-AAAI"
Open Datasets | Yes | "We test it on three different pre-trained English word embeddings including Word2Vec (Mikolov et al. 2013), GloVe (Pennington, Socher, and Manning 2014), and Paragram (Wieting et al. 2015). The datasets we adopt include Amazon reviews (AR), customer reviews (CR) (Hu and Liu 2004), IMDB movie reviews (IMDB) (Maas et al. 2011), and SST binary sentiment classification (SST-B) (Socher et al. 2013)."
Dataset Splits | Yes | "We report the five-fold cross-validation accuracy of the sentiment classification results in Table 3."
Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models or memory amounts used for running experiments.
Software Dependencies | No | The paper mentions using the Natural Language Toolkit (NLTK) package and a logistic regression model, but does not specify version numbers for any software dependencies.
Experiment Setup | Yes | "For HSR, we fix the regularization constants α1 = α2 = 50 for HSR-RR."
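For orientation, the core of the paper's HSR postprocessing is a half-sibling regression: embeddings assumed to carry mostly shared noise are used to predict the target embeddings via ridge regression, and the predicted (noise) component is subtracted. The sketch below is a minimal illustration of that idea, not the authors' released code; the matrix names `F` and `C` are hypothetical, and only the ridge constant of 50 is taken from the paper.

```python
import numpy as np

def hsr_denoise(F, C, alpha=50.0):
    """Minimal half-sibling-regression denoising sketch (hypothetical setup).

    F: (m, d) embeddings of "half-sibling" words assumed to carry shared noise.
    C: (n, d) embeddings to denoise.
    alpha: ridge regularization constant (the paper reports 50 for HSR-RR).
    """
    m, d = F.shape
    # Ridge regression across the d embedding dimensions (treated as samples):
    # find W (n x m) minimizing ||C - W F||^2 + alpha * ||W||^2.
    # Closed form: W = C F^T (F F^T + alpha I)^{-1}.
    W = C @ F.T @ np.linalg.inv(F @ F.T + alpha * np.eye(m))
    # Subtract the component of C predictable from F (the estimated noise).
    return C - W @ F
```

Since `W = 0` is always a feasible ridge solution, the denoised matrix can never have a larger Frobenius norm than the input; whatever is removed is exactly the part of `C` linearly explainable from `F`.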