Causally Denoise Word Embeddings Using Half-Sibling Regression

Authors: Zekun Yang, Tianlin Liu | pp. 9426-9433

AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Evaluated on a battery of standard lexical-level evaluation tasks and downstream sentiment analysis tasks, our method reaches state-of-the-art performance."
Researcher Affiliation | Academia | "Department of Information Systems, College of Business, City University of Hong Kong, Hong Kong SAR, China (zekunyang3-c@my.cityu.edu.hk); Friedrich Miescher Institute for Biomedical Research, Maulbeerstrasse 66, 4058 Basel, Switzerland (tianlin.liu@fmi.ch)"
Pseudocode | Yes | "Algorithm 1: HSR algorithm for word vector postprocessing"
Open Source Code | Yes | "Our codes are available at https://github.com/KunkunYang/denoiseHSR-AAAI"
Open Datasets | Yes | "We test it on three different pre-trained English word embeddings including Word2Vec (Mikolov et al. 2013), GloVe (Pennington, Socher, and Manning 2014), and Paragram (Wieting et al. 2015). The datasets we adopt include Amazon reviews (AR), customer reviews (CR) (Hu and Liu 2004), IMDB movie reviews (IMDB) (Maas et al. 2011), and SST binary sentiment classification (SST-B) (Socher et al. 2013)."
Dataset Splits | Yes | "We report the five-fold cross-validation accuracy of the sentiment classification results in Table 3."
Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models or memory amounts used for running experiments.
Software Dependencies | No | The paper mentions using the Natural Language Toolkit (NLTK) package and a logistic regression model, but does not specify version numbers for any software dependencies.
Experiment Setup | Yes | "For HSR, we fix the regularization constants α1 = α2 = 50 for HSR-RR."
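For orientation, the core of the paper's HSR postprocessing is a half-sibling regression: embeddings assumed to carry mostly shared noise are used to predict the target embeddings via ridge regression, and the predicted (noise) component is subtracted. The sketch below is a minimal illustration of that idea, not the authors' released code; the matrix names `F` and `C` are hypothetical, and only the ridge constant of 50 is taken from the paper.

```python
import numpy as np

def hsr_denoise(F, C, alpha=50.0):
    """Minimal half-sibling-regression denoising sketch (hypothetical setup).

    F: (m, d) embeddings of "half-sibling" words assumed to carry shared noise.
    C: (n, d) embeddings to denoise.
    alpha: ridge regularization constant (the paper reports 50 for HSR-RR).
    """
    m, d = F.shape
    # Ridge regression across the d embedding dimensions (treated as samples):
    # find W (n x m) minimizing ||C - W F||^2 + alpha * ||W||^2.
    # Closed form: W = C F^T (F F^T + alpha I)^{-1}.
    W = C @ F.T @ np.linalg.inv(F @ F.T + alpha * np.eye(m))
    # Subtract the component of C predictable from F (the estimated noise).
    return C - W @ F
```

Since `W = 0` is always a feasible ridge solution, the denoised matrix can never have a larger Frobenius norm than the input; whatever is removed is exactly the part of `C` linearly explainable from `F`.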