SPINE: SParse Interpretable Neural Embeddings

Authors: Anant Subramanian, Danish Pruthi, Harsh Jhamtani, Taylor Berg-Kirkpatrick, Eduard Hovy

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through large-scale human evaluation, we report that our resulting word embeddings are much more interpretable than the original GloVe and word2vec embeddings. Moreover, our embeddings outperform existing popular word embeddings on a diverse suite of benchmark downstream tasks.
Researcher Affiliation | Academia | Anant Subramanian,* Danish Pruthi,* Harsh Jhamtani,* Taylor Berg-Kirkpatrick, Eduard Hovy. School of Computer Science, Carnegie Mellon University, Pittsburgh, USA. {anant,danish,jharsh,tberg,hovy}@cmu.edu
Pseudocode | No | The paper provides mathematical formulations and descriptions of the model but does not include explicit pseudocode or algorithm blocks. (A hedged sketch of the model and its loss terms follows the table.)
Open Source Code | Yes | Our code and generated word vectors are publicly available at https://github.com/harsh19/SPINE
Open Datasets | Yes | We train autoencoder models on pre-trained GloVe and word2vec embeddings. The GloVe vectors were trained on 6 billion tokens from a 2014 dump of Wikipedia and Gigaword5, while the word2vec vectors were trained on around 100 billion words from a part of the Google News dataset. ... Sentiment Analysis: This task tests the semantic properties of word embeddings. It is a sentence-level binary classification task on the Stanford Sentiment Treebank dataset (Socher et al. 2013). ... Question Classification (TREC): To facilitate research in question answering, Li and Roth (2006) propose a dataset for categorizing questions into six different types, e.g., whether the question is about a location, about a person, or about some numeric information. The TREC dataset comprises 5,452 labeled training questions and 500 test questions.
Dataset Splits | Yes | We use 15k of these words for training, and use the remaining 2k for hyperparameter tuning. ... Sentiment Analysis: ... We used the provided train, dev, and test splits with only the non-neutral labels, of sizes 8337, 1081 and 2166 sentences respectively. ... Question Classification (TREC): ... By isolating 10% of the training questions for validation, we use train/validation/test splits of 4906/546/500 questions respectively. (See the data-loading sketch below the table.)
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, memory, or cloud computing instance types used for experiments.
Software Dependencies | No | The paper mentions using SVMs, logistic regression, and random forests, but does not specify version numbers for any software or libraries used in the experiments.
Experiment Setup | Yes | Table 3: Grid search was performed to select values of the following hyperparameters: sparsity fraction (ρ), hidden-dimension size (|H|), standard deviation of the additive isotropic zero-mean Gaussian noise (σ), and the coefficients for the ASL and PSL loss terms (λ1 and λ2). ... We observed that a hidden layer of size 1000 units is optimal for our case. (A hedged grid-search sketch follows the table.)
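
The "Open Datasets" and "Dataset Splits" rows quote the paper's data setup: autoencoders trained on pre-trained GloVe vectors, with 15k words for training and 2k held out for tuning. The following is a minimal sketch of that setup, assuming the standard GloVe plain-text format and that the file lists words in frequency order (true of the released glove.6B files); the file name, the 17k cutoff, and the load_glove helper are illustrative, not the authors' code.

```python
import numpy as np

def load_glove(path, vocab_size=17000):
    """Read GloVe's plain-text format: one word per line, followed by its vector."""
    words, vecs = [], []
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f):
            if i >= vocab_size:
                break  # keep only the vocab_size most frequent words
            parts = line.rstrip().split(" ")
            words.append(parts[0])
            vecs.append(np.asarray(parts[1:], dtype=np.float32))
    return words, np.stack(vecs)

# 15k words for training the autoencoder, the remaining 2k for tuning,
# matching the split quoted in the "Dataset Splits" row.
words, X = load_glove("glove.6B.300d.txt")
X_train, X_tune = X[:15000], X[15000:17000]
```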
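The paper itself presents the model only as equations. Below is a minimal PyTorch sketch of a denoising autoencoder with the three loss terms named in the "Experiment Setup" row: a reconstruction loss, an average sparsity loss (ASL) that penalizes per-unit mean activations exceeding ρ, and a partial sparsity loss (PSL) that pushes activations toward 0 or 1. The class and function names, the exact penalty formulations, and the [0, 1] activation cap are assumptions inferred from the quoted description, not the released implementation.

```python
import torch
import torch.nn as nn

class SpineAutoencoder(nn.Module):
    """Hypothetical SPINE-style denoising autoencoder (name and structure assumed)."""

    def __init__(self, input_dim=300, hidden_dim=1000, noise_std=0.2):
        super().__init__()
        self.encoder = nn.Linear(input_dim, hidden_dim)
        self.decoder = nn.Linear(hidden_dim, input_dim)
        self.noise_std = noise_std  # sigma: std. dev. of the additive Gaussian noise

    def forward(self, x):
        if self.training:
            x = x + self.noise_std * torch.randn_like(x)  # denoising corruption
        z = torch.clamp(self.encoder(x), 0.0, 1.0)  # activations capped to [0, 1]
        return self.decoder(z), z

def spine_loss(x, x_hat, z, rho=0.15, lambda1=1.0, lambda2=1.0):
    """Reconstruction + ASL + PSL; penalty forms inferred, defaults are placeholders."""
    recon = ((x_hat - x) ** 2).sum(dim=1).mean()             # reconstruction loss
    mean_act = z.mean(dim=0)                                 # per-unit average activation
    asl = torch.clamp(mean_act - rho, min=0.0).pow(2).sum()  # average sparsity loss
    psl = (z * (1.0 - z)).sum(dim=1).mean()                  # pushes activations to 0 or 1
    return recon + lambda1 * asl + lambda2 * psl
```

The hidden size of 1000 reflects the paper's reported optimum; clipping activations to [0, 1] keeps both sparsity penalties well defined.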
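Finally, a sketch of the hyperparameter sweep described in the "Experiment Setup" row. The value grids here are placeholders (the paper's Table 3 lists the ranges actually searched), and train_and_evaluate is a hypothetical stub standing in for the omitted training loop over the 15k/2k split.

```python
from itertools import product

def train_and_evaluate(rho, hidden_dim, noise_std, lambda1, lambda2):
    """Hypothetical stub: train on the 15k training words, score on the 2k tuning words."""
    return 0.0  # placeholder; the actual training loop is omitted in this sketch

# Illustrative value grids; not the ranges reported in the paper's Table 3.
grid = {
    "rho": [0.15, 0.2],         # sparsity fraction
    "hidden_dim": [500, 1000],  # |H|; the paper found 1000 units optimal
    "noise_std": [0.2, 0.4],    # sigma of the additive Gaussian noise
    "lambda1": [0.1, 1.0],      # ASL coefficient
    "lambda2": [0.1, 1.0],      # PSL coefficient
}

best_score, best_config = float("-inf"), None
for values in product(*grid.values()):
    config = dict(zip(grid.keys(), values))
    score = train_and_evaluate(**config)
    if score > best_score:
        best_score, best_config = score, config
```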