Unseen Word Representation by Aligning Heterogeneous Lexical Semantic Spaces
Authors: Victor Prokhorov, Mohammad Taher Pilehvar, Dimitri Kartsaklis, Pietro Liò, Nigel Collier (pp. 6900-6907)
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that the approach can provide consistent performance improvements across multiple evaluation benchmarks: in-vitro, on multiple rare word similarity datasets, and in-vivo, in two downstream text classification tasks. |
| Researcher Affiliation | Collaboration | (1) Department of Theoretical and Applied Linguistics, University of Cambridge; (2) School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran; (3) Apple, Cambridge, UK; (4) Department of Computer Science, University of Cambridge |
| Pseudocode | No | No explicit pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | The code used in our experiments will be released to allow future experimentation and comparison. https://github.com/VictorProkhorov/AAAI2019 |
| Open Datasets | Yes | In our experiments, we used WordNet 3.0 (Fellbaum 1998) as external knowledge base. We experimented with two sets of word2vec (Mikolov et al. 2013) embeddings trained on two different corpora: (1) W2V-GN, the Google News... and (2) W2V-WP, the Wikipedia corpus (Shaoul and Westbury 2010)... The Stanford Rare Word (RW) Similarity dataset (Luong, Socher, and Manning 2013)... RG-65 (Rubenstein and Goodenough 1965), SimLex-999 (Hill, Reichart, and Korhonen 2015), MEN (Bruni, Tran, and Baroni 2014), WordSim-353 similarity subset (Agirre et al. 2009), and SimVerb-3500 (Gerz et al. 2016)... PL04 (Pang and Lee 2004), PL05 (Pang and Lee 2005)... IMDB (Maas et al. 2011)... Stanford Sentiment dataset (Socher et al. 2013)... CR... The BBC news dataset (Greene and Cunningham 2006) and Newsgroups (Lang 1995)... Ohsumed... |
| Dataset Splits | No | The paper describes using standard datasets for evaluation, but does not explicitly provide specific training, validation, and test split percentages or counts for these datasets, nor details on how data was partitioned for model training. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running experiments were mentioned in the paper. |
| Software Dependencies | No | The paper mentions tools such as word2vec and node2vec, and model architectures such as CNN and LSTM, but does not provide specific version numbers for these or for other software dependencies. |
| Experiment Setup | Yes | In our experiments, we set the parameters of node2vec as follows: walk length to 100, window size to 10, and embedding dimensionality to 100. The dimensionality of the resultant space in our experiments is min(d_C, d_K) = d_K = 100. In all settings the embedding layer was not updated during training (static). In each configuration we repeat the experiment three times and report the average performance. In our experiments, we used a CNN text classifier which is similar to that of Kim (2014)... as our recurrent layer we used LSTM (Hochreiter and Schmidhuber 1997). (Hedged sketches of this setup follow the table.) |
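
For context on the Experiment Setup row: the quoted node2vec configuration (walk length 100, window size 10, 100-dimensional embeddings) can be reproduced with off-the-shelf tooling. Below is a minimal sketch using the community `node2vec` Python package and a toy NetworkX graph standing in for the WordNet graph; the package choice, the `num_walks` value, and the toy graph are assumptions, since the paper does not name its implementation.

```python
import networkx as nx
from node2vec import Node2Vec  # pip install node2vec (community package; an assumption)

# Toy graph standing in for the WordNet semantic network used in the paper.
graph = nx.karate_club_graph()

# Hyperparameters taken from the quoted setup; num_walks is an assumed placeholder.
n2v = Node2Vec(graph, dimensions=100, walk_length=100, num_walks=10, workers=4)

# fit() forwards its keyword arguments to gensim's Word2Vec (window size 10 as quoted).
model = n2v.fit(window=10, min_count=1)

# Node identifiers are stored as strings in the resulting keyed vectors.
kb_vector = model.wv["0"]
print(kb_vector.shape)  # (100,)
```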
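The paper's core idea is to align the knowledge-base (node2vec) space with the corpus (word2vec) space so that words missing from the corpus can inherit vectors from the knowledge base. The paper trains its own transformation; as a hedged illustration of the general technique, here is an orthogonal-Procrustes alignment over the shared vocabulary, a common baseline for such mappings and not necessarily the method the authors used.

```python
import numpy as np

def align_spaces(kb_vecs, corpus_vecs):
    """Orthogonal Procrustes: the orthogonal W minimizing
    ||kb_vecs @ W - corpus_vecs||_F, via SVD of the cross-covariance."""
    u, _, vt = np.linalg.svd(kb_vecs.T @ corpus_vecs)
    return u @ vt

# Toy data: both spaces are 100-dimensional, matching the quoted
# min(d_C, d_K) = d_K = 100; rows correspond to words present in both spaces.
rng = np.random.default_rng(0)
shared_kb = rng.normal(size=(5000, 100))
shared_corpus = rng.normal(size=(5000, 100))

w = align_spaces(shared_kb, shared_corpus)

# An unseen word (absent from the corpus but present in WordNet) gets a
# corpus-space vector by mapping its knowledge-base vector through W.
unseen_kb_vec = rng.normal(size=100)
imputed_corpus_vec = unseen_kb_vec @ w
```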
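For the in-vivo evaluation, the setup quotes a CNN text classifier "similar to that of Kim (2014)" with a static (frozen) embedding layer. The sketch below is a standard PyTorch rendering of that architecture under assumed filter widths (3, 4, 5) and 100 feature maps per width; the report does not quote these values, so treat them as placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KimStyleCNN(nn.Module):
    """Kim (2014)-style sentence classifier with a frozen embedding layer."""

    def __init__(self, pretrained, num_classes, widths=(3, 4, 5), n_filters=100):
        super().__init__()
        # "Static" setting from the quoted setup: embeddings are not updated.
        self.embed = nn.Embedding.from_pretrained(pretrained, freeze=True)
        dim = pretrained.size(1)
        self.convs = nn.ModuleList(
            nn.Conv1d(dim, n_filters, kernel_size=w) for w in widths
        )
        self.dropout = nn.Dropout(0.5)
        self.fc = nn.Linear(n_filters * len(widths), num_classes)

    def forward(self, token_ids):                    # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)    # (batch, dim, seq_len)
        # One max-over-time pooled feature vector per filter width.
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.fc(self.dropout(torch.cat(pooled, dim=1)))

# Usage with random 100-d embeddings for a 5,000-word vocabulary.
model = KimStyleCNN(torch.randn(5000, 100), num_classes=2)
logits = model(torch.randint(0, 5000, (8, 50)))  # batch of 8 length-50 texts
```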