All Word Embeddings from One Embedding

Authors: Sho Takase, Sosuke Kobayashi

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We indicate our ALONE can be used as word representation sufficiently through an experiment on the reconstruction of pre-trained word embeddings. In addition, we also conduct experiments on NLP application tasks: machine translation and summarization."
Researcher Affiliation | Collaboration | Sho Takase, Tokyo Institute of Technology (sho.takase@nlp.c.titech.ac.jp); Sosuke Kobayashi, Tohoku University / Preferred Networks, Inc. (sosk@preferred.jp)
Pseudocode | No | The paper describes the method using mathematical equations and textual descriptions, but it does not contain a clearly labeled pseudocode or algorithm block.
Open Source Code | Yes | "The code is publicly available at https://github.com/takase/alone_seq2seq"
Open Datasets | Yes | "We used the pre-trained 300 dimensional GloVe [22] as source word embeddings and reconstructed them with ALONE. ... We used WMT En-De dataset since it is widely used to evaluate the performance of machine translation [6, 36, 18]. ... We used the DUC 2004 task 1 [20] as the test set."
Dataset Splits | Yes | "Following previous studies [36, 18], we used WMT 2016 training data, which contains 4.5M sentence pairs, newstest2013, and newstest2014 for training, validation, and test respectively."
Hardware Specification | No | The paper does not provide specific hardware details (such as exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions software like PyTorch [21] and fairseq [19], but it does not provide specific version numbers for these or other software components necessary for replication.
Experiment Setup | Yes | "We set mini-batch size 256 and the number of epochs 1000. For c, M, and p_o in the binary mask, we set 64, 8, and 0.5 respectively. We used the same dimension size as GloVe (300) for D_o and conducted experiments with varying D_inter in {600, 1200, 1800, 2400}. ... We set D_o the same number as the dimension of each layer in the Transformer (d_model, i.e., 512) and varied D_inter. For other hyper-parameters, we set as follows: c = 64, M = 8, and p_o = 0.5. Moreover, we applied the dropout after the ReLU activation function in Equation (3)."
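
Since the paper offers equations rather than pseudocode (see the Pseudocode row above), the core computation can be summarized in one line. This is a hedged reconstruction pieced together from the descriptions quoted in this table, not a verbatim equation from the paper: o denotes the single shared source embedding, m_w the word-specific binary mask governed by c, M, and p_o, and the two weight matrices form the feed-forward network whose ReLU is referenced as Equation (3); biases and dropout are omitted.

    e_w = W_2 \, \mathrm{ReLU}\!\left( W_1 \, (o \odot m_w) \right),
    \qquad o \in \mathbb{R}^{D_o},\; m_w \in \{0,1\}^{D_o},\;
    W_1 \in \mathbb{R}^{D_{\mathrm{inter}} \times D_o},\;
    W_2 \in \mathbb{R}^{D_o \times D_{\mathrm{inter}}}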
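
As a concrete illustration of the quoted setup, below is a minimal PyTorch-style sketch that wires these hyper-parameters (D_o = 300 for the GloVe reconstruction, D_inter from {600, 1200, 1800, 2400}, p_o = 0.5, mini-batches of 256, dropout after the ReLU) into a small module. It is not the authors' released implementation (that lives in the linked repository): the class name ALONESketch is hypothetical, the word-specific masks are sampled directly at random instead of being composed from the paper's c candidates and M combinations, and the dropout rate and squared-error reconstruction objective are assumptions made for illustration.

import torch
import torch.nn as nn


class ALONESketch(nn.Module):
    """Minimal sketch: one shared embedding, a per-word binary mask, a ReLU FFN."""

    def __init__(self, vocab_size, d_o=300, d_inter=1200, p_o=0.5, dropout=0.1):
        super().__init__()
        # Single shared source embedding for the whole vocabulary.
        self.source = nn.Parameter(torch.randn(d_o))
        # Word-specific binary masks; the paper composes them from c candidate
        # vectors and M combinations, here they are sampled directly (assumption).
        masks = (torch.rand(vocab_size, d_o) > p_o).float()
        self.register_buffer("masks", masks)
        # Two-layer feed-forward network with ReLU (cf. Equation (3));
        # dropout is applied right after the ReLU, as the quote states.
        self.ffn = nn.Sequential(
            nn.Linear(d_o, d_inter),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(d_inter, d_o),
        )

    def forward(self, token_ids):
        # Filter the shared embedding with each word's mask, then transform it.
        filtered = self.source * self.masks[token_ids]
        return self.ffn(filtered)


# Reconstruction-style usage (hypothetical vocabulary size and loss choice);
# the quote specifies mini-batches of 256 words and 1000 epochs:
# emb = ALONESketch(vocab_size=50000)
# loss = nn.functional.mse_loss(emb(batch_ids), glove[batch_ids])

For the translation and summarization experiments, the same module would be instantiated with d_o = 512 so that D_o matches the Transformer's d_model, per the quote above.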