FRAGE: Frequency-Agnostic Word Representation

Authors: Chengyue Gong, Di He, Xu Tan, Tao Qin, Liwei Wang, Tie-Yan Liu

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conducted comprehensive studies on ten datasets across four natural language processing tasks, including word similarity, language modeling, machine translation, and text classification. Results show that with FRAGE, we achieve higher performance than the baselines in all tasks."
Researcher Affiliation | Collaboration | "(1) Peking University; (2) Key Laboratory of Machine Perception, MOE, School of EECS, Peking University; (3) Microsoft Research Asia; (4) Center for Data Science, Peking University, Beijing Institute of Big Data Research"
Pseudocode | Yes | "Algorithm 1: Proposed Algorithm" (an illustrative sketch of this adversarial training procedure follows the table)
Open Source Code | Yes | "Code for our implementation is available at https://github.com/ChengyueGongR/Frequency-Agnostic"
Open Datasets | Yes | "We use the skip-gram model as our baseline model [28], and train the embeddings using Enwik9 (http://mattmahoney.net/dc/textdata.html). ... We do experiments on two widely used datasets [25, 26, 41], Penn Treebank (PTB) [27] and WikiText-2 (WT2) [26]."
Dataset Splits | Yes | "Table 2: Perplexity on validation and test sets on Penn Treebank and WikiText-2. ... For fair comparisons, for each task, our method shares the same model architecture as the baseline. The only difference is that we use the original task-specific loss function with an additional adversarial loss as in Eqn. (3). Dataset description and hyper-parameter configurations can be found in [12]."
Hardware Specification | No | The paper does not specify the hardware used for running the experiments (e.g., GPU models, CPU types, or memory).
Software Dependencies | No | The paper mentions software such as word2vec, Transformer, AWD-LSTM, and AWD-LSTM-MoS, but does not provide version numbers for these components or for the underlying programming languages and libraries.
Experiment Setup | Yes | "In all tasks, we simply set the top 20% frequent words in vocabulary as popular words and denote the rest as rare words... For all the tasks except training skip-gram model, we use full-batch gradient descent to update the discriminator. For training skip-gram model, mini-batch stochastic gradient descent is used to update the discriminator with a batch size 3000... For language modeling and machine translation tasks, we use logistic regression as the discriminator. For other tasks, we find using a shallow neural network with one hidden layer is more efficient and we set the number of nodes in the hidden layer as 1.5 times embedding size. In all tasks, we set the hyper-parameter λ to 0.1."
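
The pseudocode and experiment-setup rows describe FRAGE's adversarial objective: a discriminator is trained to separate embeddings of popular words (the top 20% by frequency) from rare-word embeddings, while the task model is trained with its original loss plus a λ-weighted adversarial term (λ = 0.1) that pushes the two groups toward being indistinguishable. The sketch below is a minimal PyTorch illustration of that setup, not the authors' released code: the names (FrageDiscriminator, frage_losses, task_loss, and so on) are invented for illustration, and it uses a non-saturating "fool the discriminator" term as a stand-in for the exact min-max formulation of Eqn. (3).

    # Minimal sketch of FRAGE-style adversarial training (assumed names,
    # simplified schedule; not the authors' released implementation).
    import torch
    import torch.nn as nn

    LAMBDA_ADV = 0.1  # hyper-parameter lambda reported in the paper

    class FrageDiscriminator(nn.Module):
        # Shallow discriminator: one hidden layer with ~1.5x the embedding
        # size, as described for tasks other than language modeling and
        # machine translation (those use plain logistic regression instead).
        def __init__(self, embed_dim):
            super().__init__()
            hidden = int(1.5 * embed_dim)
            self.net = nn.Sequential(
                nn.Linear(embed_dim, hidden),
                nn.ReLU(),
                nn.Linear(hidden, 1),  # single logit: popular (1) vs. rare (0)
            )

        def forward(self, emb):
            return self.net(emb).squeeze(-1)

    def frage_losses(embedding, popular_ids, rare_ids, discriminator, task_loss):
        # popular_ids / rare_ids: LongTensors holding the vocabulary indices of
        # the top-20%-frequent words and of the remaining words, respectively.
        # Returns (model_loss, d_loss): the task model minimizes model_loss,
        # the discriminator minimizes d_loss, updated alternately.
        bce = nn.BCEWithLogitsLoss()

        pop_emb = embedding(popular_ids)   # embeddings of popular words
        rare_emb = embedding(rare_ids)     # embeddings of rare words

        # Discriminator objective: tell popular from rare embeddings.
        d_loss = (bce(discriminator(pop_emb.detach()),
                      torch.ones(popular_ids.shape[0]))
                  + bce(discriminator(rare_emb.detach()),
                        torch.zeros(rare_ids.shape[0])))

        # Adversarial term for the embeddings: make rare words look "popular".
        adv_loss = bce(discriminator(rare_emb), torch.ones(rare_ids.shape[0]))

        model_loss = task_loss + LAMBDA_ADV * adv_loss
        return model_loss, d_loss

Per the experiment-setup row, the discriminator in such a loop would be updated with full-batch gradient descent for most tasks and with mini-batch SGD (batch size 3000) when training the skip-gram model, alternating with the updates of the task model and its embeddings.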