Definition Modeling: Learning to Define Word Embeddings in Natural Language

Authors: Thanapon Noraset, Chen Liang, Larry Birnbaum, Doug Downey

AAAI 2017

Reproducibility assessment: each variable is listed with its result and the supporting LLM response.
Research Type: Experimental. Evidence: "We present several definition model architectures based on recurrent neural networks, and experiment with the models over multiple data sets. Our results show that a model that controls dependencies between the word being defined and the definition words performs significantly better, and that a character-level convolution layer designed to leverage morphology can complement word-level embeddings."
Researcher Affiliation: Academia. Evidence: "Department of Electrical Engineering & Computer Science, Northwestern University, Evanston IL 60208, USA; {nor, chenliang2013}@u.northwestern.edu, {l-birnbaum,d-downey}@northwestern.edu"
Pseudocode: No. Evidence: The paper describes the model architectures and equations but does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks.
Open Source Code: Yes. Evidence: "The source code and dataset for our experiment can be found at https://github.com/websail-nu/torch-defseq."
Open Datasets: Yes. Evidence: "To create a corpus of reasonable size for machine learning experiments, we sample around 20k words from the 50k most frequent words in the Google Web 1T corpus (Brants and Franz 2006), removing function words."
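The sampling step quoted above (take the 50k most frequent words, drop function words, sample roughly 20k) can be sketched as below. This is a hypothetical illustration, not the authors' script: `unigram_counts` and `function_words` stand in for the Google Web 1T counts and whatever stop-word list the authors used.

```python
import random

def sample_definable_words(unigram_counts, function_words,
                           top_k=50_000, n_sample=20_000, seed=0):
    """Rank words by frequency, keep the top_k, drop function words,
    and sample n_sample of the remaining candidates.

    `unigram_counts` is a dict word -> count (placeholder for Web 1T);
    `function_words` is any container supporting `in` (placeholder list).
    """
    ranked = sorted(unigram_counts, key=unigram_counts.get, reverse=True)[:top_k]
    candidates = [w for w in ranked if w not in function_words]
    rng = random.Random(seed)  # fixed seed for a reproducible sample
    return rng.sample(candidates, min(n_sample, len(candidates)))
```

The `min(...)` guard simply keeps the sketch robust when fewer than `n_sample` candidates survive filtering.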
Dataset Splits: Yes. Evidence: "Table 2: Basic statistics of the common word definitions corpus. Splits are mutually exclusive in the words being defined."

    Split        train      valid   test
    #Words       20,298     1,127   1,129
    #Entries     146,486    8,087   8,352
    #Tokens      1,406,440  77,948  79,699
    Avg length   6.60       6.64    6.54
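A quick sanity check over the Table 2 numbers shows the word-level split is roughly 90/5/5, with about 7 definitions per word in each split. This is a small pure-Python sketch over the quoted figures, not part of the paper's code:

```python
# Statistics transcribed from Table 2 of the paper (common word definitions corpus).
splits = {
    "train": {"words": 20_298, "entries": 146_486, "tokens": 1_406_440},
    "valid": {"words": 1_127,  "entries": 8_087,   "tokens": 77_948},
    "test":  {"words": 1_129,  "entries": 8_352,   "tokens": 79_699},
}

# Splits are mutually exclusive in the words being defined, so totals are sums.
total_words = sum(s["words"] for s in splits.values())

for name, s in splits.items():
    share = s["words"] / total_words          # fraction of defined words
    defs_per_word = s["entries"] / s["words"]  # definitions per defined word
    print(f"{name}: {share:.1%} of words, {defs_per_word:.1f} definitions/word")
```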
Hardware Specification: No. Evidence: No specific hardware details (such as GPU/CPU models or memory amounts) used to run the experiments were provided.
Software Dependencies: No. Evidence: No software dependencies with specific version numbers (e.g., Python 3.8, CPLEX 12.4) were mentioned; the paper refers to tools like Word2Vec and Adam without specifying their versions.
Experiment Setup: Yes. Evidence: "The embedding and LSTM hidden layers have 300 units each. For the affix detector, the character-level CNN has kernels of length 2-6 and size {10, 30, 40, 40, 40} with a stride of 1. During training, we maximize the log-likelihood objective using Adam, a variation of stochastic gradient descent (Kingma and Ba 2014). The learning rate is 0.001, and the training stops after 4 consecutive epochs of no significant improvement in the validation performance."
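The quoted hyperparameters can be made concrete with a minimal sketch. The released implementation is in Torch (Lua); the PyTorch code below is a hypothetical re-rendering of just the quoted settings, and the character-embedding dimension and character-vocabulary size are assumptions not stated in the paper:

```python
import torch
import torch.nn as nn

EMB_DIM = 300                          # embedding layer: 300 units (quoted)
HID_DIM = 300                          # LSTM hidden layer: 300 units (quoted)
KERNEL_LENGTHS = [2, 3, 4, 5, 6]       # char-CNN kernel lengths 2-6 (quoted)
KERNEL_SIZES = [10, 30, 40, 40, 40]    # filters per kernel length (quoted)

class CharCNN(nn.Module):
    """Character-level convolution layer (stride 1) used as an affix detector."""
    def __init__(self, char_emb_dim=20, n_chars=100):  # both dims are assumptions
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_emb_dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(char_emb_dim, n_filters, kernel_size=k, stride=1)
            for k, n_filters in zip(KERNEL_LENGTHS, KERNEL_SIZES)
        )

    def forward(self, char_ids):
        # char_ids: (batch, word_length) -> (batch, char_emb_dim, word_length)
        x = self.char_emb(char_ids).transpose(1, 2)
        # Max-pool each feature map over time, then concatenate all filters.
        feats = [conv(x).max(dim=2).values for conv in self.convs]
        return torch.cat(feats, dim=1)  # (batch, sum(KERNEL_SIZES)) = (batch, 160)

# Training configuration as quoted: Adam with learning rate 0.001; early
# stopping after 4 epochs without significant validation improvement would
# be handled in the (omitted) training loop.
decoder = nn.LSTM(EMB_DIM, HID_DIM, batch_first=True)
optimizer = torch.optim.Adam(decoder.parameters(), lr=0.001)
```

Note that the concatenated character features have dimension 10 + 30 + 40 + 40 + 40 = 160, which in the full model would be combined with the 300-dimensional word embedding of the word being defined.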