Definition Modeling: Learning to Define Word Embeddings in Natural Language

Authors: Thanapon Noraset, Chen Liang, Larry Birnbaum, Doug Downey

AAAI 2017

Reproducibility assessment: each variable is listed with its result and the supporting LLM response.
Research Type: Experimental. Evidence: "We present several definition model architectures based on recurrent neural networks, and experiment with the models over multiple data sets. Our results show that a model that controls dependencies between the word being defined and the definition words performs significantly better, and that a character-level convolution layer designed to leverage morphology can complement word-level embeddings."
Researcher Affiliation: Academia. Evidence: "Department of Electrical Engineering & Computer Science, Northwestern University, Evanston IL 60208, USA; {nor, chenliang2013}@u.northwestern.edu, {l-birnbaum,d-downey}@northwestern.edu"
Pseudocode: No. Evidence: The paper describes the model architectures and equations but does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks.
Open Source Code: Yes. Evidence: "The source code and dataset for our experiment can be found at https://github.com/websail-nu/torch-defseq."
Open Datasets: Yes. Evidence: "To create a corpus of reasonable size for machine learning experiments, we sample around 20k words from the 50k most frequent words in the Google Web 1T corpus (Brants and Franz 2006), removing function words."
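The sampling step quoted above (take the 50k most frequent words, drop function words, sample roughly 20k) can be sketched as below. This is a hypothetical illustration, not the authors' script: `unigram_counts` and `function_words` stand in for the Google Web 1T counts and whatever stop-word list the authors used.

```python
import random

def sample_definable_words(unigram_counts, function_words,
                           top_k=50_000, n_sample=20_000, seed=0):
    """Rank words by frequency, keep the top_k, drop function words,
    and sample n_sample of the remaining candidates.

    `unigram_counts` is a dict word -> count (placeholder for Web 1T);
    `function_words` is any container supporting `in` (placeholder list).
    """
    ranked = sorted(unigram_counts, key=unigram_counts.get, reverse=True)[:top_k]
    candidates = [w for w in ranked if w not in function_words]
    rng = random.Random(seed)  # fixed seed for a reproducible sample
    return rng.sample(candidates, min(n_sample, len(candidates)))
```

The `min(...)` guard simply keeps the sketch robust when fewer than `n_sample` candidates survive filtering.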
Dataset Splits: Yes. Evidence: "Table 2: Basic statistics of the common word definitions corpus. Splits are mutually exclusive in the words being defined."

    Split        train      valid   test
    #Words       20,298     1,127   1,129
    #Entries     146,486    8,087   8,352
    #Tokens      1,406,440  77,948  79,699
    Avg length   6.60       6.64    6.54
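A quick sanity check over the Table 2 numbers shows the word-level split is roughly 90/5/5, with about 7 definitions per word in each split. This is a small pure-Python sketch over the quoted figures, not part of the paper's code:

```python
# Statistics transcribed from Table 2 of the paper (common word definitions corpus).
splits = {
    "train": {"words": 20_298, "entries": 146_486, "tokens": 1_406_440},
    "valid": {"words": 1_127,  "entries": 8_087,   "tokens": 77_948},
    "test":  {"words": 1_129,  "entries": 8_352,   "tokens": 79_699},
}

# Splits are mutually exclusive in the words being defined, so totals are sums.
total_words = sum(s["words"] for s in splits.values())

for name, s in splits.items():
    share = s["words"] / total_words          # fraction of defined words
    defs_per_word = s["entries"] / s["words"]  # definitions per defined word
    print(f"{name}: {share:.1%} of words, {defs_per_word:.1f} definitions/word")
```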
Hardware Specification: No. Evidence: No specific hardware details (such as GPU/CPU models or memory amounts) used to run the experiments were provided.
Software Dependencies: No. Evidence: No software dependencies with specific version numbers (e.g., Python 3.8, CPLEX 12.4) were mentioned; the paper refers to tools like Word2Vec and Adam without specifying their versions.
Experiment Setup: Yes. Evidence: "The embedding and LSTM hidden layers have 300 units each. For the affix detector, the character-level CNN has kernels of length 2-6 and size {10, 30, 40, 40, 40} with a stride of 1. During training, we maximize the log-likelihood objective using Adam, a variation of stochastic gradient descent (Kingma and Ba 2014). The learning rate is 0.001, and the training stops after 4 consecutive epochs of no significant improvement in the validation performance."
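The quoted hyperparameters can be made concrete with a minimal sketch. The released implementation is in Torch (Lua); the PyTorch code below is a hypothetical re-rendering of just the quoted settings, and the character-embedding dimension and character-vocabulary size are assumptions not stated in the paper:

```python
import torch
import torch.nn as nn

EMB_DIM = 300                          # embedding layer: 300 units (quoted)
HID_DIM = 300                          # LSTM hidden layer: 300 units (quoted)
KERNEL_LENGTHS = [2, 3, 4, 5, 6]       # char-CNN kernel lengths 2-6 (quoted)
KERNEL_SIZES = [10, 30, 40, 40, 40]    # filters per kernel length (quoted)

class CharCNN(nn.Module):
    """Character-level convolution layer (stride 1) used as an affix detector."""
    def __init__(self, char_emb_dim=20, n_chars=100):  # both dims are assumptions
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_emb_dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(char_emb_dim, n_filters, kernel_size=k, stride=1)
            for k, n_filters in zip(KERNEL_LENGTHS, KERNEL_SIZES)
        )

    def forward(self, char_ids):
        # char_ids: (batch, word_length) -> (batch, char_emb_dim, word_length)
        x = self.char_emb(char_ids).transpose(1, 2)
        # Max-pool each feature map over time, then concatenate all filters.
        feats = [conv(x).max(dim=2).values for conv in self.convs]
        return torch.cat(feats, dim=1)  # (batch, sum(KERNEL_SIZES)) = (batch, 160)

# Training configuration as quoted: Adam with learning rate 0.001; early
# stopping after 4 epochs without significant validation improvement would
# be handled in the (omitted) training loop.
decoder = nn.LSTM(EMB_DIM, HID_DIM, batch_first=True)
optimizer = torch.optim.Adam(decoder.parameters(), lr=0.001)
```

Note that the concatenated character features have dimension 10 + 30 + 40 + 40 + 40 = 160, which in the full model would be combined with the 300-dimensional word embedding of the word being defined.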