Definition Modeling: Learning to Define Word Embeddings in Natural Language
Authors: Thanapon Noraset, Chen Liang, Larry Birnbaum, Doug Downey
AAAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present several definition model architectures based on recurrent neural networks, and experiment with the models over multiple data sets. Our results show that a model that controls dependencies between the word being defined and the definition words performs significantly better, and that a character-level convolution layer designed to leverage morphology can complement word-level embeddings. |
| Researcher Affiliation | Academia | Department of Electrical Engineering & Computer Science Northwestern University, Evanston IL 60208, USA {nor, chenliang2013}@u.northwestern.edu, {l-birnbaum,d-downey}@northwestern.edu |
| Pseudocode | No | The paper describes the model architectures and equations but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | The source code and dataset for our experiment can be found at https://github.com/websail-nu/torch-defseq. |
| Open Datasets | Yes | To create a corpus of reasonable size for machine learning experiments, we sample around 20k words from the 50k most frequent words in the Google Web 1T corpus (Brants and Franz 2006), removing function words. |
| Dataset Splits | Yes | Table 2: Basic statistics of the common word definitions corpus. Splits are mutually exclusive in the words being defined. Split: train / valid / test; #Words: 20,298 / 1,127 / 1,129; #Entries: 146,486 / 8,087 / 8,352; #Tokens: 1,406,440 / 77,948 / 79,699; Avg length: 6.60 / 6.64 / 6.54. (A sketch of this word-level split protocol follows the table.) |
| Hardware Specification | No | No specific hardware details (such as exact GPU/CPU models, processor types, or memory amounts) used for running the experiments were provided. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.8, CPLEX 12.4) were mentioned. The paper refers to tools such as Word2Vec and Adam without specifying versions. |
| Experiment Setup | Yes | The embedding and LSTM hidden layers have 300 units each. For the affix detector, the character-level CNN has kernels of length 2-6 and size {10, 30, 40, 40, 40} with a stride of 1. During training, we maximize the log-likelihood objective using Adam, a variation of stochastic gradient descent (Kingma and Ba 2014). The learning rate is 0.001, and the training stops after 4 consecutive epochs of no significant improvement in the validation performance. (A configuration sketch follows the table.) |
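
The Dataset Splits row quotes a protocol rather than code: the train/valid/test partitions are mutually exclusive in the words being defined, so every definition of a given word lands in exactly one split. Below is a minimal Python sketch of that idea, assuming entries are simple `(word, definition)` pairs; the function name, seed, and split fractions are illustrative assumptions, not values from the paper.

```python
import random

def split_by_word(entries, seed=0, valid_frac=0.05, test_frac=0.05):
    """Partition (word, definition) pairs so that splits share no defined words.

    Illustrative sketch: the fractions and seed are hypothetical, but the
    invariant matches the paper's Table 2 note that splits are mutually
    exclusive in the words being defined.
    """
    # Shuffle the unique defined words, then assign whole words to splits.
    words = sorted({word for word, _ in entries})
    random.Random(seed).shuffle(words)
    n_valid = int(len(words) * valid_frac)
    n_test = int(len(words) * test_frac)
    valid_words = set(words[:n_valid])
    test_words = set(words[n_valid:n_valid + n_test])

    splits = {"train": [], "valid": [], "test": []}
    for word, definition in entries:
        if word in valid_words:
            splits["valid"].append((word, definition))
        elif word in test_words:
            splits["test"].append((word, definition))
        else:
            splits["train"].append((word, definition))
    return splits
```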
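
The Experiment Setup row's hyperparameters translate directly into code. The following is a minimal PyTorch sketch, not the authors' Torch implementation (available at https://github.com/websail-nu/torch-defseq): the class names, character-embedding dimension, and vocabulary size are illustrative assumptions, while the 300-unit layers, kernel lengths 2-6 with {10, 30, 40, 40, 40} channels, stride 1, Adam with learning rate 0.001, and the 4-epoch early-stopping patience come from the paper. In the full model, the character-level features complement the word embedding of the defined word; here the two modules are shown separately for brevity.

```python
import torch
import torch.nn as nn

class CharCNN(nn.Module):
    """Character-level convolutions: kernel lengths 2-6 with
    10, 30, 40, 40, and 40 output channels, stride 1 (per the paper)."""
    def __init__(self, char_emb_dim=20,  # char_emb_dim is an assumption
                 kernel_sizes=(2, 3, 4, 5, 6),
                 num_kernels=(10, 30, 40, 40, 40)):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv1d(char_emb_dim, n, k, stride=1)
             for k, n in zip(kernel_sizes, num_kernels)])

    def forward(self, char_embs):
        # char_embs: (batch, char_emb_dim, word_len), word_len >= 6.
        # Max-pool each feature map over the character positions,
        # then concatenate into a single morphology feature vector.
        return torch.cat([conv(char_embs).max(dim=2).values
                          for conv in self.convs], dim=1)

class DefinitionModel(nn.Module):
    """Word-level definition language model with the quoted sizes."""
    def __init__(self, vocab_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, 300)   # 300-unit embeddings
        self.lstm = nn.LSTM(300, 300, batch_first=True)  # 300-unit hidden layer
        self.out = nn.Linear(300, vocab_size)

    def forward(self, tokens):
        hidden, _ = self.lstm(self.embedding(tokens))
        return self.out(hidden)

model = DefinitionModel(vocab_size=10_000)  # vocab size is an assumption
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # lr from the paper
patience = 4  # stop after 4 epochs with no validation improvement
```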