Can a Fruit Fly Learn Word Embeddings?

Authors: Yuchen Liang, Chaitanya Ryali, Benjamin Hoover, Leopold Grinberg, Saket Navlakha, Mohammed J. Zaki, Dmitry Krotov

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "The quality of the learned representations is evaluated on word similarity analysis, word-sense disambiguation, and document classification. It is shown that not only can the fruit fly network motif achieve performance comparable to existing methods in NLP, but, additionally, it uses only a fraction of the computational resources (shorter training time and smaller memory footprint)." (Abstract) and "Our aim here is to demonstrate that the sparse embeddings obtained by the fruit fly network motif are competitive with existing state-of-the-art word embeddings such as GloVe [34] and word2vec [30] and commonly used binarization tools for these continuous embeddings. We show this by evaluating the semantic similarity of static word embeddings. Several common benchmark datasets are used: WS353 [13], MEN [4], RW [28], SimLex [21], RG-65 [38], Mturk [18]. These datasets contain pairs of words with human-annotated similarity scores between them. Following previous work [43; 42], model similarity score for binary representations is evaluated as sim(v1, v2) = (n11 + n00)/n, where n11 (n00) is the number of bits in v1 and v2 that are both 1 (0), and n is the length of v1 and v2. Cosine similarity is used for real-valued representations. Spearman's correlation coefficient is calculated between this similarity and the human annotated score. The results are reported in Table 1." (Section 3.1) A minimal sketch of this evaluation protocol is given after the table.
Researcher Affiliation | Collaboration | Yuchen Liang (RPI, MIT-IBM Watson AI Lab), liangy7@rpi.edu; Chaitanya K. Ryali (Department of CS, UC San Diego), rckrishn@eng.ucsd.edu; Benjamin Hoover (MIT-IBM Watson AI Lab, IBM Research), benjamin.hoover@ibm.com; Leopold Grinberg (IBM Research), lgrinbe@ibm.com; Saket Navlakha (Cold Spring Harbor Laboratory), navlakha@cshl.edu; Mohammed J. Zaki (Department of CS, RPI), zaki@cs.rpi.edu; Dmitry Krotov (MIT-IBM Watson AI Lab, IBM Research), krotov@ibm.com
Pseudocode | No | The paper describes the learning algorithm mathematically and in prose, but does not include a structured pseudocode or algorithm block.
Open Source Code | No | "In order to evaluate the quality of contextualized embeddings we have created an online tool, which we are planning to release with the paper, that allows users to explore the representations learned by our model for various inputs (context-target pairs)."
Open Datasets | Yes | "The KC network shown in Fig. 1 was trained on the OpenWebText Corpus [15], which is a 32GB corpus of unstructured text containing approximately 6B tokens." Reference [15]: Aaron Gokaslan and Vanya Cohen. OpenWebText Corpus. http://Skylion007.github.io/OpenWebTextCorpus, 2019.
Dataset Splits | Yes | "For both benchmarks we report the results from a 5-fold cross-validation study, where each fold (in turn) is used as a development set, and the remaining four folds as the test set." (Section 3.3)
Hardware Specification | Yes | "Table 6: Training time (per epoch) and memory footprint of our method on GPUs and CPUs. For the GPU implementation, three V100 GPUs interconnected with 100GB/s (bidirectional) NVLink were used. For the CPU implementation, the computation was done on two 22-core CPUs. CPU memory is 137GB. The results are reported for window w = 11."
Software Dependencies | No | "Our algorithm is implemented in CUDA as a back-end, while python is used as an interface with the main functions." (Section 9) This is the closest relevant statement, but it does not give specific version numbers for CUDA or Python, nor for any other libraries.
Experiment Setup | Yes | "Hyperparameter settings for our model: K = 400, w = 11. Results for our algorithm are reported only for a fixed hash length, k = 51." (Table 1 caption) and "The optimal ranges of the hyperparameters are: learning rate ε0 ∈ [10^-4, 5×10^-4]; K ∈ [200, 600]; w ∈ [9, 15]; minibatch size ∈ [2000, 15000]; hash length k is reported for each individual experiment." (Section 7)
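
The word-similarity evaluation quoted under Research Type is simple enough to sketch in a few lines. The snippet below is a minimal illustration, not the authors' released code: the function names, the toy vectors, and the use of scipy.stats.spearmanr are assumptions made for this example; the measures themselves (agreement fraction (n11 + n00)/n for binary hash codes, cosine similarity for real-valued embeddings, Spearman correlation against human scores) follow the protocol described in Section 3.1.

```python
import numpy as np
from scipy.stats import spearmanr

def binary_similarity(v1, v2):
    # sim(v1, v2) = (n11 + n00) / n: the fraction of positions where the
    # two binary hash codes agree (both 1 or both 0).
    v1, v2 = np.asarray(v1), np.asarray(v2)
    return float(np.mean(v1 == v2))

def cosine_similarity(v1, v2):
    # Used for real-valued baselines such as GloVe or word2vec vectors.
    v1, v2 = np.asarray(v1, dtype=float), np.asarray(v2, dtype=float)
    return float(v1 @ v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))

def evaluate_word_similarity(embeddings, word_pairs, human_scores, binary=True):
    # Spearman's rank correlation between model similarities and
    # human-annotated scores from a benchmark such as WS353 or MEN.
    sim_fn = binary_similarity if binary else cosine_similarity
    model_scores = [sim_fn(embeddings[a], embeddings[b]) for a, b in word_pairs]
    rho, _ = spearmanr(model_scores, human_scores)
    return rho

# Toy usage with made-up 8-bit hash codes (illustrative only).
emb = {
    "king":  [1, 0, 1, 1, 0, 0, 1, 0],
    "queen": [1, 0, 1, 0, 0, 0, 1, 0],
    "apple": [0, 1, 0, 0, 1, 1, 0, 1],
}
pairs = [("king", "queen"), ("king", "apple"), ("queen", "apple")]
human = [8.5, 1.2, 1.0]
print(evaluate_word_similarity(emb, pairs, human, binary=True))
```

In the paper this correlation is computed per benchmark dataset and reported in Table 1; here the benchmark word pairs and human scores are stand-ins for those datasets.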