Hash Embeddings for Efficient Word Representations

Authors: Dan Tito Svenstrup, Jonas Hansen, Ole Winther

NeurIPS 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We benchmark hash embeddings with and without dictionaries on text classification tasks. We evaluate hash embeddings on 7 different datasets... The performance of the model when using each of the two embedding types can be seen in the left side of table 2. Our experiments show that the performance of hash embeddings is always at par with using standard embeddings, and in most cases better.
Researcher Affiliation | Collaboration | Dan Svenstrup, Department for Applied Mathematics and Computer Science, Technical University of Denmark (DTU), 2800 Lyngby, Denmark, dsve@dtu.dk; Jonas Meinertz Hansen, FindZebra, Copenhagen, Denmark, jonas@findzebra.com; Ole Winther, Department for Applied Mathematics and Computer Science, Technical University of Denmark (DTU), 2800 Lyngby, Denmark, olwi@dtu.dk
Pseudocode | No | The paper describes the steps of hash embedding construction in prose and illustrates them with a diagram (Fig. 1), but it does not contain a structured pseudocode or algorithm block. (A hedged sketch of the lookup appears after the table.)
Open Source Code | No | The paper does not provide concrete access to source code (a specific repository link, an explicit code release statement, or code in supplementary materials) for the methodology described in the paper.
Open Datasets | Yes | We evaluate hash embeddings on 7 different datasets in the form introduced by Zhang et al. (2015) for various text classification tasks including topic classification, sentiment analysis, and news categorization. An overview of the datasets can be seen in table 1.
Dataset Splits | Yes | We use early stopping with a patience of 10, and use 5% of the training data as validation data.
Hardware Specification | Yes | The training was performed on an Nvidia GeForce GTX TITAN X with 12 GB of memory. The small performance difference was observed when using Keras with a TensorFlow backend on a GeForce GTX TITAN X with 12 GB of memory and an Nvidia GeForce GTX 660 with 2 GB of memory.
Software Dependencies | No | The paper states 'All models were implemented using Keras with TensorFlow backend' but does not provide specific version numbers for Keras, TensorFlow, or any other ancillary software components.
Experiment Setup | Yes | All the models are trained by minimizing the cross entropy using the stochastic gradient descent-based Adam method (Kingma and Ba, 2014) with a learning rate set to α = 0.001. We use early stopping with a patience of 10, and use 5% of the training data as validation data. The hash embeddings use K = 10M different importance parameter vectors, k = 2 hash functions, and B = 1M component vectors of dimension d = 20. (A hedged training-configuration sketch appears after the table.)
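
As noted in the Pseudocode row, the paper describes the hash embedding lookup only in prose and a diagram: a token is hashed to an id, k hash functions pick k vectors from a shared component table, and trainable importance parameters weight their sum. The following is a minimal sketch of that lookup under stated assumptions; the hash functions, table sizes, and initialization are illustrative and this is not the authors' implementation.

```python
import zlib
import numpy as np

# Illustrative sizes; the paper's no-dictionary setup uses K = 10M importance
# parameter vectors, B = 1M component vectors, k = 2 hash functions, d = 20.
K, B, k, d = 10_000, 1_000, 2, 20

rng = np.random.default_rng(0)
E = rng.normal(scale=0.1, size=(B, d))  # shared pool of component vectors
P = rng.normal(scale=0.1, size=(K, k))  # importance parameters: k weights per token id

def hash_embedding(token: str) -> np.ndarray:
    """Importance-weighted sum of k hashed component vectors."""
    # First-level hash: map the token to an id in {0, ..., K-1}
    # (stand-in for the hashing-trick / dictionary step in the paper).
    token_id = zlib.crc32(token.encode()) % K
    # k hash functions map the token id into the shared component table.
    component_ids = [zlib.crc32(f"{i}:{token_id}".encode()) % B for i in range(k)]
    components = E[component_ids]        # shape (k, d)
    importance = P[token_id]             # shape (k,)
    return importance @ components       # final embedding, shape (d,)

print(hash_embedding("zebra").shape)     # -> (20,)
```

In training, E and P would be learned jointly with the rest of the model; only B component vectors and K small importance vectors are stored, which is what gives the memory saving over a standard embedding table.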
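For the reported training protocol (Adam with α = 0.001, cross-entropy loss, early stopping with patience 10, and 5% of the training data held out for validation), the sketch below uses the current tf.keras API; the dummy data, the tiny embedding-plus-softmax model, and the epoch count are placeholders rather than the authors' architecture, and the original work used an unspecified, older Keras/TensorFlow version.

```python
import numpy as np
from tensorflow import keras

# Placeholder data and model: integer token ids feeding a standard embedding
# layer, mean pooling, and a softmax classifier.
vocab_size, seq_len, num_classes = 10_000, 100, 4
x_train = np.random.randint(vocab_size, size=(1_000, seq_len))
y_train = keras.utils.to_categorical(np.random.randint(num_classes, size=1_000), num_classes)

model = keras.Sequential([
    keras.layers.Embedding(vocab_size, 20),
    keras.layers.GlobalAveragePooling1D(),
    keras.layers.Dense(num_classes, activation="softmax"),
])

# Reported setup: Adam with learning rate 0.001, cross-entropy loss.
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
              loss="categorical_crossentropy")

# Early stopping with patience 10; 5% of the training data used for validation.
model.fit(x_train, y_train,
          epochs=100,  # illustrative upper bound; training stops early
          validation_split=0.05,
          callbacks=[keras.callbacks.EarlyStopping(patience=10)])
```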