Encoding word order in complex embeddings

Authors: Benyou Wang, Donghao Zhao, Christina Lioma, Qiuchi Li, Peng Zhang, Jakob Grue Simonsen

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on text classification, machine translation and language modeling show gains over both classical word embeddings and position-enriched word embeddings.
Researcher Affiliation | Academia | Benyou Wang (University of Padua, wang@dei.unipd.it); Donghao Zhao (Tianjin University, zhaodh@tju.edu.cn); Christina Lioma (University of Copenhagen, chrh@di.ku.dk); Qiuchi Li (University of Padua, qiuchili@dei.unipd.it); Peng Zhang (Tianjin University, pzhang@tju.edu.cn); Jakob Grue Simonsen (University of Copenhagen, simonsen@di.ku.dk)
Pseudocode | Yes | The paper lists the basic code to construct the general embedding (reproduced below; a forward-pass sketch follows this table):

    import torch
    import math

    class ComplexNN(torch.nn.Module):
        def __init__(self, opt):
            super(ComplexNN, self).__init__()
            # One lookup table each for per-token amplitude, frequency, and initial phase
            self.word_emb = torch.nn.Embedding(opt.n_token, opt.d_model)
            self.frequency_emb = torch.nn.Embedding(opt.n_token, opt.d_model)
            self.initial_phase_emb = torch.nn.Embedding(opt.n_token, opt.d_model)

Open Source Code | Yes | The code is on https://github.com/iclr-complex-order/complex-order
Open Datasets | Yes | We use six popular text classification datasets: CR, MPQA, SUBJ, MR, SST, and TREC (see Tab. 1)... We use the standard WMT 2016 English-German dataset (Sennrich et al., 2016)... We use the text8 (Mahoney, 2011) dataset
Dataset Splits | Yes | CV means 10-fold cross validation. The last two datasets come with train/dev/test splits. (A cross-validation sketch follows this table.)
Hardware Specification | Yes | Figure 2: Computation time (seconds) per epoch in Tensorflow on TITAN X GPU.
Software Dependencies | No | The paper mentions 'TensorFlow' and 'torch' (PyTorch) but does not specify version numbers for any software or libraries.
Experiment Setup | Yes | We search the hyper-parameters from a parameter pool, with batch size in {32, 64, 128}, learning rate in {0.001, 0.0001, 0.00001}, L2-regularization rate in {0, 0.001, 0.0001}, and number of hidden layer units in {120, 128}. (A grid-search sketch follows this table.)
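The constructor quoted in the Pseudocode row only builds the three lookup tables. As a minimal sketch of how they could be combined into the paper's general complex embedding f(j, pos) = r_j * exp(i * (omega_j * pos + theta_j)), the forward method below returns the real and imaginary parts separately; the method signature, the 1-based position indexing, and the (real, imaginary) output format are assumptions for illustration, not the authors' released implementation.

    import torch

    class ComplexNN(torch.nn.Module):
        def __init__(self, opt):
            super().__init__()
            # opt.n_token: vocabulary size; opt.d_model: embedding dimension
            self.word_emb = torch.nn.Embedding(opt.n_token, opt.d_model)           # amplitude r_j
            self.frequency_emb = torch.nn.Embedding(opt.n_token, opt.d_model)      # frequency omega_j
            self.initial_phase_emb = torch.nn.Embedding(opt.n_token, opt.d_model)  # initial phase theta_j

        def forward(self, token_ids):
            # token_ids: (batch, seq_len) LongTensor of vocabulary indices
            amplitude = self.word_emb(token_ids)
            frequency = self.frequency_emb(token_ids)
            phase = self.initial_phase_emb(token_ids)
            # Absolute positions 1..seq_len, broadcast over batch and embedding dimensions
            pos = torch.arange(1, token_ids.size(1) + 1, dtype=amplitude.dtype,
                               device=token_ids.device).view(1, -1, 1)
            angle = frequency * pos + phase
            # f(j, pos) = r_j * exp(i * (omega_j * pos + theta_j)), as (real, imaginary)
            return amplitude * torch.cos(angle), amplitude * torch.sin(angle)

A downstream model would then consume the two returned tensors with complex-valued (or paired real-valued) layers, as the paper does for its text classification, translation and language modeling experiments.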
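For the datasets evaluated with CV, the protocol is plain 10-fold cross validation with accuracies averaged over folds. The stand-in data and scikit-learn classifier below are hypothetical and only illustrate the splitting and averaging, not the paper's neural models.

    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import KFold, cross_val_score
    from sklearn.pipeline import make_pipeline

    # Hypothetical stand-in corpus: 100 tiny labelled documents
    docs = np.array(["good item %d" % i if i % 2 == 0 else "bad item %d" % i for i in range(100)])
    labels = np.array([i % 2 for i in range(100)])

    # Stand-in classifier; in the paper this would be an embedding model plus a classifier head
    model = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
    cv = KFold(n_splits=10, shuffle=True, random_state=0)
    scores = cross_val_score(model, docs, labels, cv=cv, scoring="accuracy")
    print("10-fold CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))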
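The hyper-parameter pool in the Experiment Setup row defines a small grid (3 x 3 x 3 x 2 = 54 configurations); a minimal exhaustive-search sketch is below. The train_and_validate helper is a hypothetical placeholder for the actual training and validation loop, which the paper does not spell out.

    import itertools
    import random

    # Hyper-parameter pool quoted in the Experiment Setup row
    grid = {
        "batch_size": [32, 64, 128],
        "learning_rate": [0.001, 0.0001, 0.00001],
        "l2_rate": [0.0, 0.001, 0.0001],
        "hidden_units": [120, 128],
    }

    def train_and_validate(config):
        # Hypothetical stub: replace with the real training loop; it returns a
        # random validation accuracy here only so that the sketch runs end to end.
        return random.random()

    best_config, best_acc = None, float("-inf")
    for values in itertools.product(*grid.values()):
        config = dict(zip(grid.keys(), values))
        acc = train_and_validate(config)
        if acc > best_acc:
            best_config, best_acc = config, acc

    print("best configuration:", best_config, "validation accuracy: %.3f" % best_acc)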