Encoding word order in complex embeddings
Authors: Benyou Wang, Donghao Zhao, Christina Lioma, Qiuchi Li, Peng Zhang, Jakob Grue Simonsen
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on text classification, machine translation and language modeling show gains over both classical word embeddings and position-enriched word embeddings. |
| Researcher Affiliation | Academia | Benyou Wang, University of Padua, wang@dei.unipd.it; Donghao Zhao, Tianjin University, zhaodh@tju.edu.cn; Christina Lioma, University of Copenhagen, chrh@di.ku.dk; Qiuchi Li, University of Padua, qiuchili@dei.unipd.it; Peng Zhang, Tianjin University, pzhang@tju.edu.cn; Jakob Grue Simonsen, University of Copenhagen, simonsen@di.ku.dk |
| Pseudocode | Yes | We list the basic code to construct our general embedding as below: import torch import math class ComplexNN(torch.nn.Module): def __init__(self, opt): super(ComplexNN, self).__init__() self.word_emb = torch.nn.Embedding(opt.n_token, opt.d_model) self.frequency_emb = torch.nn.Embedding(opt.n_token, opt.d_model) self.initial_phase_emb = torch.nn.Embedding(opt.n_token, opt.d_model) (a completed, runnable sketch of this listing follows the table) |
| Open Source Code | Yes | The code is on https://github.com/iclr-complex-order/complex-order |
| Open Datasets | Yes | We use six popular text classification datasets: CR, MPQA, SUBJ, MR, SST, and TREC (see Tab. 1)... We use the standard WMT 2016 English-German dataset (Sennrich et al., 2016)... We use the text8 (Mahoney, 2011) dataset |
| Dataset Splits | Yes | CV means 10-fold cross validation. The last 2 datasets come with train/dev/test splits. |
| Hardware Specification | Yes | Figure 2: Computation time (seconds) per epoch in Tensorflow on TITAN X GPU. |
| Software Dependencies | No | The paper mentions 'TensorFlow' and 'torch' (PyTorch) but does not specify version numbers for any software or libraries. |
| Experiment Setup | Yes | We search the hyper parameters from a parameter pool, with batch size in {32, 64, 128}, learning rate in {0.001, 0.0001, 0.00001}, L2-regularization rate in {0, 0.001, 0.0001}, and number of hidden layer units in {120, 128}. (a sketch enumerating this pool follows the table) |
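
The listing quoted in the Pseudocode row only defines the constructor. Below is a minimal runnable sketch that completes it with a `forward` pass, assuming the paper's formulation amplitude · exp(i(frequency · position + phase)). The constructor follows the quoted listing; the `forward` body, the 1-based position indexing, and the returned real/imaginary pair are our assumptions, not the authors' released code.

```python
import torch


class ComplexNN(torch.nn.Module):
    """Complex-valued word embedding r * exp(i(w * pos + theta)).

    Constructor taken from the paper's listing; forward() is a hedged
    sketch assuming the amplitude/frequency/phase formulation above.
    """

    def __init__(self, opt):
        super(ComplexNN, self).__init__()
        # Per-word amplitude r_j, frequency w_j and initial phase theta_j.
        self.word_emb = torch.nn.Embedding(opt.n_token, opt.d_model)
        self.frequency_emb = torch.nn.Embedding(opt.n_token, opt.d_model)
        self.initial_phase_emb = torch.nn.Embedding(opt.n_token, opt.d_model)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer word indices.
        _, seq_len = token_ids.shape
        amplitude = self.word_emb(token_ids)      # (batch, seq_len, d_model)
        frequency = self.frequency_emb(token_ids)
        phase = self.initial_phase_emb(token_ids)
        # Positions 1..seq_len, broadcast over batch and embedding dims.
        pos = torch.arange(1, seq_len + 1, dtype=amplitude.dtype,
                           device=token_ids.device).view(1, seq_len, 1)
        angle = frequency * pos + phase
        # Real and imaginary parts of the complex embedding.
        return amplitude * torch.cos(angle), amplitude * torch.sin(angle)
```

Constructing `ComplexNN` with any object exposing `n_token` and `d_model` (e.g. `types.SimpleNamespace(n_token=10000, d_model=128)`) and calling it on a `(batch, seq_len)` index tensor returns two `(batch, seq_len, d_model)` tensors, the real and imaginary components.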
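
The Experiment Setup row lists the small hyperparameter pool searched by the authors. Below is a minimal sketch of enumerating that pool; the `hparam_pool` dictionary and the `train_and_evaluate` placeholder are illustrative names, not taken from the paper's code.

```python
from itertools import product

# Hyperparameter pool quoted in the Experiment Setup row.
hparam_pool = {
    "batch_size": [32, 64, 128],
    "learning_rate": [1e-3, 1e-4, 1e-5],
    "l2_reg": [0, 1e-3, 1e-4],
    "hidden_units": [120, 128],
}

# 3 * 3 * 3 * 2 = 54 candidate configurations in total.
for values in product(*hparam_pool.values()):
    config = dict(zip(hparam_pool.keys(), values))
    # train_and_evaluate(config)  # hypothetical training/evaluation call
    print(config)
```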