Encoding word order in complex embeddings
Authors: Benyou Wang, Donghao Zhao, Christina Lioma, Qiuchi Li, Peng Zhang, Jakob Grue Simonsen
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on text classification, machine translation and language modeling show gains over both classical word embeddings and position-enriched word embeddings. |
| Researcher Affiliation | Academia | Benyou Wang, University of Padua, wang@dei.unipd.it; Donghao Zhao, Tianjin University, zhaodh@tju.edu.cn; Christina Lioma, University of Copenhagen, chrh@di.ku.dk; Qiuchi Li, University of Padua, qiuchili@dei.unipd.it; Peng Zhang, Tianjin University, pzhang@tju.edu.cn; Jakob Grue Simonsen, University of Copenhagen, simonsen@di.ku.dk |
| Pseudocode | Yes | We list the basic code to construct our general embedding as below: import torch import math class ComplexNN(torch.nn.Module): def __init__(self, opt): super(ComplexNN, self).__init__() self.word_emb = torch.nn.Embedding(opt.n_token, opt.d_model) self.frequency_emb = torch.nn.Embedding(opt.n_token, opt.d_model) self.initial_phase_emb = torch.nn.Embedding(opt.n_token, opt.d_model) (a completed, runnable sketch of this listing follows the table) |
| Open Source Code | Yes | The code is on https://github.com/iclr-complex-order/complex-order |
| Open Datasets | Yes | We use six popular text classification datasets: CR, MPQA, SUBJ, MR, SST, and TREC (see Tab. 1)... We use the standard WMT 2016 English-German dataset (Sennrich et al., 2016)... We use the text8 (Mahoney, 2011) dataset |
| Dataset Splits | Yes | CV means 10-fold cross validation. The last 2 datasets come with train/dev/test splits. |
| Hardware Specification | Yes | Figure 2: Computation time (seconds) per epoch in Tensorflow on TITAN X GPU. |
| Software Dependencies | No | The paper mentions 'TensorFlow' and 'torch' (PyTorch) but does not specify version numbers for any software or libraries. |
| Experiment Setup | Yes | We search the hyper parameters from a parameter pool, with batch size in {32, 64, 128}, learning rate in {0.001, 0.0001, 0.00001}, L2-regularization rate in {0, 0.001, 0.0001}, and number of hidden layer units in {120, 128}. (a sketch enumerating this pool follows the table) |
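
The listing quoted in the Pseudocode row only defines the constructor. Below is a minimal runnable sketch that completes it with a `forward` pass, assuming the paper's formulation amplitude · exp(i(frequency · position + phase)). The constructor follows the quoted listing; the `forward` body, the 1-based position indexing, and the returned real/imaginary pair are our assumptions, not the authors' released code.

```python
import torch


class ComplexNN(torch.nn.Module):
    """Complex-valued word embedding r * exp(i(w * pos + theta)).

    Constructor taken from the paper's listing; forward() is a hedged
    sketch assuming the amplitude/frequency/phase formulation above.
    """

    def __init__(self, opt):
        super(ComplexNN, self).__init__()
        # Per-word amplitude r_j, frequency w_j and initial phase theta_j.
        self.word_emb = torch.nn.Embedding(opt.n_token, opt.d_model)
        self.frequency_emb = torch.nn.Embedding(opt.n_token, opt.d_model)
        self.initial_phase_emb = torch.nn.Embedding(opt.n_token, opt.d_model)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer word indices.
        _, seq_len = token_ids.shape
        amplitude = self.word_emb(token_ids)      # (batch, seq_len, d_model)
        frequency = self.frequency_emb(token_ids)
        phase = self.initial_phase_emb(token_ids)
        # Positions 1..seq_len, broadcast over batch and embedding dims.
        pos = torch.arange(1, seq_len + 1, dtype=amplitude.dtype,
                           device=token_ids.device).view(1, seq_len, 1)
        angle = frequency * pos + phase
        # Real and imaginary parts of the complex embedding.
        return amplitude * torch.cos(angle), amplitude * torch.sin(angle)
```

Constructing `ComplexNN` with any object exposing `n_token` and `d_model` (e.g. `types.SimpleNamespace(n_token=10000, d_model=128)`) and calling it on a `(batch, seq_len)` index tensor returns two `(batch, seq_len, d_model)` tensors, the real and imaginary components.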
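
The Experiment Setup row lists the small hyperparameter pool searched by the authors. Below is a minimal sketch of enumerating that pool; the `hparam_pool` dictionary and the `train_and_evaluate` placeholder are illustrative names, not taken from the paper's code.

```python
from itertools import product

# Hyperparameter pool quoted in the Experiment Setup row.
hparam_pool = {
    "batch_size": [32, 64, 128],
    "learning_rate": [1e-3, 1e-4, 1e-5],
    "l2_reg": [0, 1e-3, 1e-4],
    "hidden_units": [120, 128],
}

# 3 * 3 * 3 * 2 = 54 candidate configurations in total.
for values in product(*hparam_pool.values()):
    config = dict(zip(hparam_pool.keys(), values))
    # train_and_evaluate(config)  # hypothetical training/evaluation call
    print(config)
```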