Byte-Level Machine Reading Across Morphologically Varied Languages

Authors: Tom Kenter, Llion Jones, Daniel Hewlett

AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work, we investigate whether bytes are suitable as input units across morphologically varied languages. To test this, we introduce two large-scale machine reading datasets in morphologically rich languages, Turkish and Russian. We implement 4 byte-level models, representing the major types of machine reading models, and introduce a new seq2seq variant, called encoder-transformer-decoder. We show that, for all languages considered, there are models reading bytes outperforming the current state-of-the-art word-level baseline. (A byte-level input sketch follows the table.)
Researcher Affiliation | Collaboration | Tom Kenter, University of Amsterdam, Amsterdam, The Netherlands; Llion Jones, Google Research, Mountain View, United States; Daniel Hewlett, Google, Mountain View, United States
Pseudocode | No | The paper describes the models and their components textually and with diagrams, but it does not include any explicit pseudocode or algorithm blocks.
Open Source Code | No | The text mentions that 'The large-scale Turkish and Russian machine reading datasets are released to public.' and provides a link: 'The two additional datasets are publicly available at http://goo.gl/wikireading.' However, it does not state that the source code for the models or methodology described in the paper is released.
Open Datasets | Yes | The large-scale Turkish and Russian machine reading datasets are released to the public. The two additional datasets are publicly available at http://goo.gl/wikireading.
Dataset Splits | Yes | The already existing English dataset is split in training/validation/test according to an 85/10/5 distribution. For the new sets, which are smaller, we choose an 80/10/10 split, to keep enough examples in the test set. (A split sketch follows the table.)
Hardware Specification | No | The paper discusses memory usage limitations ('memory usage becomes an issue when we unroll the RNNs for longer') but does not provide any specific details about the hardware (e.g., GPU/CPU models, RAM, or cloud instance types) used to run the experiments.
Software Dependencies | No | The paper mentions machine learning components like LSTM, GRU cells, and the Adam optimizer, but it does not specify any software dependencies (e.g., programming languages, libraries, or frameworks) along with their version numbers.
Experiment Setup | Yes | Table 2 lists the values of hyperparameters tuned over on the validation data. For the multi-level, bidirectional and convolutional-recurrent encoders, the document is appended to the query with a separator symbol in between. All embeddings are trained from scratch (i.e., no pre-trained vectors are used). We experiment with either sharing input and output embeddings, i.e., a single embedding matrix is employed, or having two separate embedding matrices (Press and Wolf 2016). For the memory network and the encoder-transformer-decoder model, the intermediate RNN performs 2 recurrent steps, as this yielded consistent performance in preliminary experiments. At most 50 bytes are read of each question, which is sufficient because questions are rarely longer than this. At most 400 bytes are read from the documents. This limit is imposed by resources: memory usage becomes an issue when we unroll the RNNs for longer. Table 3 lists the additional hyperparameters tuned over for the convolutional-recurrent model. All models are trained with stochastic gradient descent. The learning rate is adapted per parameter with Adam (Kingma and Ba 2015). Batch size is 64. After every 50,000 batches, the learning rate is divided by 2. (A learning-rate schedule sketch follows the table.)
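
To make the byte-level input concrete, here is a minimal sketch, not the authors' released code, of how a query and a document can be turned into one byte sequence with a separator symbol under the 50-byte and 400-byte limits quoted in the Experiment Setup row. The separator id and function name are assumptions for illustration.

```python
# Minimal sketch (not the authors' code): byte-level input construction.
# Vocabulary: the 256 byte values plus special symbols (ids assumed here).

SEP_ID = 256                 # hypothetical id for the separator symbol
MAX_QUESTION_BYTES = 50      # "At most 50 bytes are read of each question"
MAX_DOCUMENT_BYTES = 400     # "At most 400 bytes are read from the documents"

def encode_example(question: str, document: str) -> list[int]:
    """Append the document to the query, with a separator symbol in between."""
    q = list(question.encode("utf-8"))[:MAX_QUESTION_BYTES]
    d = list(document.encode("utf-8"))[:MAX_DOCUMENT_BYTES]
    return q + [SEP_ID] + d

# The same code covers English, Turkish and Russian, since any UTF-8 string
# decomposes into byte values in the 0-255 range.
ids = encode_example("Hangi şehirde doğdu?", "İstanbul, Türkiye ...")
```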
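The split ratios from the Dataset Splits row can be reproduced with a plain ratio split; the helper below is our own illustration, not the authors' preprocessing code.

```python
import random

# Illustrative helper reproducing the reported ratios: 80/10/10 for the new
# Turkish and Russian sets, 85/10/5 for the existing English set.

def split_dataset(examples, ratios=(0.80, 0.10, 0.10), seed=0):
    """Shuffle and split a list of examples into train/validation/test."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_train = int(ratios[0] * len(shuffled))
    n_valid = int(ratios[1] * len(shuffled))
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test

# English split:
# train, valid, test = split_dataset(english_examples, ratios=(0.85, 0.10, 0.05))
```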
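The optimization schedule in the Experiment Setup row (Adam, batch size 64, learning rate halved every 50,000 batches) can be written as a one-line function. The initial learning rate below is only a placeholder, since the paper tunes its hyperparameters on validation data.

```python
# Sketch of the stated schedule; INITIAL_LR is an assumed placeholder value.

BATCH_SIZE = 64
LR_DECAY_EVERY = 50_000   # "After every 50,000 batches, the learning rate is divided by 2."
INITIAL_LR = 1e-3         # placeholder; the paper tunes this on validation data

def learning_rate(batch_step: int) -> float:
    """Adam base learning rate after `batch_step` training batches."""
    return INITIAL_LR / (2 ** (batch_step // LR_DECAY_EVERY))

print(learning_rate(0))        # 0.001
print(learning_rate(50_000))   # 0.0005
print(learning_rate(120_000))  # 0.00025
```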