Representing Unordered Data Using Complex-Weighted Multiset Automata

Authors: Justin DeBenedetto, David Chiang

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We carried out some experiments to test this hypothesis, using an open-source implementation of the Transformer, Witwicky [1]. The settings used were the default settings, except that we used 8k joint BPE operations and d = 512 embedding dimensions. We tested the following variations on position encodings.
Researcher Affiliation | Academia | [1] Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA. Correspondence to: Justin DeBenedetto <jdebened@nd.edu>, David Chiang <dchiang@nd.edu>.
Pseudocode | No | The paper defines concepts and provides mathematical examples but does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available online [2]. [2] https://github.com/jdebened/ComplexDeepSets
Open Datasets | No | The training set consisted of 100k randomly generated sequences of digits 1-9 with lengths from 1 to 50.
Dataset Splits | Yes | The training set consisted of 100k randomly generated sequences of digits 1-9 with lengths from 1 to 50. They were fed to each network in the order in which they were generated (which only affects GRU and LSTM). This was then split into training and dev with approximately a 99/1 split. The test set consisted of randomly generated sequences of lengths that were multiples of 5 from 5 to 95. (A sketch of this generation and split procedure appears below the table.)
Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for the experiments.
Software Dependencies | No | The paper mentions using the 'open-source implementation of the Transformer, Witwicky' and refers to the 'Deep Sets' model code, but does not specify version numbers for these or other software dependencies.
Experiment Setup | Yes | For tasks 1 and 2, we used mean squared error loss, a learning rate decay of 0.5 after the validation loss does not decrease for 2 epochs, and early stopping after the validation loss does not decrease for 10 epochs. Each input is fed into three separate embedding layers of size 50 (for r, a, and b). (See the training-loop sketch below the table.)
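
The data described under Open Datasets and Dataset Splits is simple enough to sketch in a few lines. The Python script below is a hedged reconstruction, not the authors' released code: the random seed, the exact size of the dev split, and the number of test sequences per length are assumptions where the quoted text is silent.

import random

random.seed(0)  # the paper does not state a seed; 0 is an arbitrary choice

def random_sequence(length):
    # Digits are drawn uniformly from 1-9, as described in the paper.
    return [random.randint(1, 9) for _ in range(length)]

# Training data: 100k sequences with lengths from 1 to 50, kept in
# generation order (order only matters for the GRU and LSTM baselines).
sequences = [random_sequence(random.randint(1, 50)) for _ in range(100_000)]

# Approximately a 99/1 train/dev split.
cut = int(0.99 * len(sequences))
train, dev = sequences[:cut], sequences[cut:]

# Test data: sequences whose lengths are multiples of 5, from 5 to 95.
# The number of test sequences per length is not given in the quoted
# text; 1,000 per length is a placeholder assumption.
test = [random_sequence(length)
        for length in range(5, 100, 5)
        for _ in range(1_000)]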
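The Experiment Setup row likewise maps onto standard PyTorch components. The sketch below is an illustration under stated assumptions rather than the authors' implementation: the model is a stand-in (only the three size-50 embedding layers for r, a, and b come from the quoted text), and the optimizer choice and the evaluate stub are hypothetical, while the MSE loss, the 0.5 learning-rate decay with patience 2, and early stopping after 10 epochs without improvement follow the quoted description.

import torch
import torch.nn as nn

VOCAB_SIZE = 10  # digits 1-9 plus an index 0 reserved for padding (an assumption)
EMB_DIM = 50     # "three separate embedding layers of size 50 (for r, a, and b)"

# Three separate embedding layers, one each for r, a, and b.
embed_r = nn.Embedding(VOCAB_SIZE, EMB_DIM)
embed_a = nn.Embedding(VOCAB_SIZE, EMB_DIM)
embed_b = nn.Embedding(VOCAB_SIZE, EMB_DIM)
model = nn.ModuleList([embed_r, embed_a, embed_b])  # stand-in for the full model

criterion = nn.MSELoss()  # mean squared error loss for tasks 1 and 2
optimizer = torch.optim.Adam(model.parameters())  # optimizer not quoted; an assumption

# Halve the learning rate when the validation loss has not improved for 2 epochs.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=2)

def evaluate(model):
    # Placeholder for the real dev-set evaluation, which would return the MSE.
    return float(torch.rand(()))

best_val, stale_epochs = float("inf"), 0
for epoch in range(1000):
    # ... one training pass over the data would go here ...
    val_loss = evaluate(model)
    scheduler.step(val_loss)
    if val_loss < best_val:
        best_val, stale_epochs = val_loss, 0
    else:
        stale_epochs += 1
        if stale_epochs >= 10:  # early stopping after 10 epochs with no improvement
            break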