On-the-fly Operation Batching in Dynamic Computation Graphs

Authors: Graham Neubig, Yoav Goldberg, Chris Dyer

NeurIPS 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section we describe our experiments, designed to answer three main questions: (1) in situations where manual batching is easy, how close can the proposed method approach the efficiency of a program that uses hand-crafted manual batching, and how do the depth-based and agenda-based approaches compare (§4.1)? (2) in situations where manual batching is less easy, is the proposed method capable of obtaining significant improvements in efficiency (§4.2)? (3) how does the proposed method compare to TensorFlow Fold, an existing method for batching variably structured networks within a static declaration framework (§4.3)?"
Researcher Affiliation | Collaboration | Graham Neubig, Language Technologies Institute, Carnegie Mellon University, gneubig@cs.cmu.edu; Yoav Goldberg, Computer Science Department, Bar-Ilan University, yogo@cs.biu.ac.il; Chris Dyer, DeepMind, cdyer@google.com
Pseudocode | Yes | "Pseudo-code for constructing the graph for each of the RNNs on the left using a dynamic declaration framework is as follows:" function RNN-REGRESSION-LOSS(x_{1:n}, y; θ = (W, U, b, c)) ...; function TRAIN-BATCH-NAIVE(T = {(x^{(i)}_{1:n^{(i)}}, y^{(i)})}_{i=1}^{b}; θ): NEW-GRAPH() ...; function RNN-REGRESSION-BATCH-LOSS(X_{1:n_max}, Y, n^{(1:b)}; θ = (W, U, b, c)) ...; and function TRAIN-BATCH-MANUAL(T = {(x^{(i)}_{1:n^{(i)}}, y^{(i)})}_{i=1}^{b}; θ): n_max = max_i n^{(i)} ... (a runnable sketch of the naive loop appears after the table)
Open Source Code | Yes | "The proposed algorithm is implemented in DyNet (http://dynet.io/), and can be activated by using the --dynet-autobatch 1 command line flag." (see the usage sketch after the table)
Open Datasets | Yes | "images of a fixed size such as those in the MNIST and CIFAR datasets"; "we train a bi-directional LSTM sequence labeler [12, 23] on synthetic data where every sequence to be labeled is the same length (40)"; and "We compare to the TensorFlow Fold reference implementation of the Stanford Sentiment Treebank regression task [30]."
Dataset Splits | No | The paper mentions "The batch size is 64" and the use of synthetic data and actual variable-length sequences, but it does not provide specific percentages or counts for training, validation, or test splits. It only mentions evaluation on the dev set in the context of a comparison with TensorFlow Fold, not for the authors' own experimental setup.
Hardware Specification | Yes | Experiments were run on a single Tesla K80 GPU or an Intel Xeon E5-2686 v4 2.30GHz CPU.
Software Dependencies | No | The paper mentions toolkits such as PyTorch, DyNet, Chainer, TensorFlow, CNTK, and Theano, but it does not specify version numbers for any of these software dependencies.
Experiment Setup | Yes | "The batch size is 64." and "The network takes as input a size 200 embedding vector from a vocabulary of size 1000, has 2 layers of 256 hidden node LSTMs in either direction, then predicts a label from one of 300 classes." (see the architecture sketch below)
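
The pseudocode quoted in the Pseudocode row is easiest to follow as running code. Below is a minimal sketch, assuming DyNet's Python bindings, of the RNN-REGRESSION-LOSS and TRAIN-BATCH-NAIVE routines; the dimensions, the data format, and the squared-error loss are illustrative assumptions rather than details taken from the paper.

```python
import dynet as dy

# Illustrative dimensions; the paper's pseudocode leaves these abstract.
IN_DIM, HID_DIM = 32, 64

model = dy.ParameterCollection()
W = model.add_parameters((HID_DIM, IN_DIM + HID_DIM))
b = model.add_parameters((HID_DIM,))
U = model.add_parameters((1, HID_DIM))
c = model.add_parameters((1,))
trainer = dy.SimpleSGDTrainer(model)

def rnn_regression_loss(xs, y):
    """RNN-REGRESSION-LOSS: run a simple RNN over xs, return squared error vs. scalar y."""
    W_e, b_e = dy.parameter(W), dy.parameter(b)
    U_e, c_e = dy.parameter(U), dy.parameter(c)
    h = dy.inputVector([0.0] * HID_DIM)            # h_0 = 0
    for x in xs:                                   # xs: list of length-IN_DIM float vectors
        h = dy.tanh(W_e * dy.concatenate([dy.inputVector(x), h]) + b_e)
    y_hat = U_e * h + c_e
    return dy.squared_distance(y_hat, dy.inputVector([y]))

def train_batch_naive(batch):
    """TRAIN-BATCH-NAIVE: one fresh graph per minibatch, one loss expression per instance."""
    dy.renew_cg()                                  # NEW-GRAPH()
    losses = [rnn_regression_loss(xs, y) for xs, y in batch]
    total = dy.esum(losses)                        # sum of per-instance losses
    total.forward()
    total.backward()
    trainer.update()
```

This naive per-instance loop is the form the paper's method is designed to accelerate: the single-instance operations built here can be grouped into batched operations at execution time rather than by hand.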
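The Open Source Code row quotes the --dynet-autobatch 1 flag. That flag is passed on the command line of a DyNet program; the dynet_config route shown as an alternative is an assumption about the Python bindings and may differ between DyNet versions.

```python
# Command line, using the flag quoted in the paper:
#   python train.py --dynet-autobatch 1
#
# Alternative (assumed dynet_config API; version-dependent): set the option
# in code before dynet is imported, so it takes effect at initialization.
import dynet_config
dynet_config.set(autobatch=True)
import dynet as dy
```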
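For the Experiment Setup row, the quoted sizes can be turned into a concrete model definition. The sketch below, again assuming DyNet's Python API, uses the stated dimensions (vocabulary 1000, embeddings 200, 2 LSTM layers of 256 units per direction, 300 output classes); how the forward and backward states are combined and the per-token softmax loss are my assumptions, not details given in the excerpt.

```python
import dynet as dy

# Sizes quoted in the paper's setup description.
VOCAB, EMB, HID, LAYERS, CLASSES = 1000, 200, 256, 2, 300

model = dy.ParameterCollection()
embeds = model.add_lookup_parameters((VOCAB, EMB))
fwd = dy.LSTMBuilder(LAYERS, EMB, HID, model)      # forward LSTM stack
bwd = dy.LSTMBuilder(LAYERS, EMB, HID, model)      # backward LSTM stack
W_out = model.add_parameters((CLASSES, 2 * HID))
b_out = model.add_parameters((CLASSES,))

def tagging_loss(word_ids, tag_ids):
    """Sum of per-token negative log-likelihoods for one labeled sequence."""
    xs = [embeds[w] for w in word_ids]
    fs = fwd.initial_state().transduce(xs)
    bs = list(reversed(bwd.initial_state().transduce(list(reversed(xs)))))
    W, b = dy.parameter(W_out), dy.parameter(b_out)
    losses = []
    for f, bk, t in zip(fs, bs, tag_ids):
        scores = W * dy.concatenate([f, bk]) + b   # 300-way scores per token
        losses.append(dy.pickneglogsoftmax(scores, t))
    return dy.esum(losses)
```

With the batch size of 64 quoted above, per-sequence losses like this one would be summed across the minibatch before calling backward, which is the pattern the autobatching flag is meant to speed up.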