Temporal FiLM: Capturing Long-Range Sequence Dependencies with Feature-Wise Modulations.

Authors: Sawyer Birnbaum, Volodymyr Kuleshov, Zayd Enam, Pang Wei W. Koh, Stefano Ermon

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we find that TFiLM significantly improves the learning speed and accuracy of feed-forward neural networks on a range of generative and discriminative learning tasks, including text classification and audio super-resolution. (Section 5, Experiments:) In this section, we show that TFiLM layers improve the performance of convolutional models on both discriminative and generative tasks.
Researcher Affiliation | Collaboration | (1) Stanford University, Stanford, CA; (2) Afresh Technologies, San Francisco, CA
Pseudocode | Yes | Algorithm 1: Temporal Feature-Wise Linear Modulation. Input: a tensor of 1D convolutional activations F ∈ R^(T×C), where T and C are, respectively, the temporal dimension and the number of channels, and a block length B. Output: an adaptively normalized tensor of activations F' ∈ R^(T×C). (A hedged code sketch of this algorithm follows the table.)
Open Source Code | No | The paper does not provide a specific repository link or an explicit statement about the release of the source code for the methodology described.
Open Datasets | Yes | Datasets. We use the Yelp-2 and Yelp-5 datasets [1], which are standard datasets for sentiment analysis. We use the VCTK dataset [53], which contains 44 hours of data from 108 speakers, and a Piano dataset containing 10 hours of Beethoven sonatas [40]. We use histone ChIP-seq data from lymphoblastoid cell lines derived from several individuals of diverse ancestry [29].
Dataset Splits | Yes | In the MULTISPEAKER task, we train on the first 99 VCTK speakers and test on the 8 remaining ones. Lastly, the PIANO task extends audio super-resolution to non-vocal data; we use the standard 88%-6%-6% data split. (A small split sketch follows the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper mentions Keras's built-in tokenizer but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | We train for 20 epochs using the ADAM optimizer with a learning rate of 10^-3 and a batch size of 128. We instantiate our model with K = 4 blocks and train it for 50 epochs on patches of length 8192 (in the high-resolution space) using the ADAM optimizer with a learning rate of 3 × 10^-4. (A hedged training-setup sketch follows the table.)
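
To make the Algorithm 1 excerpt in the Pseudocode row concrete, below is a minimal PyTorch sketch of block-wise temporal feature-wise linear modulation, assuming max pooling within blocks and an LSTM whose output is split into per-block scale and shift vectors. The module and parameter names are illustrative; the authors' exact parameterization may differ.

```python
import torch
import torch.nn as nn

class TFiLM(nn.Module):
    """Sketch of a temporal FiLM layer: an LSTM over pooled blocks of a
    1D-conv activation tensor produces per-block scale/shift vectors."""

    def __init__(self, channels: int, block_len: int):
        super().__init__()
        self.block_len = block_len
        # The LSTM maps each pooled block summary (C dims) to 2*C FiLM params.
        self.rnn = nn.LSTM(input_size=channels, hidden_size=2 * channels,
                           batch_first=True)

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        # f: (batch, T, C) conv activations; T must be divisible by block_len.
        batch, t_len, channels = f.shape
        n_blocks = t_len // self.block_len

        # 1) Split the time axis into blocks: (batch, n_blocks, B, C).
        blocks = f.view(batch, n_blocks, self.block_len, channels)

        # 2) Max-pool within each block: (batch, n_blocks, C).
        pooled = blocks.max(dim=2).values

        # 3) An LSTM over the block sequence yields per-block FiLM parameters.
        film, _ = self.rnn(pooled)            # (batch, n_blocks, 2C)
        gamma, beta = film.chunk(2, dim=-1)   # each (batch, n_blocks, C)

        # 4) Feature-wise affine modulation, broadcast over positions in a block.
        out = gamma.unsqueeze(2) * blocks + beta.unsqueeze(2)

        # 5) Restore the original (batch, T, C) shape.
        return out.reshape(batch, t_len, channels)
```

In the paper's architectures, a layer of this kind sits after a convolutional block, so the recurrent pass over pooled block summaries lets otherwise local convolutional features depend on long-range temporal context.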
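
As a literal reading of the Dataset Splits row, here is a tiny Python sketch of the two reported splits; the speaker-list and sample-list inputs are placeholders, not the authors' data-loading code.

```python
from typing import List, Tuple

def split_vctk_multispeaker(speakers: List[str]) -> Tuple[List[str], List[str]]:
    # MULTISPEAKER task as quoted: train on the first 99 VCTK speakers,
    # test on the remaining ones (8 speakers in the paper's setup).
    return speakers[:99], speakers[99:]

def split_piano(samples: list, train: float = 0.88,
                val: float = 0.06) -> Tuple[list, list, list]:
    # PIANO task: the standard 88% / 6% / 6% train/validation/test split.
    n_train = int(train * len(samples))
    n_val = int(val * len(samples))
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])
```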
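
The Experiment Setup row quotes two training configurations (text classification and audio super-resolution). The sketch below shows how those reported hyperparameters map onto a generic Adam training loop; the loss function, model constructors, and the audio batch size are not specified in the quoted text and are placeholders here.

```python
import torch
from torch.utils.data import DataLoader, Dataset

def train(model: torch.nn.Module, dataset: Dataset,
          epochs: int, lr: float, batch_size: int) -> None:
    # Generic loop using the reported optimizer settings; the MSE loss is a
    # placeholder, since the quoted text does not name the training objective.
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()

# Reported settings from the row above:
#   text classification:    epochs=20, lr=1e-3, batch_size=128
#   audio super-resolution: epochs=50, lr=3e-4, patches of 8192 high-resolution
#                           samples, K=4 blocks (batch size not quoted)
```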