Temporal FiLM: Capturing Long-Range Sequence Dependencies with Feature-Wise Modulations
Authors: Sawyer Birnbaum, Volodymyr Kuleshov, Zayd Enam, Pang Wei Koh, Stefano Ermon
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we find that TFiLM significantly improves the learning speed and accuracy of feed-forward neural networks on a range of generative and discriminative learning tasks, including text classification and audio super-resolution. [Section 5, Experiments] In this section, we show that TFiLM layers improve the performance of convolutional models on both discriminative and generative tasks. |
| Researcher Affiliation | Collaboration | 1Stanford University, Stanford, CA 2Afresh Technologies, San Francisco, CA |
| Pseudocode | Yes | Algorithm 1 Temporal Feature-Wise Linear Modulation. Input: Tensor of 1D convolutional activations F ∈ ℝ^(T×C), where T and C are, respectively, the temporal dimension and the number of channels, and a block length B. Output: Adaptively normalized tensor of activations F ∈ ℝ^(T×C). (A code sketch of this layer appears after the table.) |
| Open Source Code | No | The paper does not provide a specific repository link or an explicit statement about the release of the source code for the methodology described. |
| Open Datasets | Yes | Datasets. We use the Yelp-2 and Yelp-5 datasets [1], which are standard datasets for sentiment analysis. We use the VCTK dataset [53], which contains 44 hours of data from 108 speakers, and a Piano dataset with 10 hours of Beethoven sonatas [40]. We use histone ChIP-seq data from lymphoblastoid cell lines derived from several individuals of diverse ancestry [29]. |
| Dataset Splits | Yes | In the MULTISPEAKER task, we train on the first 99 VCTK speakers and test on the 8 remaining ones. Lastly, the PIANO task extends audio super resolution to non-vocal data; we use the standard 88%-6%-6% data split. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions Keras's built-in tokenizer but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | We train for 20 epochs using the ADAM optimizer with a learning rate of 10^-3 and a batch size of 128. We instantiate our model with K = 4 blocks and train it for 50 epochs on patches of length 8192 (in the high-resolution space) using the ADAM optimizer with a learning rate of 3 × 10^-4. (A training-loop sketch appears after the table.) |
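
The Algorithm 1 row above describes the TFiLM operation: the activations are split into blocks of length B, each block is max-pooled into a summary vector, an RNN over the block summaries produces per-block scale and shift parameters, and those parameters modulate every activation within the block. A minimal PyTorch sketch of this operation is given below; the framework choice and the specific way the LSTM output is split into scale (gamma) and shift (beta) vectors are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of a TFiLM layer (Algorithm 1), assuming PyTorch and an LSTM
# whose output is split into per-block scale/shift vectors. Shapes follow the
# paper: F in R^(T x C) with block length B; a batch dimension is added here.
import torch
import torch.nn as nn

class TFiLM(nn.Module):
    def __init__(self, channels: int, block_len: int):
        super().__init__()
        self.block_len = block_len
        # LSTM over block summaries; hidden size 2*C so the output can be
        # split into a scale (gamma) and a shift (beta) vector per block.
        self.rnn = nn.LSTM(channels, 2 * channels, batch_first=True)

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        # f: (batch, T, C); T is assumed divisible by the block length B.
        batch, t, c = f.shape
        n_blocks = t // self.block_len
        blocks = f.view(batch, n_blocks, self.block_len, c)
        # 1) Max-pool each block along time into one summary vector per block.
        pooled = blocks.max(dim=2).values               # (batch, n_blocks, C)
        # 2) An RNN over the block sequence captures long-range dependencies.
        params, _ = self.rnn(pooled)                    # (batch, n_blocks, 2C)
        gamma, beta = params.chunk(2, dim=-1)           # each (batch, n_blocks, C)
        # 3) Feature-wise linear modulation of every activation in each block.
        modulated = gamma.unsqueeze(2) * blocks + beta.unsqueeze(2)
        return modulated.view(batch, t, c)
```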
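
The experiment-setup row quotes two training configurations. A minimal sketch of the first (20 epochs, ADAM, learning rate 10^-3, batch size 128) is shown below; `model` and `train_dataset` are hypothetical placeholders, and the PyTorch training loop is an assumption rather than the authors' code.

```python
# Sketch of the quoted training configuration: 20 epochs, ADAM, lr 1e-3,
# batch size 128. The model, dataset, and loss are hypothetical placeholders.
import torch
from torch.utils.data import DataLoader

def train(model, train_dataset, epochs=20, lr=1e-3, batch_size=128):
    loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()
```

The second configuration (K = 4 blocks, 50 epochs, patches of length 8192, learning rate 3 × 10^-4) would follow the same loop with those hyperparameters substituted.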