STCN: Stochastic Temporal Convolutional Networks
Authors: Emre Aksan, Otmar Hilliges
ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the proposed variants STCN and STCN-dense both quantitatively and qualitatively on modeling of digital handwritten text and speech. We compare with vanilla TCNs, RNNs, VRNNs and state-of-the-art models on the corresponding tasks. Table 1: Average log-likelihood per sequence on TIMIT, Blizzard, IAM-OnDB and Deepwriting datasets. |
| Researcher Affiliation | Academia | Emre Aksan & Otmar Hilliges Department of Computer Science ETH Zurich, Switzerland {emre.aksan, otmar.hilliges}@inf.ethz.ch |
| Pseudocode | No | The paper includes a diagram (Figure 4) of the model architecture but no textual pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://ait.ethz.ch/projects/2019/stcn/. |
| Open Datasets | Yes | The IAM-OnDB data is split and pre-processed as done in (Chung et al., 2015). Aksan et al. (2018) extend this dataset with additional samples and better pre-processing. TIMIT and Blizzard are standard benchmark datasets in speech modeling. |
| Dataset Splits | Yes | The IAM-OnDB data is split and pre-processed as done in (Chung et al., 2015). We applied early stopping by measuring the ELBO performance on the validation splits. |
| Hardware Specification | Yes | We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research. |
| Software Dependencies | No | We implement STCN models in Tensorflow (Abadi et al., 2016). |
| Experiment Setup | Yes | In all STCN experiments we applied KL annealing. In all tasks, the weight of the KL term is initialized with 0 and increased by 1e-4 at every step until it reaches 1. The batch size was 20 for all datasets except for Blizzard, where it was 128. We use the ADAM optimizer with its default parameters and exponentially decay the learning rate. For the handwriting datasets the learning rate was initialized with 5e-4 and followed a decay rate of 0.94 over 1000 decay steps. On the speech datasets it was initialized with 1e-3 and decayed with a rate of 0.98. (A schedule sketch is given below the table.) |
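
The quoted experiment setup maps onto standard optimizer and annealing utilities. The following is a minimal sketch of that training schedule, assuming TensorFlow 2 Keras APIs; it is not the authors' released code, and `elbo_loss`, its `log_likelihood` and `kl_divergence` inputs, and the `step` counter are hypothetical placeholders for illustration only.

```python
# Sketch of the reported training schedule (handwriting configuration),
# assuming TensorFlow 2; not the authors' implementation.
import tensorflow as tf

# Exponential learning-rate decay: 5e-4 initial, rate 0.94 over 1000 decay steps.
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=5e-4,  # 1e-3 for the speech datasets
    decay_steps=1000,
    decay_rate=0.94,             # 0.98 for the speech datasets
)
optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)  # default ADAM parameters

def kl_weight(step: int) -> float:
    """KL annealing: weight starts at 0, grows by 1e-4 per step, capped at 1."""
    return min(1.0, step * 1e-4)

def elbo_loss(log_likelihood, kl_divergence, step):
    # Negative annealed ELBO: maximizing the ELBO equals minimizing this loss.
    return -(log_likelihood - kl_weight(step) * kl_divergence)
```

Swapping in an initial rate of 1e-3 and a decay rate of 0.98 reproduces the quoted speech-dataset configuration; batch sizes (20, or 128 for Blizzard) are set in the data pipeline rather than here.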