STCN: Stochastic Temporal Convolutional Networks

Authors: Emre Aksan, Otmar Hilliges

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the proposed variants STCN and STCN-dense both quantitatively and qualitatively on modeling of digital handwritten text and speech. We compare with vanilla TCNs, RNNs, VRNNs and state-of-the-art models on the corresponding tasks. Table 1: Average log-likelihood per sequence on TIMIT, Blizzard, IAM-OnDB and Deepwriting datasets.
Researcher Affiliation | Academia | Emre Aksan & Otmar Hilliges, Department of Computer Science, ETH Zurich, Switzerland. {emre.aksan, otmar.hilliges}@inf.ethz.ch
Pseudocode | No | The paper includes a diagram (Figure 4) of the model architecture but no textual pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at https://ait.ethz.ch/projects/2019/stcn/.
Open Datasets | Yes | The IAM-OnDB data is split and pre-processed as done in (Chung et al., 2015). Aksan et al. (2018) extend this dataset with additional samples and better pre-processing. TIMIT and Blizzard are standard benchmark datasets in speech modeling.
Dataset Splits | Yes | The IAM-OnDB data is split and pre-processed as done in (Chung et al., 2015). We applied early stopping by measuring the ELBO performance on the validation splits.
Hardware Specification | Yes | We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.
Software Dependencies | No | We implement STCN models in Tensorflow (Abadi et al., 2016).
Experiment Setup | Yes | In all STCN experiments we applied KL annealing. In all tasks, the weight of the KL term is initialized with 0 and increased by 1e-4 at every step until it reaches 1. The batch size was 20 for all datasets except for Blizzard where it was 128. We use the ADAM optimizer with its default parameters and exponentially decay the learning rate. For the handwriting datasets the learning rate was initialized with 5e-4 and followed a decay rate of 0.94 over 1000 decay steps. On the speech datasets it was initialized with 1e-3 and decayed with a rate of 0.98.
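
The training-schedule details quoted above (KL weight growing by 1e-4 per step, exponential learning-rate decay, ADAM with default parameters) map directly onto a few lines of code. The sketch below is a minimal reconstruction under the assumption of the TensorFlow 2 Keras API; the paper's implementation was written in TensorFlow 1.x, and the kl_weight helper name is an illustrative choice, not the authors' code.

# Minimal sketch of the reported training schedule, assuming the TensorFlow 2
# Keras API (the paper used TensorFlow 1.x; names here are illustrative).
import tensorflow as tf

# Exponential learning-rate decay, handwriting settings: initial rate 5e-4,
# decay rate 0.94 applied over 1000 decay steps (speech: 1e-3 and 0.98).
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=5e-4,
    decay_steps=1000,
    decay_rate=0.94,
)
# ADAM with its default parameters, driven by the decaying learning rate.
optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)


def kl_weight(step: int) -> tf.Tensor:
    """KL annealing: weight starts at 0, grows by 1e-4 per step, capped at 1."""
    return tf.minimum(1.0, tf.cast(step, tf.float32) * 1e-4)


# In a training step the annealed weight scales the KL term of the ELBO, e.g.
#   loss = reconstruction_nll + kl_weight(step) * kl_divergence

With a batch size of 20 (128 for Blizzard), these schedules reproduce the optimization settings reported in the paper's experiment setup.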