Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

STCN: Stochastic Temporal Convolutional Networks

Authors: Emre Aksan, Otmar Hilliges

ICLR 2019 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate the proposed variants STCN and STCN-dense both quantitatively and qualitatively on modeling of digital handwritten text and speech. We compare with vanilla TCNs, RNNs, VRNNs and state-of-the-art models on the corresponding tasks. Table 1: Average log-likelihood per sequence on TIMIT, Blizzard, IAM-OnDB and Deepwriting datasets.
Researcher Affiliation Academia Emre Aksan & Otmar Hilliges Department of Computer Science ETH Zurich, Switzerland EMAIL
Pseudocode No The paper includes a diagram (Figure 4) of the model architecture but no textual pseudocode or algorithm blocks.
Open Source Code Yes Our code is available at https://ait.ethz.ch/projects/2019/stcn/.
Open Datasets Yes The IAM-OnDB data is split and pre-processed as done in (Chung et al., 2015). Aksan et al. (2018) extend this dataset with additional samples and better pre-processing. TIMIT and Blizzard are standard benchmark datasets in speech modeling.
Dataset Splits Yes The IAM-OnDB data is split and pre-processed as done in (Chung et al., 2015). We applied early stopping by measuring the ELBO performance on the validation splits.
Hardware Specification Yes We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.
Software Dependencies No We implement STCN models in TensorFlow (Abadi et al., 2016).
Experiment Setup Yes In all STCN experiments we applied KL annealing. In all tasks, the weight of the KL term is initialized with 0 and increased by 1e-4 at every step until it reaches 1. The batch size was 20 for all datasets except for Blizzard where it was 128. We use the ADAM optimizer with its default parameters and exponentially decay the learning rate. For the handwriting datasets the learning rate was initialized with 5e-4 and followed a decay rate of 0.94 over 1000 decay steps. On the speech datasets it was initialized with 1e-3 and decayed with a rate of 0.98.
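The two schedules quoted above (linear KL annealing and exponential learning-rate decay) can be sketched as plain functions of the training step. This is an illustrative reconstruction from the reported hyperparameters, not the authors' code; the function names and the continuous (non-staircase) form of the decay are assumptions.

```python
def kl_weight(step: int, increment: float = 1e-4) -> float:
    """KL annealing: the KL term's weight starts at 0, grows by
    `increment` per training step, and is capped at 1 (per the paper)."""
    return min(1.0, step * increment)


def learning_rate(step: int,
                  initial: float = 5e-4,     # handwriting datasets; 1e-3 for speech
                  decay_rate: float = 0.94,  # 0.98 for speech
                  decay_steps: int = 1000) -> float:
    """Exponential learning-rate decay: multiply by `decay_rate`
    every `decay_steps` steps (continuous form assumed)."""
    return initial * decay_rate ** (step / decay_steps)
```

With these defaults, the KL weight reaches 1 after 10,000 steps, and the handwriting learning rate drops from 5e-4 to 5e-4 * 0.94 after the first 1000 steps.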