Neural TTS Stylization with Adversarial and Collaborative Games

Authors: Shuang Ma, Daniel McDuff, Yale Song

ICLR 2019

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | "We evaluate our model from three perspectives: content vs. style disentanglement ability (Sec. 5.1), effectiveness of style modeling (Sec. 5.2), and controllability (Sec. 5.3). We use two datasets: EMT-4, an in-house dataset of 22,377 American English audio-text samples... VCTK, a publicly available, multi-speaker dataset..."

Researcher Affiliation | Collaboration | Shuang Ma, State University of New York at Buffalo, Buffalo, NY, shuangma@buffalo.edu; Daniel McDuff, Microsoft Research, Redmond, WA, damcduff@microsoft.com; Yale Song, Microsoft Cloud & AI, Redmond, WA, yalesong@microsoft.com

Pseudocode | No | The paper provides schematic diagrams of the network architectures (Figures 1, 3, 4, 5) but does not include any pseudocode or algorithm blocks.

Open Source Code | No | Project webpage: https://researchdemopage.wixsite.com/tts-gan. This is a project demonstration page, not a direct link to a source code repository.

Open Datasets | No | "We use two datasets: EMT-4, an in-house dataset... VCTK, a publicly available, multi-speaker dataset..." The paper states that VCTK is publicly available but provides no link, DOI, repository name, or formal citation (author names and year) for access.

Dataset Splits | No | The paper mentions training steps and selecting samples from the test set for evaluation, but does not specify explicit train/validation/test splits, split proportions, or sample counts.

Hardware Specification | No | The paper does not describe the hardware used to run the experiments, such as GPU models, CPU models, or cloud computing specifications.

Software Dependencies | No | The paper mentions software components such as Tacotron, WaveNet, and the Griffin-Lim method, but does not provide version numbers for these or any other software dependencies.

Experiment Setup | Yes | "We set α = 0.1, β = 10 in our experiments. We train our model with a minibatch size of 32 using the Adam optimizer; we iterated 200K steps for EMT-4 and 280K steps for VCTK datasets. The six Conv2D layers have [32, 32, 64, 64, 128, 128] filters, respectively, each with a kernel size 3×3 and a stride of 2×2. Each layer is followed by a ReLU activation and batch normalization."
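The quoted setup is concrete enough to sketch in code. Below is a minimal PyTorch sketch of the described six-layer Conv2D stack, not the authors' implementation: the filter counts [32, 32, 64, 64, 128, 128], 3×3 kernels, stride 2, and ReLU followed by batch normalization come from the paper's text, while the input channel count (1, e.g. for a spectrogram) and the padding of 1 are assumptions made here.

```python
import torch
import torch.nn as nn


class ConvEncoder(nn.Module):
    """Sketch of the paper's six-layer Conv2D stack (hypothetical reconstruction)."""

    def __init__(self, in_channels: int = 1):  # in_channels=1 is an assumption
        super().__init__()
        filters = [32, 32, 64, 64, 128, 128]  # per-layer filter counts from the paper
        layers = []
        prev = in_channels
        for f in filters:
            layers += [
                # 3x3 kernel, stride 2 as stated; padding=1 is assumed
                nn.Conv2d(prev, f, kernel_size=3, stride=2, padding=1),
                nn.ReLU(),            # "followed by a ReLU activation..."
                nn.BatchNorm2d(f),    # "...and batch normalization"
            ]
            prev = f
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


# Each stride-2 layer roughly halves the spatial resolution, so a
# 1 x 1 x 128 x 80 input comes out as 1 x 128 x 2 x 2 after six layers.
encoder = ConvEncoder()
optimizer = torch.optim.Adam(encoder.parameters())  # Adam, minibatch size 32 per the paper
```

With this layout, training would simply iterate Adam steps over minibatches of 32 samples (200K steps for EMT-4, 280K for VCTK, per the quoted setup).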