Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Neural Speech Synthesis with Transformer Network

Authors: Naihan Li, Shujie Liu, Yanqing Liu, Sheng Zhao, Ming Liu

Pages: 6706-6713

AAAI 2019 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments are conducted to test the efficiency and performance of our new network.
Researcher Affiliation	Collaboration	Naihan Li (1,4), Shujie Liu (2), Yanqing Liu (3), Sheng Zhao (3), Ming Liu (1,4). (1) University of Electronic Science and Technology of China; (2) Microsoft Research Asia; (3) Microsoft STC Asia; (4) CETC Big Data Research Institute Co., Ltd, Guizhou, China
Pseudocode	No	The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code	No	The paper provides a link to audio samples, not the source code for the described methodology. 'Audio samples can be accessed on https://neuraltts.github.io/transformertts/'
Open Datasets	No	The paper states that an 'internal US English female dataset' was used, with no public access information provided. 'We use 4 Nvidia Tesla P100 to train our model with an internal US English female dataset, which contains 25-hour professional speech (17584 text, wave pairs, with a few too long waves removed).'
Dataset Splits	No	The paper mentions using a 'dynamic batch size' and 'on average 16 samples in single batch per GPU' but does not specify explicit training, validation, or test dataset splits (e.g., percentages or exact counts) for reproducibility.
Hardware Specification	Yes	We use 4 Nvidia Tesla P100 to train our model with an internal US English female dataset...
Software Dependencies	No	The paper mentions using Tacotron2 and WaveNet as components but does not provide specific version numbers for any software dependencies.
Experiment Setup	Yes	Therefore, we use the dynamic batch size where the maximum total number of mel spectrogram frames is fixed and one batch should contain as many samples as possible. Thus there are on average 16 samples in single batch per GPU. ... The sample rate of ground truth audios is 16000 and frame rate (frames per second) of ground truth mel spectrogram is 80. Our autoregressive WaveNet contains 2 QRNN layers and 20 dilated layers, and the sizes of all residual channels and dilation channels are all 256.
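
The dynamic batching quoted in the Experiment Setup row can be illustrated with a minimal sketch. This is not the authors' code: the function names, the sort-by-length step, and the frame budget `max_frames` are assumptions made for illustration, and the paper does not state the budget it used. The second helper only works out the arithmetic implied by the quoted sample rate (16000) and frame rate (80).

```python
from typing import List


def dynamic_batches(frame_counts: List[int], max_frames: int) -> List[List[int]]:
    """Group sample indices so each batch holds at most max_frames mel frames.

    Hypothetical helper: greedy packing after sorting by length.
    Sorting is an assumption, not something the paper describes.
    """
    order = sorted(range(len(frame_counts)), key=lambda i: frame_counts[i])
    batches: List[List[int]] = []
    current: List[int] = []
    current_total = 0
    for i in order:
        n = frame_counts[i]
        if n > max_frames:
            raise ValueError(f"sample {i} has {n} frames, above the budget {max_frames}")
        # Close the current batch when adding this sample would exceed the budget.
        if current and current_total + n > max_frames:
            batches.append(current)
            current, current_total = [], 0
        current.append(i)
        current_total += n
    if current:
        batches.append(current)
    return batches


def samples_per_frame(sample_rate: int, frame_rate: int) -> int:
    """Audio samples covered by one mel frame (the hop size)."""
    return sample_rate // frame_rate


# Toy frame counts and budget, chosen only for illustration.
example_batches = dynamic_batches([300, 100, 200, 400], max_frames=500)
hop = samples_per_frame(16000, 80)
```

With this scheme the number of samples per batch varies with utterance length, which is consistent with the paper reporting an average (16 samples per batch per GPU) rather than a fixed batch size.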