EditSinger: Zero-Shot Text-Based Singing Voice Editing System with Diverse Prosody Modeling
Authors: Lichao Zhang, Zhou Zhao, Yi Ren, Liqun Deng
IJCAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments conducted on OpenSinger prove that EditSinger can synthesize high-quality edited singing voices with natural prosody according to the corresponding operations. We conduct experiments on the singing dataset OpenSinger [Huang et al., 2021], which consists of 50 hours of Chinese singing voices recorded in a professional recording studio, and split OpenSinger randomly by singer into the test set (songs of 3 females and 3 males) and the training set (all the songs of the remaining singers). |
| Researcher Affiliation | Collaboration | Lichao Zhang (1), Zhou Zhao (1), Yi Ren (1), Liqun Deng (2); (1) Zhejiang University, (2) Huawei Noah's Ark Lab |
| Pseudocode | No | The paper provides architectural diagrams (Figure 1, Figure 2) but does not include pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper states: "Audio samples can be listened in https://editsinger.github.io/." This link is for audio samples, not the source code for the methodology. There is no explicit statement about code release. |
| Open Datasets | Yes | We conduct experiments on the singing dataset OpenSinger [Huang et al., 2021], which consists of 50 hours of Chinese singing voices recorded in a professional recording studio, and split OpenSinger randomly by singer into the test set (songs of 3 females and 3 males) and the training set (all the songs of the remaining singers). |
| Dataset Splits | No | The paper splits the data into a 'test set' and a 'training set' but does not specify a separate validation set or exact split percentages for training, validation, and test (a sketch of the described singer-level split follows this table). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions tools and models such as "FastSpeech 2", "Parallel WaveGAN (PWG)", "Pypinyin", "Montreal Forced Aligner (MFA)", "Parselmouth", and "Resemblyzer", but it does not specify version numbers for any of these software components (see the preprocessing sketch after this table). |
| Experiment Setup | Yes | We stack 4 feed-forward Transformer blocks in both the encoder and decoder of the acoustic model and set the hidden size to 256, and the same configuration is also used in the MVA and F0 predictor of FPIP. The V/UV predictor consists of a 4-layer 1D-convolutional network. We minimize the MAE and SSIM [Wang et al., 2004] loss between the output mel-spectrograms and the ground truth mel-spectrograms to optimize the phoneme encoder and mel decoder. We randomly mask out 15% of the words in the lyrics... λ is a hyperparameter that weighs the importance of the three terms, which are all set to 1 in our experiments. (A configuration sketch collecting these stated values follows the table.) |
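
The split quoted under *Open Datasets* is by singer rather than by utterance, which matters for reproduction: held-out singers are never seen during training. Below is a minimal sketch of such a singer-level split, assuming the data is available as `(singer_id, wav_path)` pairs with explicit female/male ID lists; the helper name, data layout, and seed are our assumptions, not the paper's.

```python
import random

def split_by_singer(utterances, female_ids, male_ids, seed=0):
    """Singer-level split as described in the paper: the songs of 3 female
    and 3 male singers form the test set; all remaining singers form the
    training set. `utterances` is assumed to be (singer_id, wav_path) pairs."""
    rng = random.Random(seed)
    test_singers = set(rng.sample(female_ids, 3) + rng.sample(male_ids, 3))
    train = [u for u in utterances if u[0] not in test_singers]
    test = [u for u in utterances if u[0] in test_singers]
    return train, test
```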
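The tools listed under *Software Dependencies* are Python packages with stable public APIs, even though the paper pins no versions. Below is a minimal sketch of the two preprocessing calls most relevant to reproduction, assuming default parameters (the paper states neither a frame shift nor a pitch range for Praat's tracker).

```python
import parselmouth                # Praat wrapper named in the paper (version unspecified)
from pypinyin import lazy_pinyin  # lyric-to-pinyin conversion named in the paper

def extract_f0(wav_path):
    """Frame-level F0 from Praat's pitch tracker via parselmouth.
    Unvoiced frames are reported as 0 Hz, which also yields V/UV labels."""
    pitch = parselmouth.Sound(wav_path).to_pitch()
    f0 = pitch.selected_array['frequency']
    return f0, f0 > 0             # (F0 contour in Hz, voiced-frame mask)

def lyrics_to_pinyin(lyrics):
    """Convert Chinese lyrics to pinyin syllables before phonemization."""
    return lazy_pinyin(lyrics)
```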
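Finally, the values quoted under *Experiment Setup* can be collected into a single configuration block, which makes the stated hyperparameters easy to audit. A minimal sketch, assuming PyTorch; the key names, the `ssim_fn` placeholder, and the `1 - SSIM` convention for turning a similarity into a loss are our assumptions.

```python
import torch.nn.functional as F

# Values quoted from the paper; the key names are our shorthand.
CONFIG = dict(
    encoder_fft_blocks=4,          # feed-forward Transformer blocks, encoder
    decoder_fft_blocks=4,          # and decoder
    hidden_size=256,               # also used in the MVA and F0 predictor of FPIP
    vuv_conv_layers=4,             # 1D-convolutional V/UV predictor
    word_mask_ratio=0.15,          # fraction of lyric words masked in training
    loss_weights=(1.0, 1.0, 1.0),  # the three lambda-weighted terms, all set to 1
)

def mel_loss(mel_pred, mel_gt, ssim_fn, lam=1.0):
    """MAE + SSIM loss on mel-spectrograms, per the quoted setup.
    `ssim_fn` is a placeholder for any SSIM implementation; since SSIM is a
    similarity in [0, 1], we assume it enters the loss as 1 - SSIM."""
    return F.l1_loss(mel_pred, mel_gt) + lam * (1.0 - ssim_fn(mel_pred, mel_gt))
```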