Retriever: Learning Content-Style Representation as a Token-Level Bipartite Graph

Authors: Dacheng Yin, Xuanchi Ren, Chong Luo, Yuwang Wang, Zhiwei Xiong, Wenjun Zeng

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Being modal-agnostic, the proposed Retriever is evaluated in both speech and image domains. The state-of-the-art zero-shot voice conversion performance confirms the disentangling ability of our framework. Top performance is also achieved in the part discovery task for images, verifying the interpretability of our representation. In addition, the vivid part-based style transfer quality demonstrates the potential of Retriever to support various fascinating generative tasks."
Researcher Affiliation | Collaboration | (1) University of Science and Technology of China, (2) HKUST, (3) Microsoft Research Asia, (4) EIT
Pseudocode | No | The paper describes algorithms and methods in text and figures, but does not include formal pseudocode blocks or sections labeled "Algorithm".
Open Source Code | No | The paper links only to a "Project page at https://ydcustc.github.io/retriever-demo/", which is a demonstration page rather than a code repository.
Open Datasets | Yes | "Retriever for speech signals is trained with the entire 44-hour CSTR VCTK Corpus (Veaux et al., 2017) containing 109 speakers. [...] We choose two commonly used datasets: Celeba-Wild (Liu et al., 2015) and Deep Fashion (Liu et al., 2016)."
Dataset Splits | No | The paper trains on the "CSTR VCTK Corpus" and tests on the "CMU Arctic databases" or the "Libri Speech test-clean split". It names training and testing sets, but gives no percentages or counts for a validation split and never specifies a full train/validation/test partition of a single dataset, so the data partitioning cannot be reproduced exactly (a hedged splitting sketch follows the table).
Hardware Specification | Yes | "Our model is implemented with Pytorch and trained on 4 Nvidia V100 GPUs."
Software Dependencies | No | The paper names software such as "Pytorch", the "s3prl toolkit", and "Parallel Wave GAN", but does not give version numbers, which reproducibility requires.
Experiment Setup | Yes | Table 9 (speech training setting): λrec = 1, λVQ = 0.3, λsc = 4 or 7, optimizer = Adam (β1 = 0.9, β2 = 0.999), learning rate = 0.001, batch size = 16, iterations = 150,000. Table 13 (image training setting): λrec = 5, λVQ = 0.3, λsc = 0.1, optimizer = Adam (β1 = 0.9, β2 = 0.999), learning rate = 0.004, learning-rate schedule = power (p = 0.3, warmup steps = 625), batch size = 120, epochs = 50. Both settings are sketched in code below the table.
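For the Dataset Splits row above: a minimal sketch of one reproducible way to partition VCTK, assuming torchaudio's public `VCTK_092` wrapper. The seeded 10% speaker hold-out is our illustration of a reproducible convention, not a protocol from the paper.

```python
import random

from torchaudio.datasets import VCTK_092

# CSTR VCTK Corpus (release 0.92) via torchaudio's public wrapper.
dataset = VCTK_092("./data", download=True)

# Each item is (waveform, sample_rate, transcript, speaker_id, utterance_id).
# Enumerating the dataset this way decodes audio and is slow, but it sticks
# to the public API.
speakers = sorted({dataset[i][3] for i in range(len(dataset))})

# Hypothetical convention: hold out ~10% of speakers for validation with a
# fixed seed, since the paper reports no validation split of its own.
random.Random(0).shuffle(speakers)
n_val = max(1, len(speakers) // 10)
val_speakers = set(speakers[:n_val])
train_speakers = set(speakers[n_val:])
```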
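For the Experiment Setup row: a minimal PyTorch sketch of the Table 9 speech training configuration. `DummyRetriever`, `train_loader`, and the `compute_losses` API are hypothetical stand-ins, since the authors' code is not released; only the optimizer, loss weights, batch size, and iteration count come from the paper.

```python
import torch
from torch.optim import Adam

class DummyRetriever(torch.nn.Module):
    """Hypothetical stand-in for the unreleased Retriever speech model."""
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Linear(80, 80)

    def compute_losses(self, batch):
        out = self.net(batch)
        rec = torch.nn.functional.l1_loss(out, batch)  # reconstruction term
        vq = out.pow(2).mean()                         # placeholder VQ term
        sc = out.abs().mean()                          # placeholder style term
        return rec, vq, sc

model = DummyRetriever()
train_loader = [torch.randn(16, 80) for _ in range(8)]  # batch size 16 (Table 9)

optimizer = Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))  # Table 9

# Loss weights from Table 9 (λsc is reported as "4 or 7"; 4 is used here).
LAMBDA_REC, LAMBDA_VQ, LAMBDA_SC = 1.0, 0.3, 4.0
MAX_STEPS = 150_000  # Table 9: 150,000 iterations

step = 0
while step < MAX_STEPS:
    for batch in train_loader:
        rec_loss, vq_loss, sc_loss = model.compute_losses(batch)
        loss = LAMBDA_REC * rec_loss + LAMBDA_VQ * vq_loss + LAMBDA_SC * sc_loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        step += 1
        if step >= MAX_STEPS:
            break
```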
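The Table 13 image setting names a "power" learning-rate schedule with p = 0.3 and 625 warmup steps but does not define it. The sketch below is one plausible reading (linear warmup followed by polynomial decay proportional to step^-p), implemented with `torch.optim.lr_scheduler.LambdaLR`; the exact formula is our assumption, not the paper's.

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Linear(8, 8)  # stand-in module; the image model is unreleased
optimizer = Adam(model.parameters(), lr=4e-3, betas=(0.9, 0.999))  # Table 13

WARMUP, P = 625, 0.3  # Table 13: power schedule, p = 0.3, warmup-steps = 625

def power_schedule(step: int) -> float:
    # Multiplier on the base lr (0.004). Assumed form: linear warmup over the
    # first 625 steps, then decay proportional to step**-p. The paper does not
    # spell out the formula, so this is a guess consistent with the name.
    if step < WARMUP:
        return (step + 1) / WARMUP
    return (WARMUP / (step + 1)) ** P

scheduler = LambdaLR(optimizer, lr_lambda=power_schedule)

# Call scheduler.step() once per training iteration, after optimizer.step():
for step in range(5):
    optimizer.step()
    scheduler.step()
    print(scheduler.get_last_lr())
```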