CMCGAN: A Uniform Framework for Cross-Modal Visual-Audio Mutual Generation
Authors: Wangli Hao, Zhaoxiang Zhang, He Guan
AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Abundant experiments have been conducted and validate that CMCGAN obtains the state-of-the-art cross-modal visual-audio generation results. In this section, several experiments are designed to evaluate the performance of our model CMCGAN. |
| Researcher Affiliation | Academia | Wangli Hao¹⁴, Zhaoxiang Zhang¹²³⁴, He Guan¹⁴. ¹Research Center for Brain-inspired Intelligence, CASIA; ²National Laboratory of Pattern Recognition, CASIA; ³CAS Center for Excellence in Brain Science and Intelligence Technology, CAS; ⁴University of Chinese Academy of Sciences. {haowangli2015, zhaoxiang.zhang, guanhe2015}@ia.ac.cn |
| Pseudocode | Yes | Our cross-modal visual-audio generation training algorithm is presented in Algorithm 1. Input: minibatch images x, minibatch sounds (LMS) a, minibatch images x̂ mismatched with x, minibatch sounds (LMS) â mismatched with a, latent vector z, number of training batch steps S, number of generator loss training steps K. 1: for each i ∈ [1, S] do 2: Sample a minibatch image x and sound a 3: Forward x, a and z through network 4: Sample a minibatch mismatched image x̂ and sound â 5: Forward (image, sound), (generated image, sound), (wrong image, sound), (image, generated sound) and (image, wrong sound) pairs through discriminator separately 6: Compute discriminator loss L_D (Equation 1) 7: Update D 8: for each j ∈ [1, K] do 9: Compute generator loss L_G (Equation 2) 10: Compute consistency loss L_Cons (Equation 3) 11: Update G 12: end for 13: end for. (A PyTorch reading of this loop is sketched after the table.) |
| Open Source Code | No | The paper does not contain any statements about making its source code publicly available or provide a link to a code repository. |
| Open Datasets | Yes | To validate the performance of CMCGAN for cross-modal visual-audio mutual generation, Sub-URMP (University of Rochester Musical Performance) dataset (Li et al. 2016; Chen et al. 2017) is adopted. |
| Dataset Splits | No | The paper mentions 'Training Accuracy' and 'Testing Accuracy' but does not specify the explicit percentages or counts for training, validation, or test splits. It refers to a dataset from previous works, but does not detail the splits used in *this* paper. |
| Hardware Specification | No | The paper does not specify any hardware details such as GPU models, CPU types, or memory used for the experiments. |
| Software Dependencies | No | The paper mentions optimizers (SGD, Adam) but does not provide specific version numbers for any programming languages, libraries, or other software components used in the experiments. |
| Experiment Setup | Yes | Network parameters are learned by SGD algorithm for discriminators and Adam for generators. The batch size is set to 64 and momentum as 0.9. The learning rate in our experiments is 0.001. We stop our training procedure at 200 epochs. The size of Gaussian latent vector is 100. (See the configuration sketch after the table.) |
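
The Algorithm 1 text quoted in the Pseudocode row maps onto a standard alternating GAN training loop. The sketch below is our reading of that procedure, not the authors' code: the generator interface (`image_to_sound`, `sound_to_image`), the binary cross-entropy adversarial losses, the L1 consistency term, and the weight `lambda_cons` are all assumptions, since the paper's Equations 1-3 are not reproduced on this page, and the discriminator is assumed to output matching probabilities in [0, 1].

```python
import torch

def train_cmcgan(G, D, loader, mismatch_loader, opt_d, opt_g,
                 epochs=200, k_gen_steps=1, z_dim=100, lambda_cons=1.0,
                 device="cuda"):
    """Sketch of Algorithm 1: alternating discriminator / generator updates."""
    bce = torch.nn.functional.binary_cross_entropy
    l1 = torch.nn.functional.l1_loss

    for epoch in range(epochs):
        for (x, a), (x_wrong, a_wrong) in zip(loader, mismatch_loader):
            # x: images, a: log-mel spectrogram (LMS) sounds; *_wrong is a mismatched pair.
            x, a = x.to(device), a.to(device)
            x_wrong, a_wrong = x_wrong.to(device), a_wrong.to(device)
            z = torch.randn(x.size(0), z_dim, device=device)  # Gaussian latent vector

            # Cross-modal generation in both directions (method names are assumptions).
            a_gen = G.image_to_sound(x, z)
            x_gen = G.sound_to_image(a, z)

            # Discriminator step (Equation 1 in the paper): matched real pairs are
            # positives; generated and mismatched pairs are negatives.
            real = D(x, a)
            fakes = [D(x_gen.detach(), a), D(x, a_gen.detach()),
                     D(x_wrong, a), D(x, a_wrong)]
            loss_d = bce(real, torch.ones_like(real))
            loss_d = loss_d + sum(bce(f, torch.zeros_like(f)) for f in fakes)
            opt_d.zero_grad()
            loss_d.backward()
            opt_d.step()

            # Generator step, repeated K times (Equations 2 and 3 in the paper).
            for _ in range(k_gen_steps):
                a_gen = G.image_to_sound(x, z)
                x_gen = G.sound_to_image(a, z)
                d_img, d_snd = D(x_gen, a), D(x, a_gen)
                loss_g = (bce(d_img, torch.ones_like(d_img))
                          + bce(d_snd, torch.ones_like(d_snd)))
                # Assumed cycle-consistency: regenerate each modality from its
                # generated counterpart and compare with the original.
                loss_cons = (l1(G.sound_to_image(a_gen, z), x)
                             + l1(G.image_to_sound(x_gen, z), a))
                loss = loss_g + lambda_cons * loss_cons
                opt_g.zero_grad()
                loss.backward()
                opt_g.step()
```

The paper iterates over S training batch steps and K generator steps per batch; the sketch exposes these as `epochs` and `k_gen_steps` for readability.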
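The hyperparameters reported in the Experiment Setup row slot directly into the sketch above. In the usage example below, `CMCGANGenerators`, `CMCGANDiscriminators`, and `make_suburmp_loaders` are hypothetical placeholders for the paper's networks and Sub-URMP data pipeline; only the optimizer choices and numeric values come from the paper.

```python
import torch

# Hypothetical stand-ins for the paper's generator / discriminator stacks and
# data pipeline; only the numeric settings below are taken from the paper.
G = CMCGANGenerators()        # assumed module exposing image_to_sound / sound_to_image
D = CMCGANDiscriminators()    # assumed module scoring (image, sound) pairs
loader, mismatch_loader = make_suburmp_loaders(batch_size=64)  # hypothetical helper

# Reported setup: SGD for discriminators, Adam for generators, momentum 0.9,
# learning rate 0.001, batch size 64, 200 epochs, Gaussian latent vector of size 100.
opt_d = torch.optim.SGD(D.parameters(), lr=1e-3, momentum=0.9)
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)

train_cmcgan(G, D, loader, mismatch_loader, opt_d, opt_g, epochs=200, z_dim=100)
```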