CMCGAN: A Uniform Framework for Cross-Modal Visual-Audio Mutual Generation
Authors: Wangli Hao, Zhaoxiang Zhang, He Guan
AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Abundant experiments have been conducted and validate that CMCGAN obtains the state-of-the-art cross-modal visual-audio generation results. In this section, several experiments are designed to evaluate the performance of our model CMCGAN. |
| Researcher Affiliation | Academia | Wangli Hao¹⁴, Zhaoxiang Zhang¹²³⁴, He Guan¹⁴. ¹Research Center for Brain-inspired Intelligence, CASIA; ²National Laboratory of Pattern Recognition, CASIA; ³CAS Center for Excellence in Brain Science and Intelligence Technology, CAS; ⁴University of Chinese Academy of Sciences. {haowangli2015, zhaoxiang.zhang, guanhe2015}@ia.ac.cn |
| Pseudocode | Yes | Our cross-modal visual-audio generation training algorithm is presented in Algorithm 1. Input: minibatch images x, minibatch sounds (LMS) a, minibatch images x̂ mismatched with x, minibatch sounds (LMS) â mismatched with a, latent vector z, number of training batch steps S, number of generator loss training steps K. 1: for each i ∈ [1, S] do 2: Sample a minibatch image x and sound a 3: Forward x, a and z through network 4: Sample a minibatch mismatched image x̂ and sound â 5: Forward (image, sound), (generated image, sound), (wrong image, sound), (image, generated sound) and (image, wrong sound) pairs through discriminator separately 6: Compute discriminator loss L_D (Equation 1) 7: Update D 8: for each j ∈ [1, K] do 9: Compute generator loss L_G (Equation 2) 10: Compute consistency loss L_Cons (Equation 3) 11: Update G 12: end for 13: end for. (A PyTorch reading of this loop is sketched after the table.) |
| Open Source Code | No | The paper does not contain any statements about making its source code publicly available or provide a link to a code repository. |
| Open Datasets | Yes | To validate the performance of CMCGAN for cross-modal visual-audio mutual generation, Sub-URMP (University of Rochester Musical Performance) dataset (Li et al. 2016; Chen et al. 2017) is adopted. |
| Dataset Splits | No | The paper mentions 'Training Accuracy' and 'Testing Accuracy' but does not specify the explicit percentages or counts for training, validation, or test splits. It refers to a dataset from previous works, but does not detail the splits used in *this* paper. |
| Hardware Specification | No | The paper does not specify any hardware details such as GPU models, CPU types, or memory used for the experiments. |
| Software Dependencies | No | The paper mentions optimizers (SGD, Adam) but does not provide specific version numbers for any programming languages, libraries, or other software components used in the experiments. |
| Experiment Setup | Yes | Network parameters are learned by SGD algorithm for discriminators and Adam for generators. The batch size is set to 64 and momentum as 0.9. The learning rate in our experiments is 0.001. We stop our training procedure at 200 epochs. The size of Gaussian latent vector is 100. (See the configuration sketch after the table.) |
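
The Algorithm 1 text quoted in the Pseudocode row maps onto a standard alternating GAN training loop. The sketch below is our reading of that procedure, not the authors' code: the generator interface (`image_to_sound`, `sound_to_image`), the binary cross-entropy adversarial losses, the L1 consistency term, and the weight `lambda_cons` are all assumptions, since the paper's Equations 1-3 are not reproduced on this page, and the discriminator is assumed to output matching probabilities in [0, 1].

```python
import torch

def train_cmcgan(G, D, loader, mismatch_loader, opt_d, opt_g,
                 epochs=200, k_gen_steps=1, z_dim=100, lambda_cons=1.0,
                 device="cuda"):
    """Sketch of Algorithm 1: alternating discriminator / generator updates."""
    bce = torch.nn.functional.binary_cross_entropy
    l1 = torch.nn.functional.l1_loss

    for epoch in range(epochs):
        for (x, a), (x_wrong, a_wrong) in zip(loader, mismatch_loader):
            # x: images, a: log-mel spectrogram (LMS) sounds; *_wrong is a mismatched pair.
            x, a = x.to(device), a.to(device)
            x_wrong, a_wrong = x_wrong.to(device), a_wrong.to(device)
            z = torch.randn(x.size(0), z_dim, device=device)  # Gaussian latent vector

            # Cross-modal generation in both directions (method names are assumptions).
            a_gen = G.image_to_sound(x, z)
            x_gen = G.sound_to_image(a, z)

            # Discriminator step (Equation 1 in the paper): matched real pairs are
            # positives; generated and mismatched pairs are negatives.
            real = D(x, a)
            fakes = [D(x_gen.detach(), a), D(x, a_gen.detach()),
                     D(x_wrong, a), D(x, a_wrong)]
            loss_d = bce(real, torch.ones_like(real))
            loss_d = loss_d + sum(bce(f, torch.zeros_like(f)) for f in fakes)
            opt_d.zero_grad()
            loss_d.backward()
            opt_d.step()

            # Generator step, repeated K times (Equations 2 and 3 in the paper).
            for _ in range(k_gen_steps):
                a_gen = G.image_to_sound(x, z)
                x_gen = G.sound_to_image(a, z)
                d_img, d_snd = D(x_gen, a), D(x, a_gen)
                loss_g = (bce(d_img, torch.ones_like(d_img))
                          + bce(d_snd, torch.ones_like(d_snd)))
                # Assumed cycle-consistency: regenerate each modality from its
                # generated counterpart and compare with the original.
                loss_cons = (l1(G.sound_to_image(a_gen, z), x)
                             + l1(G.image_to_sound(x_gen, z), a))
                loss = loss_g + lambda_cons * loss_cons
                opt_g.zero_grad()
                loss.backward()
                opt_g.step()
```

The paper iterates over S training batch steps and K generator steps per batch; the sketch exposes these as `epochs` and `k_gen_steps` for readability.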
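The hyperparameters reported in the Experiment Setup row slot directly into the sketch above. In the usage example below, `CMCGANGenerators`, `CMCGANDiscriminators`, and `make_suburmp_loaders` are hypothetical placeholders for the paper's networks and Sub-URMP data pipeline; only the optimizer choices and numeric values come from the paper.

```python
import torch

# Hypothetical stand-ins for the paper's generator / discriminator stacks and
# data pipeline; only the numeric settings below are taken from the paper.
G = CMCGANGenerators()        # assumed module exposing image_to_sound / sound_to_image
D = CMCGANDiscriminators()    # assumed module scoring (image, sound) pairs
loader, mismatch_loader = make_suburmp_loaders(batch_size=64)  # hypothetical helper

# Reported setup: SGD for discriminators, Adam for generators, momentum 0.9,
# learning rate 0.001, batch size 64, 200 epochs, Gaussian latent vector of size 100.
opt_d = torch.optim.SGD(D.parameters(), lr=1e-3, momentum=0.9)
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)

train_cmcgan(G, D, loader, mismatch_loader, opt_d, opt_g, epochs=200, z_dim=100)
```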