Face Reconstruction from Voice using Generative Adversarial Networks
Authors: Yandong Wen, Bhiksha Raj, Rita Singh
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the performance of the network by leveraging a closely related task, cross-modal matching. The results show that our model is able to generate faces that match several biometric characteristics of the speaker, and results in matching accuracies that are much better than chance. |
| Researcher Affiliation | Academia | Yandong Wen Carnegie Mellon University Pittsburgh, PA 15213 yandongw@andrew.cmu.edu; Rita Singh Carnegie Mellon University Pittsburgh, PA 15213 rsingh@cs.cmu.edu; Bhiksha Raj Carnegie Mellon University Pittsburgh, PA 15213 bhiksha@cs.cmu.edu |
| Pseudocode | Yes | Algorithm 1 The training algorithm of the proposed framework |
| Open Source Code | Yes | The code is publicly available at https://github.com/cmu-mlsp/reconstructing_faces_from_voices |
| Open Datasets | Yes | In our experiments, the voice recordings are from the Voxceleb [25] dataset and the face images are from the manually filtered version of VGGFace [26] dataset. Both datasets have identity labels. |
| Dataset Splits | Yes | We follow the train/validation/test split in [25]. The details are shown in Table 1. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. It only mentions training a model and the optimization settings. |
| Software Dependencies | No | The paper mentions using components like Adam optimizer, convolutional neural networks, Batch Normalization, ReLU, and Leaky ReLU, but it does not specify version numbers for any software dependencies (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We used the Adam optimizer [14] with learning rate of 0.0002. β1 and β2 are 0.5 and 0.999, respectively. Minibatch size is 128. The training is completed at 100K iterations. The network architecture is given in Table 2. |
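The optimizer settings quoted in the Experiment Setup row can be illustrated with a minimal sketch of a single Adam update step. This is a generic, pure-Python illustration of the Adam rule with the paper's stated hyperparameters (learning rate 0.0002, β1 = 0.5, β2 = 0.999); the scalar-parameter setting, the epsilon value, and the function name `adam_step` are assumptions for illustration, not details from the paper.

```python
import math

def adam_step(theta, grad, m, v, t,
              lr=0.0002, beta1=0.5, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter.

    Hyperparameters default to the values reported in the paper
    (lr=0.0002, beta1=0.5, beta2=0.999); eps is an assumed default.
    """
    m = beta1 * m + (1 - beta1) * grad        # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias correction for m
    v_hat = v / (1 - beta2 ** t)              # bias correction for v
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# One step from theta=1.0 with gradient 2.0 moves theta down by ~lr.
theta, m, v = adam_step(theta=1.0, grad=2.0, m=0.0, v=0.0, t=1)
```

Note the low β1 = 0.5 (versus the common default of 0.9), a choice often used when training GANs to reduce momentum-induced oscillation between generator and discriminator updates.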