Disjoint Mapping Network for Cross-modal Matching of Voices and Faces
Authors: Yandong Wen, Mahmoud Al Ismail, Weiyang Liu, Bhiksha Raj, Rita Singh
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show empirically that DIMNet is able to achieve better performance than the current state-of-the-art methods, with the additional benefits of being conceptually simpler and less data-intensive. The code is made available at https://github.com/ydwen/DIMNet. Our experiments were conducted on the Voxceleb (Nagrani et al., 2017) and VGGFace (Parkhi et al., 2015) datasets, which are specified in appendix A.1. We ran experiments on matching voices to faces, to evaluate the embeddings derived by DIMNets. |
| Researcher Affiliation | Academia | Yandong Wen, Mahmoud Al Ismail, Weiyang Liu, Bhiksha Raj, Rita Singh (Carnegie Mellon University; Georgia Institute of Technology). yandongw@andrew.cmu.edu, mahmoudi@andrew.cmu.edu, wyliu@gatech.edu |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is made available at https://github.com/ydwen/DIMNet. |
| Open Datasets | Yes | Our experiments were conducted on the Voxceleb (Nagrani et al., 2017) and VGGFace (Parkhi et al., 2015) datasets, which are specified in appendix A.1. We use the intersection of the two datasets... The data are split into train/validation/test sets, following the settings in Nagrani et al. (2018b). Details can be found in Appendix A.1. |
| Dataset Splits | Yes | The data are split into train/validation/test sets, following the settings in Nagrani et al. (2018b). Details can be found in Appendix A.1. ... Table 6: Statistics for the data appearing in VoxCeleb and VGGFace: speech segments 112,697 (train) / 14,160 (validation) / 21,799 (test), 148,656 total; face images 313,593 (train) / 36,716 (validation) / 58,420 (test), 408,729 total. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running experiments. |
| Software Dependencies | No | The paper mentions tools like 'energy-based voice activity detector (Povey et al., 2011)' and 'MTCNN (Zhang et al., 2016)' but does not provide specific version numbers for software dependencies or libraries. |
| Experiment Setup | Yes | The detailed network configurations are elaborated in appendix A.3. ... Minibatch size is 256. The momentum and weight decay values are 0.9 and 0.001 respectively. To learn the networks from scratch, the learning rate is initialized at 0.1 and divided by 10 after 16K iterations and again after 24K iterations. The training is completed at 28K iterations. |
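The optimization schedule quoted above (initial learning rate 0.1, divided by 10 after 16K and again after 24K iterations, training stopped at 28K) can be sketched as a small step-decay helper. This is a minimal illustration of the stated numbers, not the authors' code; the function name and its interface are hypothetical.

```python
def learning_rate(iteration,
                  base_lr=0.1,          # initial LR stated in the paper
                  decay_steps=(16_000, 24_000),  # drop points stated in the paper
                  gamma=0.1):           # divide-by-10 factor
    """Step learning-rate schedule matching the paper's description.

    Hypothetical helper: returns the LR in effect at a given iteration,
    assuming each drop applies once the iteration count passes a decay step.
    """
    lr = base_lr
    for step in decay_steps:
        if iteration >= step:
            lr *= gamma
    return lr

# Schedule over the 28K-iteration run described in the paper:
# iterations [0, 16K) -> 0.1, [16K, 24K) -> 0.01, [24K, 28K] -> 0.001
```

The remaining reported hyperparameters (minibatch size 256, momentum 0.9, weight decay 0.001) would be passed to the optimizer itself; the paper does not state which framework was used.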