Disentangled Variational Representation for Heterogeneous Face Recognition
Authors: Xiang Wu, Huaibo Huang, Vishal M. Patel, Ran He, Zhenan Sun
AAAI 2019, pp. 9005-9012
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on three challenging NIR-VIS heterogeneous face recognition databases demonstrate that the proposed method achieves significant improvements over the state-of-the-art methods. Extensive experiments are conducted on three HFR databases, including the CASIA NIR-VIS 2.0 database (Li et al. 2013), the Oulu-CASIA NIR-VIS database (Chen et al. 2009) and the BUAA-VisNir database (Huang, Sun, and Wang 2012), and comparisons are performed against several recent state-of-the-art approaches. Furthermore, an ablation study is conducted to demonstrate the improvements obtained by various components of the proposed method. |
| Researcher Affiliation | Academia | Xiang Wu (1,2), Huaibo Huang (1,2,3), Vishal M. Patel (4), Ran He (1,2), Zhenan Sun (1,2). 1: Center for Research on Intelligent Perception and Computing (CRIPAC), CASIA, Beijing, China; 2: National Laboratory of Pattern Recognition (NLPR), CASIA, Beijing, China; 3: School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China; 4: Johns Hopkins University, 3400 N. Charles St, Baltimore, MD 21218, USA |
| Pseudocode | Yes | Algorithm 1 Disentangled Variational Representation (DVR) Training (a PyTorch sketch of this training loop is given after this table). Require: Training set: NIR images I^N, VIS images I^V, the learning rate α and the trade-off parameters λ1, λ2, λ3. Ensure: The CNN parameters Θ, W, the approximate posterior estimators φN, φV and correlation alignment matrix P. 1: Initialize Θ, W by pre-trained model; 2: Obtain x^N = f(I^N; Θ), x^V = f(I^V; Θ); 3: Initialize φN, φV, P randomly; 4: for t = 1, . . . , T do 5: Optimize φN, φV without mean discrepancy and correlation alignment parts; 6: end for; 7: for t = 1, . . . , T do 8: Given ε ∼ N(0, I), generate x̂^N and x̂^V via Eq. (2) and Eq. (4); 9: Compute loss Jcls via Eq. (10); 10: Fix φN, φV, P; 11: Update Θ, W via back-propagation; 12: Obtain x^N = f(I^N; Θ), x^V = f(I^V; Θ); 13: Fix Θ, W, φN, φV; 14: Update P by gradient descent; 17: end for; 18: Return Θ, W, φN, φV, P; |
| Open Source Code | No | The paper provides a link for the Light CNN backbone network (https://github.com/AlfredXiangWu/LightCNN), but it does not explicitly state that the source code for the proposed DVR method is publicly available. |
| Open Datasets | Yes | Three publicly available VIS-to-NIR face recognition datasets are used to evaluate the performance of different HFR methods. The CASIA NIR-VIS 2.0 Face Database (Li et al. 2013)... The Oulu-CASIA NIR-VIS Database (Chen et al. 2009)... The BUAA-VisNir Face Database (Huang, Sun, and Wang 2012)... |
| Dataset Splits | Yes | The CASIA NIR-VIS 2.0 Face Database... It consists of 10-fold experiments. For training, there are about 2,500 VIS and 6,100 NIR images from 360 identities. For testing, the gallery set in each fold is constructed from 358 identities and each identity only has one VIS image. The probe set contains over 6,000 NIR images from the same 358 identities. The Oulu-CASIA NIR-VIS Database... we select 20 identities as the training set and 20 identities as the testing set. The BUAA-VisNir Face Database... The training set and testing set are composed of 900 images from 50 identities and 1800 images from the remaining 100 identities, respectively. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper mentions using SGD and Adam optimizers and Light CNN as a backbone, but it does not specify any software dependencies with version numbers (e.g., Python version, PyTorch/TensorFlow versions, CUDA version, etc.). |
| Experiment Setup | Yes | All the images in the training set are aligned to 144 × 144 and randomly cropped to 128 × 128 as the input. Stochastic gradient descent (SGD) is used, where the momentum is set to 0.9 and weight decay is set to 5e-4. The learning rate is set to 1e-4 initially and reduced to 5e-5 gradually. The batch size is set to 128 and the dropout ratio is 0.5. A multilayer perceptron (MLP) is used to model the DVR parts. It contains four hidden layers with h dimensions to represent µN, µV, σN and σV. Moreover, the correlation alignment matrix P is an h × h matrix. Specifically, in the experiments, the dimension h is set equal to 64. The input and the output layers are both 256-d, which are similar to the dimensions of the features from the face recognition network. During training, the parameters of the MLP are initialized by a Gaussian, while P is initialized by an identity matrix I. Adam (Kingma and Ba 2015) is used for back-propagation; its initial learning rate is set to 1e-3 and gradually reduced to 1e-5. The batch size is set to 128. The trade-off parameters λ1, λ2 and λ3 are set equal to 1.0, 0.1 and 0.001, respectively. (A configuration sketch based on these settings follows this table.) |
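
For context, a minimal PyTorch sketch of the alternating updates described in the quoted Algorithm 1 is given below. It assumes a LightCNN-style backbone `backbone` (parameters Θ), a softmax classifier `classifier` (parameters W), posterior estimators `phi_N`/`phi_V` that return (mean, log-variance) pairs, and a learnable alignment matrix `P`; the callables `variational_loss`, `cls_loss` and `align_loss` are placeholders standing in for the paper's KL, classification (Eq. 10) and correlation-alignment terms, whose exact forms are not reproduced in the quoted text.

```python
# Sketch of Algorithm 1 (DVR training); module and loss names are hypothetical.
import torch

def train_dvr(backbone, classifier, phi_N, phi_V, P, loader,
              variational_loss, cls_loss, align_loss,
              opt_theta_w, opt_phi, opt_P, warmup_steps, steps):
    # Steps 4-6: warm up phi_N, phi_V without the mean-discrepancy and
    # correlation-alignment parts (backbone features are detached here).
    for _, (img_n, img_v, label) in zip(range(warmup_steps), loader):
        x_n, x_v = backbone(img_n).detach(), backbone(img_v).detach()
        loss = variational_loss(phi_N(x_n), phi_V(x_v))
        opt_phi.zero_grad(); loss.backward(); opt_phi.step()

    for _, (img_n, img_v, label) in zip(range(steps), loader):
        # Steps 8-11: reparameterize with eps ~ N(0, I), compute the
        # classification loss, and update Theta, W while phi_N, phi_V, P stay
        # fixed (only Theta and W are registered in opt_theta_w).
        x_n, x_v = backbone(img_n), backbone(img_v)
        mu_n, logvar_n = phi_N(x_n)
        mu_v, logvar_v = phi_V(x_v)
        z_n = mu_n + torch.randn_like(mu_n) * torch.exp(0.5 * logvar_n)
        z_v = mu_v + torch.randn_like(mu_v) * torch.exp(0.5 * logvar_v)
        loss_cls = cls_loss(classifier(z_n), classifier(z_v), label)
        opt_theta_w.zero_grad(); loss_cls.backward(); opt_theta_w.step()

        # Steps 12-14: re-extract features with the updated Theta, then update
        # the alignment matrix P while everything else stays fixed.
        with torch.no_grad():
            mu_n, _ = phi_N(backbone(img_n))
            mu_v, _ = phi_V(backbone(img_v))
        loss_p = align_loss(mu_n, mu_v, P)
        opt_P.zero_grad(); loss_p.backward(); opt_P.step()

    return backbone, classifier, phi_N, phi_V, P
```

The three separate optimizers mirror the fix/update pattern in the quoted steps; how the individual loss terms are weighted by λ1, λ2 and λ3 is left to the placeholder loss functions.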
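
The quoted experiment setup also maps fairly directly onto optimizer and module configuration. The sketch below instantiates the reported numbers (SGD with momentum 0.9, weight decay 5e-4 and lr 1e-4; Adam at lr 1e-3; h = 64; 256-d features; batch size 128; λ1/λ2/λ3 = 1.0/0.1/0.001). The posterior-MLP wiring and the stand-in backbone/classifier are assumptions, not the paper's Light CNN.

```python
# Configuration sketch for the reported hyperparameters (PyTorch).
import torch
import torch.nn as nn

class PosteriorMLP(nn.Module):
    """Approximate posterior estimator (phi_N or phi_V): 256-d input/output
    with one h-dimensional hidden layer per branch (mu / log-variance),
    reading the paper's "four hidden layers" as one per mu_N, mu_V, sigma_N,
    sigma_V across the two estimators. The exact wiring is an assumption."""
    def __init__(self, feat_dim=256, h=64):
        super().__init__()
        self.mu = nn.Sequential(nn.Linear(feat_dim, h), nn.ReLU(),
                                nn.Linear(h, feat_dim))
        self.logvar = nn.Sequential(nn.Linear(feat_dim, h), nn.ReLU(),
                                    nn.Linear(h, feat_dim))
    def forward(self, x):
        return self.mu(x), self.logvar(x)

phi_N, phi_V = PosteriorMLP(), PosteriorMLP()
for m in list(phi_N.modules()) + list(phi_V.modules()):
    if isinstance(m, nn.Linear):
        nn.init.normal_(m.weight, std=0.01)      # "initialized by a Gaussian"
        nn.init.zeros_(m.bias)

P = nn.Parameter(torch.eye(64))                  # h x h alignment matrix, init = I

# Stand-ins for the Light CNN backbone (Theta) and identity classifier (W);
# the real network takes 128 x 128 crops of 144 x 144 aligned faces.
backbone = nn.Sequential(nn.Flatten(), nn.Dropout(0.5), nn.Linear(128 * 128, 256))
classifier = nn.Linear(256, 360)                 # e.g. 360 training identities

# SGD for the recognition network: lr 1e-4 (decayed to 5e-5), momentum 0.9,
# weight decay 5e-4.
opt_theta_w = torch.optim.SGD(
    list(backbone.parameters()) + list(classifier.parameters()),
    lr=1e-4, momentum=0.9, weight_decay=5e-4)

# Adam for the DVR parts (phi_N, phi_V, P): lr 1e-3, decayed towards 1e-5.
opt_dvr = torch.optim.Adam(
    list(phi_N.parameters()) + list(phi_V.parameters()) + [P], lr=1e-3)

batch_size = 128
lambda1, lambda2, lambda3 = 1.0, 0.1, 0.001      # trade-off weights
```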