Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

GLVD: Guided Learned Vertex Descent

Authors: Pol Caselles Rico, Francesc Moreno-Noguer

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conduct a comprehensive evaluation on both single-view and multi-view 3D face reconstruction benchmarks. Our approach achieves state-of-the-art performance in single-image reconstruction and remains competitive with optimization-based methods in multi-view scenarios, demonstrating its robustness, accuracy, and broad applicability. We conducted a comprehensive comparison of our method with several 3DMM-based reconstruction works, including MVFNet [2], DFNRMVS [69], DECA [17], MICA [81], FaceScape [80], FaceVerse [66], HRN [30], 3DDFA-v3 [68] and VHAP [48]. Additionally, we compared our approach to the model-free methods PIFu [57], JIFF [7], RaFaRe [22], H3D-Net [52] and SIRA++ [9], and to the hybrid method LVD [11]. We used the unidirectional Chamfer distance for the quantitative evaluation, measuring the surface error from the ground truth to the predictions. The results of this comparison are summarized in Table 1. Qualitative results for 3DFAW subjects are presented in Figure 4 for the single-view setting and in Figure 5 for the multi-view setting. Results on H3DS are presented in Figure 3. We show in Figure 7 the estimated 3D face and the guiding keypoints.
Researcher Affiliation	Collaboration	Pol Caselles Rico: Institut de Robòtica i Informàtica Industrial, CSIC-UPC; Crisalix SA; Barcelona, Spain; EMAIL. Francesc Moreno-Noguer: Amazon, Barcelona, Spain; EMAIL.
Pseudocode	No	The paper describes the method using mathematical equations and textual explanations, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code	No	Due to legal constraints, we are unable to release the training data. However, we will publish the code along with detailed instructions on how to prepare compatible datasets for training. We release the full codebase on GitHub, including model implementation, training, and evaluation scripts (to be made public upon publication).
Open Datasets	Yes	H3DS 2.0 [52, 9]. It contains 60 high-quality 3D full-head scans, including hair and shoulders, paired with posed RGB images. Each image includes a foreground mask and calibrated camera parameters. 3DFAW [47]. This dataset provides recorded videos as well as mid-resolution 3D ground truth of the facial region. We select 5 male and 5 female scenes and use them to evaluate only the facial region. CelebA-HQ [27]. This dataset comprises 30k high-quality images at 1024×1024 resolution, derived from the original CelebA dataset. We selected a subset of 6 subjects for our qualitative evaluation.
Dataset Splits	No	We employ a proprietary dataset of 3D head scans collected from 10,000 individuals, balanced by gender and diverse in age and ethnicity. All scans are aligned to a template 3D model using non-rigid Iterative Closest Point (ICP) registration for consistency. We use a dataset of N training scenes, each with a ground truth mesh with known topology, posed RGB images, and head masks. The paper mentions a proprietary dataset for training and uses existing public datasets for evaluation, but does not specify explicit training/validation/test splits for these datasets.
Hardware Specification	Yes	All networks are trained end-to-end using GPU-accelerated hardware (RTX 4090).
Software Dependencies	No	The function f_v(·) is a stacked hourglass network [43] composed of four stacks using group normalization [70]. The function f_k(·) is implemented by a combination of a facial keypoint heatmap estimator HRNet [65] and a single-stack hourglass network [43]. Optimization is performed using Adam [29] with β1 = 0.9 and β2 = 0.999. The paper mentions several software components like stacked hourglass networks, HRNet, and Adam optimizer, but does not provide specific version numbers for any of them or for underlying frameworks like PyTorch or TensorFlow.
Experiment Setup	Yes	All networks are trained end-to-end using GPU-accelerated hardware (RTX 4090). We use a batch size of 4 and an initial learning rate of 0.001 for 50 epochs, followed by 200 additional epochs with linear learning rate decay. For each scene, we sample 1400 vertices as query points. Training takes between 1.5 and 6 days, depending on the configuration. We set λ1 = λ2 = 0.5. Optimization is performed using Adam [29] with β1 = 0.9 and β2 = 0.999. The total number of parameters is detailed in Table 4.
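The Research Type response states that evaluation uses a unidirectional Chamfer distance, measured from the ground truth to the predictions. The sketch below illustrates that direction of the metric on point samples. It assumes Euclidean (not squared) point-to-point distances averaged over ground-truth points; the paper's exact variant (squared or not, point-to-point or point-to-surface, sampling density) is not stated here, and the function name unidirectional_chamfer is hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def unidirectional_chamfer(gt: Sequence[Point], pred: Sequence[Point]) -> float:
    """Mean distance from each ground-truth point to its nearest predicted point.

    Assumption: Euclidean (not squared) point-to-point distances, averaged over
    the ground-truth samples. Predicted points far from any ground-truth point
    are not penalised, which is what makes the metric one-directional.
    """
    if not gt or not pred:
        raise ValueError("both point sets must be non-empty")
    total = 0.0
    for g in gt:
        # Brute-force nearest neighbour; a KD-tree would be preferable at scale.
        total += min(math.dist(g, p) for p in pred)
    return total / len(gt)


gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
print(unidirectional_chamfer(gt, pred))  # ground truth -> prediction
print(unidirectional_chamfer(pred, gt))  # reverse direction, for contrast
```

In this toy example the ground-truth-to-prediction score is 0.0 even though the prediction contains a stray point, while the reverse direction is positive. Reporting only the first direction therefore measures how well the reconstruction covers the ground-truth surface, not whether it adds spurious geometry.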
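The Experiment Setup response describes a learning rate of 0.001 held for 50 epochs and then decayed linearly over 200 further epochs. A minimal sketch of that schedule follows. It assumes the rate decays to exactly zero at the end of the 250-epoch run and that epochs are counted from 0; the response does not state the final value, and the helper name learning_rate is hypothetical.

```python
def learning_rate(
    epoch: int,
    base_lr: float = 1e-3,
    constant_epochs: int = 50,
    decay_epochs: int = 200,
) -> float:
    """Constant learning rate, then linear decay (assumed to reach 0 at the end)."""
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    if epoch < constant_epochs:
        return base_lr
    # Fraction of the decay phase completed; epochs past the schedule clamp to 0.
    progress = (epoch - constant_epochs) / decay_epochs
    return base_lr * max(0.0, 1.0 - progress)


for e in (0, 49, 50, 150, 249, 250):
    print(e, learning_rate(e))
```

If the framework were PyTorch (the Software Dependencies row notes that no framework is named), this could be attached to torch.optim.Adam(params, lr=1e-3, betas=(0.9, 0.999)) through torch.optim.lr_scheduler.LambdaLR, with lr_lambda returning learning_rate(epoch) / 1e-3 as a multiplicative factor and scheduler.step() called once per epoch.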
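The Experiment Setup response also states that 1400 vertices are sampled per scene as query points. Because all scans are registered to a common template, a vertex can be identified by its index. The sketch below assumes uniform sampling without replacement, redrawn independently for each scene. The response does not specify the sampling distribution, and the helper sample_query_vertices and the 10,000-vertex mesh size in the example are hypothetical.

```python
import random
from typing import List, Optional


def sample_query_vertices(
    num_vertices: int, num_queries: int = 1400, seed: Optional[int] = None
) -> List[int]:
    """Draw distinct template-vertex indices to use as query points for one scene.

    Assumption: uniform sampling without replacement; the real method may weight
    regions differently or reuse a fixed subset.
    """
    if num_queries > num_vertices:
        raise ValueError("cannot sample more query vertices than the mesh has")
    rng = random.Random(seed)
    return sorted(rng.sample(range(num_vertices), num_queries))


idx = sample_query_vertices(10_000, seed=0)
print(len(idx), idx[:5])
```

Sorting the indices is only a convenience for gathering ground-truth positions in a stable order; it does not change which vertices are supervised.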