Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

RigAnyFace: Scaling Neural Facial Mesh Auto-Rigging with Unlabeled Data

Authors: Wenchao Ma, Dario Kneubuehler, Maurice Chu, Ian Sachs, Haomiao Jiang, Sharon Huang

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments demonstrate that RAF is able to rig meshes of diverse topologies not only on our artist-crafted assets but also on in-the-wild samples, outperforming previous work in accuracy and generalizability. Moreover, our method advances beyond prior work by supporting multiple disconnected components, such as eyeballs, for more detailed expression animation. In this section, we evaluate RAF on both the artist-crafted and in-the-wild facial meshes and compare it with the prior art NFR [50] and a representative deformation-transfer method [58]. Quantitative results are presented in Tab. 1, while qualitative results are shown in Fig. 5(a).
Researcher Affiliation	Collaboration	Wenchao Ma (1), Dario Kneubuehler (2), Maurice Chu (2), Ian Sachs (2), Haomiao Jiang (2), Sharon X. Huang (1); (1) Penn State University, (2) Roblox
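Pseudocode	Yes	B. Details for 2D Displacement Calculation: In the following code sample, we demonstrate how to compute the 2D displacement of each pixel from mesh vertex deformations in a fully differentiable manner. This implementation leverages PyTorch3D's differentiable rendering functionality.

    import torch
    from pytorch3d.renderer import TexturesVertex
    from pytorch3d.structures import Meshes

    def render_displacement(vertices, deformed_vertices, faces, renderer, camera, res=(512, 512)):
        """
        Parameters
        ----------
        vertices: torch.Tensor (V, 3)
        deformed_vertices: torch.Tensor (V, 3)
        faces: torch.Tensor (F, 3)
        renderer: pytorch3d.renderer.MeshRenderer object
        camera: pytorch3d.renderer.cameras.CamerasBase object
        res: tuple (H, W)

        Returns
        -------
        displacement_2D: torch.Tensor (res[0], res[1], 2)
        """
        # Project rest-pose and deformed vertices to screen (pixel) coordinates.
        verts_2d = camera.transform_points_screen(vertices, image_size=res)
        verts_2d_deformed = camera.transform_points_screen(deformed_vertices, image_size=res)
        # Per-vertex screen-space displacement (x, y), V x 2.
        verts_flow = (verts_2d_deformed - verts_2d)[:, :2]
        # Map to the [0, 1] color range; x is scaled by width, y by height.
        scale = torch.tensor([res[1], res[0]], dtype=verts_flow.dtype, device=verts_flow.device)
        verts_flow = verts_flow / scale * 0.5 + 0.5
        # Append a zero channel so the flow can be stored as RGB vertex colors, V x 3.
        flow_tex = torch.nn.functional.pad(verts_flow, pad=[0, 1])
        texture = TexturesVertex(verts_features=[flow_tex])
        # Rasterize the rest-pose mesh; each pixel interpolates the flow of its face's vertices.
        meshes = Meshes(verts=[vertices], faces=[faces], textures=texture)
        # Assumes the renderer's shader outputs the interpolated colors without lighting.
        displacement_2D = renderer(meshes, cameras=camera)
        # Keep the two flow channels; values stay in the [0, 1] encoding.
        return displacement_2D[..., :2].squeeze()
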
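Because the renderer interpolates vertex colors, `render_displacement` stores each vertex's screen-space displacement as a color in [0, 1] via `flow / res * 0.5 + 0.5` and returns pixels in that encoded range. The minimal sketch below isolates that encoding and its inverse for a single displacement in plain Python; the helper names `encode_flow` and `decode_flow` are hypothetical, and it assumes x is normalized by the image width and y by the height.

```python
def encode_flow(dx, dy, width, height):
    """Map a pixel displacement (dx, dy) to the [0, 1] color range.

    Displacements in [-width, width] x [-height, height] land in [0, 1];
    the affine mapping mirrors `flow / res * 0.5 + 0.5`.
    """
    return dx / width * 0.5 + 0.5, dy / height * 0.5 + 0.5


def decode_flow(u, v, width, height):
    """Invert encode_flow: recover the pixel displacement from a rendered color."""
    return (u - 0.5) * 2.0 * width, (v - 0.5) * 2.0 * height


if __name__ == "__main__":
    # A vertex displaced by +64 px along x and -32 px along y on a 512 x 512 image.
    u, v = encode_flow(64.0, -32.0, 512, 512)
    print(u, v)  # 0.5625 0.46875
    print(decode_flow(u, v, 512, 512))  # (64.0, -32.0)
```

Because the encoding is affine, a barycentric blend of encoded vertex colors decodes to the same barycentric blend of the vertex displacements, so rasterizing the encoded values yields a valid per-pixel displacement once decoded.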
Open Source Code	No	Upon acceptance, we will release our inference code and model weights so that reported evaluation results can be reproduced. We are also working to make the full dataset and training code publicly available, pending internal approval and legal clearance.
Open Datasets	Yes	Experiments show that our method outperforms prior work across assets from diverse sources, including our artist-crafted meshes and in-the-wild models from ICT Face Kit [36], Objaverse [15], and CGTrader [9].
Dataset Splits	Yes	Our dataset includes 161 rigged heads and 175 unrigged heads. From these, a subset of 24 rigged heads with 3D ground-truth annotations forms the test set for accurate absolute-error evaluation. Additionally, we select 37 diverse unrigged heads as a test set, representing different species and shapes, to evaluate the model's generalization to out-of-distribution (OOD) faces. For training, we augment the dataset through interpolation, manually filtering out poor interpolation results. Specifically, we interpolate the remaining 137 unrigged heads with a factor of 50, generating 5,457 samples, and the remaining 137 rigged heads with a factor of 25, producing 2,929 samples.
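The split sizes can be cross-checked with a few additions and subtractions. The sketch below only re-derives totals from the counts quoted in this row (the dictionary keys are hypothetical labels). It reproduces the 137 remaining rigged heads and the 8,386 and 2,929 training samples cited under Experiment Setup, whereas 175 − 37 yields 138 remaining unrigged heads rather than the 137 stated in the excerpt.

```python
# Counts quoted in the Dataset Splits excerpt.
RIGGED_TOTAL, UNRIGGED_TOTAL = 161, 175
RIGGED_TEST, UNRIGGED_TEST = 24, 37
RIGGED_AUGMENTED, UNRIGGED_AUGMENTED = 2_929, 5_457


def split_summary():
    """Re-derive split sizes from the quoted counts (no new data)."""
    return {
        "rigged_train_heads": RIGGED_TOTAL - RIGGED_TEST,
        "unrigged_train_heads": UNRIGGED_TOTAL - UNRIGGED_TEST,
        "test_heads": RIGGED_TEST + UNRIGGED_TEST,
        # Stage 1 trains on both augmented sets, stage 2 on the rigged set only.
        "stage1_samples": RIGGED_AUGMENTED + UNRIGGED_AUGMENTED,
        "stage2_samples": RIGGED_AUGMENTED,
    }


if __name__ == "__main__":
    for name, value in split_summary().items():
        print(f"{name}: {value}")
```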
Hardware Specification	Yes	Training runs on an instance with 8 NVIDIA A100 GPUs and takes about 2 days. For inference, generating a FACS blendshape rig on the test set takes on average 8.72 s on an Apple M2 Max CPU and 3.1 s on an NVIDIA T4 GPU.
Software Dependencies	No	This implementation leverages PyTorch3D's differentiable rendering functionality.
Experiment Setup	Yes	In the first stage of training, the weights for the image loss, mask loss, 2D displacement loss, and regularization loss are set to 10.0, 1.0, 1.0, and 0.0001, respectively. In the second stage, the weights for the image loss, mask loss, 3D MSE loss, 2D landmark loss, and 2D eye closure loss are set to 10.0, 1.0, 100.0, 0.5, and 0.5, respectively. We train our model on an Nvidia A100 instance with 8 GPUs and a total batch size of 8 (i.e., effectively 1 sample per GPU if using distributed data parallel). The training proceeds in two stages. For the first stage, we train the deformation model on both rigged and unrigged head datasets (8,386 samples in total) using only 2D supervision for 15 epochs. This stage typically takes around 1.5 days to complete. For the second stage, we then finetune the model from the first stage on the rigged head dataset (2,929 samples), incorporating both 2D and 3D supervision for 20 epochs. This finetuning phase finishes in approximately 1 day. Throughout both stages, we use the Adam optimizer, initializing the learning rate at 0.0001. For learning rate scheduling, we employ Cosine Annealing Warm Restarts, allowing it to decay from 0.0001 to nearly 0 by the end of training. Additionally, we use a warm-up phase of 20,000 steps to stabilize early training.
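The two-stage objective and learning-rate schedule in this row can be sketched as follows. This is a minimal, framework-free illustration under stated assumptions: the loss-term keys are hypothetical names, the per-term loss values would come from the model, and the schedule approximates `CosineAnnealingWarmRestarts` by a single cosine cycle spanning the steps after a linear 20,000-step warm-up, since the excerpt gives neither the restart period nor how warm-up is combined with the scheduler.

```python
import math

# Loss weights quoted in the Experiment Setup excerpt (key names are hypothetical).
STAGE_WEIGHTS = {
    1: {"image": 10.0, "mask": 1.0, "disp_2d": 1.0, "reg": 1e-4},
    2: {"image": 10.0, "mask": 1.0, "mse_3d": 100.0,
        "landmark_2d": 0.5, "eye_closure_2d": 0.5},
}


def total_loss(stage, losses):
    """Weighted sum of per-term losses; every weighted term must be supplied."""
    weights = STAGE_WEIGHTS[stage]
    return sum(weights[name] * losses[name] for name in weights)


def learning_rate(step, total_steps, base_lr=1e-4, warmup_steps=20_000, eta_min=0.0):
    """Linear warm-up, then one cosine cycle from base_lr down to eta_min.

    Assumption: a single cosine period covering all post-warm-up steps,
    matching "decay from 0.0001 to nearly 0 by the end of training".
    """
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    t = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * min(t, 1.0)))


if __name__ == "__main__":
    # Hypothetical per-term loss values for one stage-1 step.
    print(total_loss(1, {"image": 0.1, "mask": 0.2, "disp_2d": 0.3, "reg": 10.0}))
    # Hypothetical run length, used only to show the shape of the schedule.
    for step in (0, 19_999, 60_000, 100_000):
        print(step, learning_rate(step, total_steps=100_000))
```

In PyTorch, the cosine part would normally come from `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts` attached to an `Adam` optimizer with `lr=1e-4`; the closed form is written out here only to make the decay from 1e-4 toward 0 explicit.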