Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
DICE: End-to-end Deformation Capture of Hand-Face Interactions from a Single Image
Authors: Qingxuan Wu, Zhiyang Dou, Sirui Xu, Soshi Shimada, Chen Wang, Zhengming Yu, Yuan Liu, Cheng Lin, Zeyu Cao, Taku Komura, Vladislav Golyanik, Christian Theobalt, Wenping Wang, Lingjie Liu
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that DICE achieves state-of-the-art performance on a standard benchmark and in-the-wild data in terms of accuracy and physical plausibility. |
| Researcher Affiliation | Academia | 1University of Pennsylvania, 2The University of Hong Kong, 3University of Illinois Urbana-Champaign, 4Max Planck Institute for Informatics, 5University of Cambridge, 6Texas A&M University, 7TransGP. The acknowledgments further state that the "TransGP project" is part of the "InnoHK initiative", implying an academic research project rather than a private industry company. |
| Pseudocode | No | The paper describes the methodology in narrative text and through architectural diagrams (Figures 5 and 6), but does not contain explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code will be available at: https://github.com/Qingxuan-Wu/DICE. |
| Open Datasets | Yes | We employ Decaf (Shimada et al., 2023) for reconstructing 3D face and hand interactions with deformations, along with the in-the-wild dataset we collected containing 500 images. We use the shape, pose, and expression data of hands and faces from Decaf (Shimada et al., 2023), RenderMe-360 (Pan et al., 2023a), and FreiHAND (Zimmermann et al., 2019) for training the adversarial priors. |
| Dataset Splits | Yes | We use the official split from Decaf to separate the training and testing sets, and select a few in-the-wild images for the test set to perform qualitative visualizations. To be consistent with the training setting of Decaf (Shimada et al., 2023), in the Decaf dataset, we use all eight camera views and the subjects S2, S4, S5, S7, and S8 in the training data split for training. For testing, we use only the front view (view 108) and the subjects S1, S3, and S6 in the testing data split. |
| Hardware Specification | Yes | Additionally, our method operates at an interactive rate (20 fps) on an Nvidia 4090 GPU, whereas Decaf requires more than 15 seconds for a single image. The model is trained and evaluated on 8 Nvidia A6000 GPUs with an AMD 128-core CPU. Inference times are calculated on a single Nvidia A6000 GPU. |
| Software Dependencies | No | The paper mentions using AdamW optimizers but does not specify any software frameworks (e.g., PyTorch, TensorFlow) or their version numbers, nor any other key software libraries with version details. |
| Experiment Setup | Yes | We train MeshNet, InteractionNet, and IKNet, along with the face and hand discriminators, using AdamW (Loshchilov, 2017) optimizers, each with a learning rate of 6×10⁻⁴ and a learning rate decay of 1×10⁻⁴. Our batch size is set to 16 during the training stage. The training takes 40 epochs, totalling 48 hours. The generator and discriminator networks are optimized in an alternating manner. Overall, our loss for the mesh and interaction networks is formulated by... where λmesh = 12.5, λinteraction = 5, λdepth = 2.5, λadv = 1 for all the experiments in the paper. |
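The Decaf train/test split quoted in the "Dataset Splits" row can be captured as a small configuration structure. This is an illustrative sketch only; the dictionary layout and names below are hypothetical and not taken from the DICE codebase, and only view 108 is known by ID from the paper.

```python
# Hypothetical split config mirroring the Decaf protocol quoted above.
# Subject IDs and the test view (108) come from the paper; everything
# else (key names, structure) is an illustrative assumption.
DECAF_SPLIT = {
    "train": {
        "subjects": ["S2", "S4", "S5", "S7", "S8"],
        "views": "all 8 camera views",  # individual view IDs not listed in the paper
    },
    "test": {
        "subjects": ["S1", "S3", "S6"],
        "views": ["108"],  # front view only
    },
}

def subjects_disjoint(split):
    """Sanity check: no subject appears in both train and test."""
    return not set(split["train"]["subjects"]) & set(split["test"]["subjects"])
```

A disjoint subject split like this tests generalization to unseen identities rather than unseen frames of seen subjects.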
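The weighted objective quoted in the "Experiment Setup" row (λmesh = 12.5, λinteraction = 5, λdepth = 2.5, λadv = 1) can be sketched as a plain weighted sum. The function and argument names below are hypothetical, not from the DICE codebase; only the weights come from the paper.

```python
def total_loss(l_mesh, l_interaction, l_depth, l_adv,
               lam_mesh=12.5, lam_interaction=5.0,
               lam_depth=2.5, lam_adv=1.0):
    """Weighted sum of the four loss terms, using the weights
    reported in the paper as defaults."""
    return (lam_mesh * l_mesh
            + lam_interaction * l_interaction
            + lam_depth * l_depth
            + lam_adv * l_adv)
```

With each term normalized to 1, the defaults give a total of 21.0, showing the mesh term dominates the objective by a wide margin.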