Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

HumanCrafter: Synergizing Generalizable Human Reconstruction and Semantic 3D Segmentation

Authors: Panwang Pan, Tingting Shen, Chenxin Li, Yunlong Lin, Kairun Wen, Jingjing Zhao, Yixuan Yuan

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that HUMANCRAFTER surpasses existing state-of-the-art methods in both 3D human-part segmentation and 3D human reconstruction from a single image. The paper also includes a dedicated section '4 Experiments' with subsections for evaluation and ablation studies.
Researcher Affiliation | Collaboration | Panwang Pan¹, Tingting Shen², Chenxin Li³, Yunlong Lin², Kairun Wen², Jingjing Zhao¹, Yixuan Yuan³. Equal contribution. Corresponding author. ¹ByteDance, ²Xiamen University, ³CUHK. This shows a mix of affiliations from 'ByteDance' (industry) and 'Xiamen University', 'CUHK' (academia).
Pseudocode | No | The paper describes the methodology using prose and network architecture diagrams (Figure 2) but does not contain any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | In the NeurIPS Paper Checklist, under question 5: 'Does the paper provide open access to the data and code...?' The answer is '[No]' with the justification: 'Code will be released upon acceptance of the paper.'
Open Datasets | Yes | Datasets. (1) THuman2.1 Dataset [81] contains approximately 2500 human scans. (2) 2K2K Dataset [82] includes 2000 human scans. (3) Human MVImage Net [83] approximately comprises 4000 identities and 8000 outfits, which provide the rich multi-perspectives. These datasets are referenced with citations, indicating their public availability.
Dataset Splits | Yes | (1) THuman2.1 Dataset [81] contains approximately 2500 human scans. Specifically, we select 2300 scans for training and the rest for evaluation. (2) 2K2K Dataset [82] includes 2000 human scans. Similarly, we select 1500 scans for training and the rest for evaluation. For the Curated Dataset, it states: 'We randomly select 500 scans from the training dataset and annotate 8 semantic segmentation maps for each scan.'
Hardware Specification | Yes | Leveraging a pre-trained model and human geometric priors, our method takes 7 days of training on 8 NVIDIA A800 GPUs.
Software Dependencies | No | To accelerate the training process, we employ Flash-Attention-v2 [96] from the xFormers library [97], gradient checkpointing [98], and BFloat16 mixed-precision arithmetic [99]. While software components are mentioned, specific version numbers for Flash-Attention-v2 or the xFormers library are not provided, only references to their respective papers or general GitHub repositories.
Experiment Setup | Yes | Training Details. The hyperparameters λ_mask and λ_p are set to 1 and 0.1 in this paper. The hyperparameters λ_dist and λ_dist2 are both set to 0.5. We use the AdamW optimizer with β₁ = 0.9 and β₂ = 0.95, and a weight decay of 0.05 is applied to all parameters except those in the LayerNorm layers. A cosine learning rate decay scheduler is employed, with a linear warm-up of 2,000 steps. The peak learning rate is set to 4 × 10⁻⁴. The training process is divided into two stages: the model is trained for 80K iterations at 256 × 256 resolution and then fine-tuned for an additional 20K iterations at 512 × 512 resolution. Please refer to Appendix A.2 for more detailed procedural insights.
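
The learning-rate schedule quoted in the Experiment Setup row (linear warm-up for 2,000 steps to a peak of 4 × 10⁻⁴, followed by cosine decay) can be illustrated with a minimal sketch. This is not the authors' code, which had not been released at the time of classification. Two values are assumptions that the quoted text does not state: that a single schedule spans both training stages (80K + 20K = 100K steps), and that the rate decays to zero. The names PEAK_LR, WARMUP_STEPS, TOTAL_STEPS, MIN_LR, and learning_rate are hypothetical.

```python
import math

PEAK_LR = 4e-4          # peak learning rate, as quoted from the paper
WARMUP_STEPS = 2_000    # linear warm-up length, as quoted from the paper
TOTAL_STEPS = 100_000   # ASSUMPTION: one schedule covers both stages (80K + 20K)
MIN_LR = 0.0            # ASSUMPTION: the decay floor is not stated in the quote


def learning_rate(step: int) -> float:
    """Return the learning rate at a given step.

    Linear warm-up from 0 to PEAK_LR over WARMUP_STEPS, then cosine
    decay from PEAK_LR down to MIN_LR at TOTAL_STEPS.
    """
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min(1.0, (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + (PEAK_LR - MIN_LR) * cosine


if __name__ == "__main__":
    # Print the rate at a few representative steps.
    for step in (0, 1_000, 2_000, 51_000, 100_000):
        print(step, learning_rate(step))
```

Under these assumptions the rate reaches its peak at step 2,000 and is halved at step 51,000, the midpoint of the decay phase. If the fine-tuning stage restarts the schedule instead, TOTAL_STEPS would need to be set per stage.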