Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

GeneMAN: Generalizable Single-Image 3D Human Reconstruction from Multi-Source Human Data

Authors: Wentao Wang, Hang Ye, Fangzhou Hong, Xue Yang, Jianfu Zhang, Yizhou Wang, Ziwei Liu, Liang Pan

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results demonstrate that GeneMAN could generate high-quality 3D human models from a single image input, outperforming prior state-of-the-art methods. Notably, GeneMAN could reveal much better generalizability in dealing with in-the-wild images, often yielding high-quality 3D human models in natural poses with common items, regardless of the body proportions in the input images. Section 5 Experiments.
Researcher Affiliation | Collaboration | 1 Shanghai AI Laboratory; 2 Peking University; 3 Nanyang Technological University; 4 SAIS & SCS, Shanghai Jiao Tong University
Pseudocode | No | The paper describes the GeneMAN framework through text and diagrams (Figures 2, 3, and 4) explaining the pipeline steps. However, it does not include explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The anonymous project page of GeneMAN is: https://roooooz.github.io/GeneMAN/. Justification: The code will be open-sourced after the review process.
Open Datasets | Yes | Our 3D scanned human data is aggregated from the commercial dataset RenderPeople [1], alongside several open-source datasets: CustomHumans [13], HuMMan [6], THuman2.0 [70], THuman3.0 [60] and X-Humans [53]. Additionally, we enrich the dataset by integrating human-specific data filtered from Objaverse [8]. For multi-view human videos, we leverage datasets such as DNA-Rendering [7], ZJU-Mocap [42], AIST++ [24], Neural Actor [29] and Actors-HQ [21]. In terms of 2D human imagery, we select data from DeepFashion [31] and LAION-5B [51] to ensure comprehensive coverage of diverse human appearances. Justification: The paper utilizes a combination of commercial and open-source data for training. The open-source data is publicly accessible, whereas the commercial data requires acquisition through purchase from relevant websites.
Dataset Splits | No | For both qualitative and quantitative evaluation, we randomly select 50 samples from the Internet and CAPE [35]. The multi-source human dataset contains 100K 2D human images and 52,345 multi-view 3D human instances in total. For the GeneMAN 3D prior, ...we incorporate an extra 20% of data curated from the Objaverse [8] dataset. The paper describes the datasets used and the evaluation samples, but it does not specify explicit training, validation, and testing splits for the overall multi-source human dataset used to train the prior models, nor does it give general dataset splitting methodologies with percentages or counts.
Hardware Specification | Yes | The full optimization process takes approximately 1.4 hours on single NVIDIA A100 80G GPU. The fine-tuning process is conducted using AdamW [33] optimizer with a learning rate of $10^{-4}$ on eight NVIDIA A100 GPUs for one week. Finetuning is performed using AdamW [33] with a learning rate of $10^{-5}$ on four NVIDIA A100 GPUs for five days. All methods are tested on a single NVIDIA A100 80GB GPU.
Software Dependencies | No | Our framework is built upon the open-source project ThreeStudio [10]. We leverage Instant-NGP [39] as our NeRF implementation. The paper mentions various tools, models, and frameworks like ThreeStudio, Instant-NGP, Stable Diffusion V1.5, AdamW, etc., but does not provide specific version numbers for these software dependencies or underlying libraries (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | Yes | During the geometry stage, we progressively increase the resolution of NeRF [38] from 256 to 384 over 5,000 steps. We then convert it to an explicit mesh, which serves as the geometry initialization for DMTet [54] at a resolution of 512. We subsequently optimize DMTet for 3,000 steps to sculpt fine-grained geometric details. In the texture stage, we perform an initial coarse texture optimization over 10,000 steps, followed by a refinement of the texture UV map for 1,000 steps. The loss weights for this stage are set as follows: $\lambda_r = 1 \times 10^{3}$, $\lambda_m = 100$, $\lambda_d = 0.05$, $\lambda_n = 1$, $\lambda_{2D} = 0.1$, $\lambda_{3D} = 0.1$. We adopt the AdamW [33] optimizer with a base learning rate of 0.01 during Geometry Initialization and $2 \times 10^{-5}$ during Geometry Sculpting.
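
The Experiment Setup and Hardware Specification rows can be gathered into a small configuration sketch, which makes the overall step budget and the fine-tuning compute easier to see. This is only an illustration and not the authors' code: the names `Stage`, `STAGES`, `LOSS_WEIGHTS`, `LEARNING_RATES`, `total_steps`, and `gpu_hours` are hypothetical, the exponents are read as $1 \times 10^{3}$ and $2 \times 10^{-5}$ (an assumption about the extracted text), and "one week" is taken as 7 days.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One per-subject optimization stage; names are hypothetical labels."""
    name: str
    steps: int


# Step counts as quoted in the Experiment Setup row.
STAGES = (
    Stage("geometry_initialization_nerf", 5_000),
    Stage("geometry_sculpting_dmtet", 3_000),
    Stage("coarse_texture", 10_000),
    Stage("texture_uv_refinement", 1_000),
)

# Loss weights as quoted; keys mirror the paper's symbols only.
# lambda_r is read here as 1 x 10^3 (assumed exponent).
LOSS_WEIGHTS = {
    "lambda_r": 1e3,
    "lambda_m": 100.0,
    "lambda_d": 0.05,
    "lambda_n": 1.0,
    "lambda_2D": 0.1,
    "lambda_3D": 0.1,
}

# AdamW base learning rates; 2e-5 assumes a negative exponent.
LEARNING_RATES = {
    "geometry_initialization": 1e-2,
    "geometry_sculpting": 2e-5,
}


def total_steps(stages):
    """Sum the optimization steps over all stages."""
    return sum(stage.steps for stage in stages)


def gpu_hours(num_gpus, days):
    """GPU-hours for a run occupying num_gpus for the given number of days."""
    return num_gpus * days * 24


if __name__ == "__main__":
    print(total_steps(STAGES))  # 19000 steps per subject
    print(gpu_hours(8, 7))      # 1344 GPU-hours (eight A100s, one week)
    print(gpu_hours(4, 5))      # 480 GPU-hours (four A100s, five days)
```

Under these assumptions the per-subject pipeline runs 19,000 optimization steps, and the two fine-tuning runs come to roughly 1,344 and 480 A100 GPU-hours respectively, compared with about 1.4 GPU-hours for reconstructing one subject.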