Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

NoPo-Avatar: Generalizable and Animatable Avatars from Sparse Inputs without Human Poses

Authors: Jing Wen, Alex Schwing, Shenlong Wang

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on challenging THuman2.0, XHuman, and HuGe100K data show that NoPo-Avatar outperforms existing baselines in practical settings (without ground-truth poses) and delivers comparable results in lab settings (with ground-truth poses).
Researcher Affiliation | Academia | Jing Wen, Alexander G. Schwing, Shenlong Wang, University of Illinois Urbana-Champaign, EMAIL
Pseudocode | No | The paper describes the methodology in prose and architectural diagrams (e.g., Figure 2) but does not include distinct pseudocode blocks or algorithms.
Open Source Code | No | We will release the codes and model weights upon acceptance.
Open Datasets | Yes | Datasets. We train our model on THuman2.0 [35], THuman2.1 [35], and HuGe100K [37]. For evaluation purposes, we adopt THuman2.0, HuGe100K and XHuman [21]. THuman2.0. THuman2.0 provides 526 3D scans as well as the corresponding SMPL-X parameters. ... THuman2.0 uses a special license agreement, which we follow. HuGe100K. HuGe100K is a synthetic dataset that contains more than 100K subjects. ... HuGe100K uses DeepFashion's license, which we follow.
Dataset Splits | Yes | We follow GHG's split [10] on THuman2.0. On HuGe100K, we use the scripts provided by IDOL [37] and split each directory into 10 validation subjects, 50 test subjects and the rest for training.
Hardware Specification | Yes | We use two NVIDIA L40S for training at a resolution of 256 × 256, four NVIDIA L40S for 512 × 512 resolution training, and four NVIDIA H200 for the last stage, i.e., for full resolution training.
Software Dependencies | No | The paper mentions 'The model is pretrained from NoPoSplat [34]' but does not specify software dependencies with version numbers such as Python, PyTorch, or CUDA versions.
Experiment Setup | Yes | The learnable embedding in the template encoder F T 0 is in the shape of 16 × 16 × 1024. We use α_lpips = 0.05, α_chamfer = 0.1, α_proj = 1.0, and α_lbs = 0.01 in Eq. (4). The model is first trained at a resolution of 256 × 256 for 300K iterations, then upsampled to 512 × 512 for another 300K iterations, and finally fine-tuned at full resolution for an additional 50K iterations. We train for 80K iterations in the full resolution of 896 × 640 on HuGe100K. We use a batch size of 4 for all experiments on THuman2.0, THuman2.1, and THuman2.1 + HuGe100K. For the comparison to LHM and IDOL on HuGe100K, we use a batch size of 16 in the first two stages and 8 in the last stage.
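
Since the code has not been released, the reported setup can be written down as a small configuration sketch for anyone attempting a reimplementation. The Python below is not the authors' code: the names `Stage`, `THUMAN_STAGES`, `LOSS_WEIGHTS`, `weighted_loss_terms`, and `split_directory` are hypothetical, the loss function covers only the four weighted terms quoted from Eq. (4) (any other terms in that equation are not given in this summary), and the split helper assumes a sorted, deterministic selection, whereas the actual selection is defined by the IDOL scripts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One training stage as reported in the Experiment Setup row."""
    resolution: str   # kept as text; the final stage is only described as "full"
    iterations: int
    batch_size: int


# Three-stage schedule reported for the THuman experiments (batch size 4).
THUMAN_STAGES = (
    Stage("256x256", 300_000, 4),
    Stage("512x512", 300_000, 4),
    Stage("full", 50_000, 4),
)

# Loss weights quoted for Eq. (4).
LOSS_WEIGHTS = {"lpips": 0.05, "chamfer": 0.1, "proj": 1.0, "lbs": 0.01}


def weighted_loss_terms(terms, weights=LOSS_WEIGHTS):
    """Weighted sum of the four terms whose weights are reported.

    Assumption: Eq. (4) may contain further terms that this summary
    does not quote; they are deliberately left out here.
    """
    return sum(weights[name] * terms[name] for name in weights)


def total_iterations(stages):
    """Total number of optimizer steps across all stages."""
    return sum(stage.iterations for stage in stages)


def split_directory(subjects, n_val=10, n_test=50):
    """Split one directory into validation, test, and training subjects.

    Assumption: subjects are sorted and taken in order (first 10 for
    validation, next 50 for test). The real selection rule comes from
    the IDOL scripts and may differ.
    """
    ordered = sorted(subjects)
    val = ordered[:n_val]
    test = ordered[n_val:n_val + n_test]
    train = ordered[n_val + n_test:]
    return val, test, train


if __name__ == "__main__":
    print(total_iterations(THUMAN_STAGES))
    val, test, train = split_directory([f"subject_{i:04d}" for i in range(200)])
    print(len(val), len(test), len(train))
```

The HuGe100K comparison to LHM and IDOL is reported with different batch sizes (16 in the first two stages, 8 in the last) and 80K iterations at 896 × 640, so it would need its own stage tuple rather than reusing `THUMAN_STAGES`.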