HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion
Authors: Xian Liu, Jian Ren, Aliaksandr Siarohin, Ivan Skorokhodov, Yanyu Li, Dahua Lin, Xihui Liu, Ziwei Liu, Sergey Tulyakov
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Extensive experiments demonstrate that our framework yields the state-of-the-art performance, generating hyper-realistic human images under diverse scenarios." (Abstract) and Section 5, "Experiments". |
| Researcher Affiliation | Collaboration | Xian Liu (Snap Inc., CUHK), Jian Ren (Snap Inc.), Aliaksandr Siarohin (Snap Inc.), Ivan Skorokhodov (Snap Inc.), Yanyu Li (Snap Inc.), Dahua Lin (CUHK), Xihui Liu (HKU), Ziwei Liu (NTU), Sergey Tulyakov (Snap Inc.) |
| Pseudocode | No | The paper describes its methods in detail and includes an overview diagram (Figure 2), but it does not present any formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | Project Page: https://snap-research.github.io/HyperHuman |
| Open Datasets | Yes | "We curate from two principled datasets: LAION-2B-en (Schuhmann et al., 2022) and COYO-700M (Byeon et al., 2022). To isolate human images, we employ YOLOS (Fang et al., 2021) for human detection." (Section 4) and "LAION-5B (Schuhmann et al., 2022): Creative Common CC-BY 4.0 license. COYO-700M (Byeon et al., 2022): Creative Common CC-BY 4.0 license. MS-COCO (Lin et al., 2014): Creative Commons Attribution 4.0 License." (Section A.17) A hedged human-detection sketch follows the table. |
| Dataset Splits | No | The paper uses the MS-COCO 2014 validation set for zero-shot evaluation and trains on its curated HumanVerse dataset, but it does not provide percentages, sample counts, or a clear methodology for train/validation/test splits of its own training data. |
| Hardware Specification | Yes | 1) For the Latent Structural Diffusion... We train on 128 80G NVIDIA A100 GPUs in a batch size of 2,048 for one week. 2) For the Structure-Guided Refiner... We train on 256 80G NVIDIA A100 GPUs in a batch size of 2,048 for one week. The whole two-stage inference process takes 12 seconds on a single 40G NVIDIA A100 GPU. |
| Software Dependencies | No | The paper states "Our code is developed based on diffusers (von Platen et al., 2022)" and mentions pretrained models such as "SD-2.0-base" and "SDXL-1.0-base", but it does not give version numbers for these libraries or checkpoints, only their names and associated research papers. A hedged dependency-loading sketch follows the table. |
| Experiment Setup | Yes | Implementation Details: 1) For the Latent Structural Diffusion stage, the DDIMScheduler with an improved noise schedule is used for both training and sampling, and the framework is optimized with AdamW (Kingma & Ba, 2015) at a 1e-5 learning rate and 0.01 weight decay. 2) The Structure-Guided Refiner is trained on 256 80G NVIDIA A100 GPUs with a batch size of 2,048 for one week, using the same AdamW settings. Table 6 ("Training Hyper-parameters and Network Architecture in HyperHuman") lists Learning Rate 1e-5, Weight Decay 0.01, AdamW Betas (0.9, 0.999), Batch Size 2,048, and Condition Dropout 15% (LSD) / 50% (SGR). A hedged configuration sketch follows the table. |
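
To make the dataset-curation finding concrete, below is a minimal sketch of the human-filtering step the paper describes ("we employ YOLOS ... for human detection"). It assumes the publicly available `hustvl/yolos-small` checkpoint and the Hugging Face `transformers` API; the paper names neither a specific YOLOS checkpoint nor a detection threshold, so both are illustrative.

```python
# Minimal sketch of the human-filtering step, not the authors' pipeline.
# Assumes the public "hustvl/yolos-small" checkpoint; the paper only says it
# uses YOLOS (Fang et al., 2021) without naming a checkpoint or threshold.
import torch
from PIL import Image
from transformers import YolosForObjectDetection, YolosImageProcessor

processor = YolosImageProcessor.from_pretrained("hustvl/yolos-small")
model = YolosForObjectDetection.from_pretrained("hustvl/yolos-small")


def contains_person(image: Image.Image, score_threshold: float = 0.9) -> bool:
    """Return True if YOLOS detects at least one 'person' box above the threshold."""
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    target_sizes = torch.tensor([image.size[::-1]])  # PIL gives (w, h); we need (h, w)
    results = processor.post_process_object_detection(
        outputs, threshold=score_threshold, target_sizes=target_sizes
    )[0]
    return any(
        model.config.id2label[label.item()] == "person" for label in results["labels"]
    )
```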
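For the software-dependency finding: the paper builds on `diffusers` and the pretrained `SD-2.0-base` and `SDXL-1.0-base` backbones but pins no versions. The sketch below shows how a reproducer might load the corresponding public Hugging Face checkpoints and record the installed `diffusers` version; the repository IDs are the public ones for those base models, not paths confirmed by the authors.

```python
# A minimal sketch, not the authors' code: loads the public Hugging Face
# checkpoints for the two backbones the paper names (SD-2.0-base and
# SDXL-1.0-base) and records the installed diffusers version, since the
# paper itself pins no library versions.
import diffusers
from diffusers import DDIMScheduler, StableDiffusionPipeline, StableDiffusionXLPipeline

print("diffusers version:", diffusers.__version__)  # worth logging for reproducibility

# Backbone of the first stage (Latent Structural Diffusion).
sd_base = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-base")
# The paper uses DDIM for both training and sampling.
sd_base.scheduler = DDIMScheduler.from_config(sd_base.scheduler.config)

# Backbone of the second stage (Structure-Guided Refiner).
sdxl_base = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0"
)
```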
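For the experiment-setup finding, here is a minimal configuration sketch of the optimizer and scheduler settings reported in Table 6, assuming a plain PyTorch/diffusers setup. The SD-2.0-base UNet below is only a stand-in for the paper's modified denoiser (which jointly denoises RGB, depth, and surface normal and is not reproduced here).

```python
# A minimal sketch of the optimisation settings reported in Table 6, assuming a
# plain PyTorch/diffusers setup. The SD-2.0-base UNet is only a stand-in: the
# paper's denoiser is modified to jointly handle RGB, depth, and surface normal.
import torch
from diffusers import DDIMScheduler, UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-2-base", subfolder="unet"
)

optimizer = torch.optim.AdamW(
    unet.parameters(),
    lr=1e-5,              # learning rate (Table 6)
    betas=(0.9, 0.999),   # AdamW betas (Table 6)
    weight_decay=0.01,    # weight decay (Table 6)
)

# DDIM is used for both training and sampling; the "improved noise schedule"
# the paper mentions is not fully specified, so the checkpoint defaults are kept.
noise_scheduler = DDIMScheduler.from_pretrained(
    "stabilityai/stable-diffusion-2-base", subfolder="scheduler"
)

global_batch_size = 2048  # reached with 128-256 A100 GPUs via data parallelism (per the paper)

# Stage-wise condition dropout reported in Table 6 (values are probabilities).
condition_dropout = {"latent_structural_diffusion": 0.15, "structure_guided_refiner": 0.50}
```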