HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion
Authors: Xian Liu, Jian Ren, Aliaksandr Siarohin, Ivan Skorokhodov, Yanyu Li, Dahua Lin, Xihui Liu, Ziwei Liu, Sergey Tulyakov
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Extensive experiments demonstrate that our framework yields the state-of-the-art performance, generating hyper-realistic human images under diverse scenarios." (Abstract) and Section 5, "Experiments". |
| Researcher Affiliation | Collaboration | Xian Liu (Snap Inc., CUHK), Jian Ren (Snap Inc.), Aliaksandr Siarohin (Snap Inc.), Ivan Skorokhodov (Snap Inc.), Yanyu Li (Snap Inc.), Dahua Lin (CUHK), Xihui Liu (HKU), Ziwei Liu (NTU), Sergey Tulyakov (Snap Inc.) |
| Pseudocode | No | The paper describes its methods in detail and includes an overview diagram (Figure 2), but it does not present any formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | Project Page: https://snap-research.github.io/HyperHuman |
| Open Datasets | Yes | "We curate from two principled datasets: LAION-2B-en (Schuhmann et al., 2022) and COYO-700M (Byeon et al., 2022). To isolate human images, we employ YOLOS (Fang et al., 2021) for human detection." (Section 4) and "LAION-5B (Schuhmann et al., 2022): Creative Common CC-BY 4.0 license. COYO-700M (Byeon et al., 2022): Creative Common CC-BY 4.0 license. MS-COCO (Lin et al., 2014): Creative Commons Attribution 4.0 License." (Section A.17) A hedged human-detection sketch follows the table. |
| Dataset Splits | No | The paper uses the MS-COCO 2014 validation set for zero-shot evaluation and trains on its curated HumanVerse dataset, but it does not provide percentages, sample counts, or a clear methodology for train/validation/test splits of its own training data. |
| Hardware Specification | Yes | 1) For the Latent Structural Diffusion... We train on 128 80G NVIDIA A100 GPUs in a batch size of 2,048 for one week. 2) For the Structure-Guided Refiner... We train on 256 80G NVIDIA A100 GPUs in a batch size of 2,048 for one week. The whole two-stage inference process takes 12 seconds on a single 40G NVIDIA A100 GPU. |
| Software Dependencies | No | The paper states "Our code is developed based on diffusers (von Platen et al., 2022)" and mentions pretrained models such as "SD-2.0-base" and "SDXL-1.0-base", but it does not give version numbers for these libraries or checkpoints, only their names and associated research papers. A hedged dependency-loading sketch follows the table. |
| Experiment Setup | Yes | Implementation Details: 1) For the Latent Structural Diffusion stage, the DDIMScheduler with an improved noise schedule is used for both training and sampling, and the framework is optimized with AdamW (Kingma & Ba, 2015) at a 1e-5 learning rate and 0.01 weight decay. 2) The Structure-Guided Refiner is trained on 256 80G NVIDIA A100 GPUs with a batch size of 2,048 for one week, using the same AdamW settings. Table 6 ("Training Hyper-parameters and Network Architecture in HyperHuman") lists Learning Rate 1e-5, Weight Decay 0.01, AdamW Betas (0.9, 0.999), Batch Size 2,048, and Condition Dropout 15% (LSD) / 50% (SGR). A hedged configuration sketch follows the table. |
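
To make the dataset-curation finding concrete, below is a minimal sketch of the human-filtering step the paper describes ("we employ YOLOS ... for human detection"). It assumes the publicly available `hustvl/yolos-small` checkpoint and the Hugging Face `transformers` API; the paper names neither a specific YOLOS checkpoint nor a detection threshold, so both are illustrative.

```python
# Minimal sketch of the human-filtering step, not the authors' pipeline.
# Assumes the public "hustvl/yolos-small" checkpoint; the paper only says it
# uses YOLOS (Fang et al., 2021) without naming a checkpoint or threshold.
import torch
from PIL import Image
from transformers import YolosForObjectDetection, YolosImageProcessor

processor = YolosImageProcessor.from_pretrained("hustvl/yolos-small")
model = YolosForObjectDetection.from_pretrained("hustvl/yolos-small")


def contains_person(image: Image.Image, score_threshold: float = 0.9) -> bool:
    """Return True if YOLOS detects at least one 'person' box above the threshold."""
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    target_sizes = torch.tensor([image.size[::-1]])  # PIL gives (w, h); we need (h, w)
    results = processor.post_process_object_detection(
        outputs, threshold=score_threshold, target_sizes=target_sizes
    )[0]
    return any(
        model.config.id2label[label.item()] == "person" for label in results["labels"]
    )
```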
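For the software-dependency finding: the paper builds on `diffusers` and the pretrained `SD-2.0-base` and `SDXL-1.0-base` backbones but pins no versions. The sketch below shows how a reproducer might load the corresponding public Hugging Face checkpoints and record the installed `diffusers` version; the repository IDs are the public ones for those base models, not paths confirmed by the authors.

```python
# A minimal sketch, not the authors' code: loads the public Hugging Face
# checkpoints for the two backbones the paper names (SD-2.0-base and
# SDXL-1.0-base) and records the installed diffusers version, since the
# paper itself pins no library versions.
import diffusers
from diffusers import DDIMScheduler, StableDiffusionPipeline, StableDiffusionXLPipeline

print("diffusers version:", diffusers.__version__)  # worth logging for reproducibility

# Backbone of the first stage (Latent Structural Diffusion).
sd_base = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-base")
# The paper uses DDIM for both training and sampling.
sd_base.scheduler = DDIMScheduler.from_config(sd_base.scheduler.config)

# Backbone of the second stage (Structure-Guided Refiner).
sdxl_base = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0"
)
```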
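For the experiment-setup finding, here is a minimal configuration sketch of the optimizer and scheduler settings reported in Table 6, assuming a plain PyTorch/diffusers setup. The SD-2.0-base UNet below is only a stand-in for the paper's modified denoiser (which jointly denoises RGB, depth, and surface normal and is not reproduced here).

```python
# A minimal sketch of the optimisation settings reported in Table 6, assuming a
# plain PyTorch/diffusers setup. The SD-2.0-base UNet is only a stand-in: the
# paper's denoiser is modified to jointly handle RGB, depth, and surface normal.
import torch
from diffusers import DDIMScheduler, UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-2-base", subfolder="unet"
)

optimizer = torch.optim.AdamW(
    unet.parameters(),
    lr=1e-5,              # learning rate (Table 6)
    betas=(0.9, 0.999),   # AdamW betas (Table 6)
    weight_decay=0.01,    # weight decay (Table 6)
)

# DDIM is used for both training and sampling; the "improved noise schedule"
# the paper mentions is not fully specified, so the checkpoint defaults are kept.
noise_scheduler = DDIMScheduler.from_pretrained(
    "stabilityai/stable-diffusion-2-base", subfolder="scheduler"
)

global_batch_size = 2048  # reached with 128-256 A100 GPUs via data parallelism (per the paper)

# Stage-wise condition dropout reported in Table 6 (values are probabilities).
condition_dropout = {"latent_structural_diffusion": 0.15, "structure_guided_refiner": 0.50}
```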