Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Stable-Pose: Leveraging Transformers for Pose-Guided Text-to-Image Generation
Authors: Jiajun Wang, Morteza Ghahremani Boozandani, Yitong Li, Björn Ommer, Christian Wachinger
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4 Experimental Results |
| Researcher Affiliation | Academia | Jiajun Wang¹, Morteza Ghahremani¹,³, Yitong Li¹,³, Björn Ommer²,³, and Christian Wachinger¹,³. ¹Lab for AI in Medical Imaging, Technical University of Munich (TUM), Germany; ²CompVis @ LMU Munich, Germany; ³Munich Center for Machine Learning (MCML), Germany |
| Pseudocode | Yes | Algorithm 1 Generation of the attention mask for PMSA |
| Open Source Code | Yes | The project link and code are available at https://github.com/ai-med/StablePose. |
| Open Datasets | Yes | We assessed the performance of the proposed Stable-Pose as well as competing methods on five large-scale human-centric datasets including Human-Art [15], LAION-Human [16], UBC Fashion [43], DanceTrack [40], and DAVIS [27] dataset. |
| Dataset Splits | Yes | On the Human-Art dataset, we trained all techniques, including ours for 10 epochs to ensure a fair comparison. On the LAION-Human subset, we trained Stable-Pose, HumanSD [16], GLIGEN [18] and Uni-ControlNet [46] for 10 epochs, while we used released checkpoints from other techniques due to computational limitations. ... Human-Art: ... We adopt the same train-validation split as the authors suggested. ... LAION-Human: ... We randomly selected a subset of 200,000 images for training and 20,000 images for validation. |
| Hardware Specification | Yes | The training was executed using two NVIDIA A100 GPUs... Training was conducted on two NVIDIA A100 GPUs. |
| Software Dependencies | No | Similar to previous work [44; 25; 46], we fine-tuned our model on SD with version 1.5. We utilized Adam [17] optimizer with a learning rate of 1 × 10⁻⁵. |
| Experiment Setup | Yes | We utilized Adam [17] optimizer with a learning rate of 1 × 10⁻⁵. For our proposed PMSA ViT module, we adopted a depth of 2 and a patch size of 2, where coarse-to-fine pose masks were generated using two Gaussian filters, each with a sigma value of 3 but with differing kernel sizes of 23 and 13, respectively. ... In the pose-mask guided loss function, we set an α of 5 as the guidance factor. We also followed [44] to randomly replace text prompts as empty strings at a probability of 0.5... |
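
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration object. The following is a minimal sketch, not the authors' implementation: the class name `SetupConfig`, its field names, and the helpers `gaussian_kernel_1d` and `maybe_drop_prompt` are all hypothetical, and the kernel is a textbook normalized Gaussian rather than anything confirmed from the Stable-Pose code. Only the numeric values come from the quoted text.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SetupConfig:
    """Values quoted in the Experiment Setup row; field names are hypothetical."""
    learning_rate: float = 1e-5                 # Adam learning rate
    vit_depth: int = 2                          # depth of the PMSA ViT module
    patch_size: int = 2                         # patch size of the PMSA ViT module
    gaussian_sigma: float = 3.0                 # sigma shared by both filters
    gaussian_kernel_sizes: tuple = (23, 13)     # coarse and fine kernel sizes
    alpha: float = 5.0                          # guidance factor in the loss
    empty_prompt_prob: float = 0.5              # chance of an empty text prompt


def gaussian_kernel_1d(size: int, sigma: float) -> list:
    """Return a normalized 1D Gaussian kernel with an odd number of taps."""
    if size % 2 == 0:
        raise ValueError("kernel size must be odd")
    half = size // 2
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def maybe_drop_prompt(prompt: str, prob: float, rng: random.Random) -> str:
    """Replace the prompt with an empty string with probability `prob`."""
    return "" if rng.random() < prob else prompt


if __name__ == "__main__":
    cfg = SetupConfig()
    for size in cfg.gaussian_kernel_sizes:
        kernel = gaussian_kernel_1d(size, cfg.gaussian_sigma)
        print(size, len(kernel), round(sum(kernel), 6))
    print(repr(maybe_drop_prompt("a dancer on stage", cfg.empty_prompt_prob,
                                 random.Random(0))))
```

With a shared sigma, the two kernel sizes differ only in how far the Gaussian tails extend before truncation, so the size-23 filter spreads a pose mask over a wider neighbourhood than the size-13 filter. How the blurred masks and α enter the loss is not specified in the quoted text and is deliberately left out of this sketch.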