OneActor: Consistent Subject Generation via Cluster-Conditioned Guidance

Authors: Jiahao Wang, Caixia Yan, Haonan Lin, Weizhan Zhang, Mengmeng Wang, Tieliang Gong, Guang Dai, Hao Sun

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive experiments show that our method outperforms a variety of baselines with satisfactory subject consistency, superior prompt conformity as well as high image quality.
Researcher Affiliation | Collaboration | (1) School of Computer Science and Technology, MOEKLINNS, Xi'an Jiaotong University; (2) State Key Laboratory of Communication Content Cognition; (3) College of Computer Science and Technology, Zhejiang University of Technology; (4) SGIT AI Lab, State Grid Corporation of China; (5) China Telecom Artificial Intelligence Technology Co., Ltd.
Pseudocode | Yes | Appendix G (Pseudo-Code), Algorithm 1: Tuning Process of OneActor
Open Source Code | Yes | We submitted codes in the supplementary materials. We will release the codes officially online with detailed instructions after preparations.
Open Datasets | No | The paper trains on data generated from a pre-trained diffusion model and augmented target/auxiliary samples, rather than a publicly available dataset with specific access information.
Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits with specific percentages or counts. The tuning process uses generated data and a convergence criterion.
Hardware Specification | Yes | All experiments are finished on a single NVIDIA A100 GPU.
Software Dependencies | No | The paper lists the software implementations and models used (e.g., SDXL, DreamBooth-LoRA, CLIP), but does not provide specific version numbers for general software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | All images are generated in 30 steps. During tuning, we generate N = 11 base images and use K = 3 auxiliary images per batch. The projector consists of a 5-layer ResNet, 1 linear layer, and 1 AdaIN layer. The weight hyper-parameters λ1 and λ2 are set to 0.5 and 0.2. We use the default AdamW optimizer with a learning rate of 0.0001 and a weight decay of 0.01. We tune with a convergence criterion, which takes 3-6 minutes in most cases. During inference, we set the semantic interpolation scale v to 0.8 if not specified, and the cluster guidance scales η1 and η2 to 8.5 and 1.0. We apply cluster guidance to the first 20 inference steps and normal guidance to the last 10 steps.
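As a reading aid, below is a minimal sketch that collects the reported setup values into a single configuration object plus a helper for the step-wise guidance schedule. It uses only the numbers quoted in the Experiment Setup row; the class and function names are illustrative and not taken from the authors' released code, and treating the last 10 "normal guidance" steps as keeping η1 while dropping the extra cluster term (η2 = 0) is our assumption, since the paper only states that cluster guidance is applied to the first 20 of 30 steps.

```python
# Hedged sketch of the reported OneActor experiment setup.
# All numeric values come from the paper's setup description; the names
# OneActorConfig and guidance_scales_for_step are hypothetical.
from dataclasses import dataclass


@dataclass
class OneActorConfig:
    # Sampling
    num_inference_steps: int = 30      # "All images are generated in 30 steps."
    cluster_guidance_steps: int = 20   # cluster guidance on the first 20 steps
    # Tuning data
    num_base_images: int = 11          # N = 11 base images
    aux_images_per_batch: int = 3      # K = 3 auxiliary images per batch
    # Loss weights
    lambda_1: float = 0.5
    lambda_2: float = 0.2
    # Optimizer (default AdamW)
    learning_rate: float = 1e-4
    weight_decay: float = 0.01
    # Inference-time scales
    semantic_interpolation_v: float = 0.8
    cluster_guidance_eta_1: float = 8.5
    cluster_guidance_eta_2: float = 1.0


def guidance_scales_for_step(step: int, cfg: OneActorConfig) -> tuple[float, float]:
    """Return the (eta_1, eta_2) pair used at a given denoising step.

    Assumption: the "normal guidance" used for the last 10 steps is modeled
    here as keeping eta_1 and disabling the extra cluster term (eta_2 = 0).
    The paper only specifies the 20/10 split, not the exact fallback form.
    """
    if step < cfg.cluster_guidance_steps:
        return cfg.cluster_guidance_eta_1, cfg.cluster_guidance_eta_2
    return cfg.cluster_guidance_eta_1, 0.0


if __name__ == "__main__":
    cfg = OneActorConfig()
    # Steps 0 and 19 use cluster guidance (8.5, 1.0); steps 20 and 29 fall back.
    print([guidance_scales_for_step(s, cfg) for s in (0, 19, 20, 29)])
```

Running the file prints the (η1, η2) pair at representative steps, which makes the 20/10 split of the 30-step schedule explicit.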