Novel Object Synthesis via Adaptive Text-Image Harmony

Authors: Zeren Xiong, Zedong Zhang, Zikun Chen, Shuo Chen, Xiang Li, Gan Sun, Jian Yang, Jun Li

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate the effectiveness of our approach, showcasing remarkable object creations.
Researcher Affiliation | Academia | (1) School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China; (2) RIKEN; (3) College of Computer Science, Nankai University, Tianjin, 300350, China; (4) College of Automation Science and Engineering, South China University of Technology, Guangzhou, 510640, China
Pseudocode | Yes | Algorithm 1: Novel Object Synthesis
Open Source Code | No | The data used in the paper are described in the Appendix (section label sec:TICategories), and the source code will be provided when the paper is published.
Open Datasets | Yes | Images, selected from various classes in PIE-bench [26], include 20 animal and 10 non-animal categories. Texts were chosen from the 1,000 classes in ImageNet [53].
Dataset Splits | No | No specific training, validation, or test split percentages or counts are provided for the constructed dataset. The paper states the overall dataset size but not its partitioning for different phases.
Hardware Specification | Yes | Our experiments were conducted using two NVIDIA GeForce RTX 4090 GPUs.
Software Dependencies | No | The paper mentions 'SDXL-Turbo [56]' as the base model but does not specify any other software dependencies with version numbers (e.g., Python, PyTorch, or other libraries).
Experiment Setup | Yes | For image editing, we set the source prompt p_s as an empty string ('Null') and the target prompt p_t as the target object class name. During sampling, we used the Ancestral-Euler sampler [28] with four denoising steps. All input images were uniformly scaled to 512 × 512 pixels to ensure consistent resolution in all the experiments. ... Ultimately, we set λ to 125. ... we set the value of k to 2.3. ... we set the minimum similarity threshold I_sim^min to 0.45 and the maximum similarity threshold I_sim^max to 0.85.
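The settings reported in the Experiment Setup row can be gathered into a single configuration object, which makes them easy to pin in a reproduction attempt. This is a minimal sketch: the class name `ATIHConfig`, every field name, and the sampler label string are hypothetical; only the values (empty source prompt, four denoising steps, 512 × 512 input, λ = 125, k = 2.3, similarity thresholds 0.45 and 0.85) come from the paper. It records the values only and does not implement how λ, k, or the thresholds are used by the method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ATIHConfig:
    """Hyperparameters quoted in the Experiment Setup row.

    Class and field names are hypothetical; only the default values
    are taken from the paper.
    """
    source_prompt: str = ""            # p_s: empty source prompt (the paper's 'Null')
    sampler: str = "ancestral_euler"   # illustrative label for the Ancestral-Euler sampler [28]
    num_denoising_steps: int = 4
    image_size: tuple[int, int] = (512, 512)  # all inputs scaled to 512 x 512
    lam: float = 125.0                 # λ ("lambda" is a Python keyword)
    k: float = 2.3
    sim_min: float = 0.45              # I_sim^min, minimum similarity threshold
    sim_max: float = 0.85              # I_sim^max, maximum similarity threshold

    def __post_init__(self) -> None:
        # Sanity checks on the values; they do not encode the method itself.
        if not self.sim_min < self.sim_max:
            raise ValueError("expected sim_min < sim_max")
        if self.num_denoising_steps <= 0:
            raise ValueError("num_denoising_steps must be positive")

    def target_prompt(self, class_name: str) -> str:
        # p_t is the target object class name itself.
        return class_name


cfg = ATIHConfig()
print(cfg.target_prompt("tiger"), cfg.num_denoising_steps, cfg.image_size)
```

Freezing the dataclass keeps the reported values from being changed accidentally during a run; an experiment that deviates from them has to construct a new config explicitly.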
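The Software Dependencies row notes that no versions are given beyond the SDXL-Turbo base model. A reproduction can at least log the versions it actually ran with. The following is a minimal sketch using only the Python standard library (`platform.python_version` and `importlib.metadata.version`); the package list (`torch`, `diffusers`, `transformers`) is an assumption about a typical SDXL-Turbo setup and is not stated in the paper.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def collect_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None  # package is not installed in this environment
    return report


if __name__ == "__main__":
    # Assumed dependency list; adjust to whatever the reproduction really uses.
    for pkg, ver in collect_versions(["torch", "diffusers", "transformers"]).items():
        print(f"{pkg}: {ver or 'not installed'}")
```

Saving this output alongside the results (together with the GPU model from the Hardware Specification row) gives later readers the version information the original report lacks.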