Novel Object Synthesis via Adaptive Text-Image Harmony
Authors: Zeren Xiong, Zedong Zhang, Zikun Chen, Shuo Chen, Xiang Li, Gan Sun, Jian Yang, Jun Li
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the effectiveness of our approach, showcasing remarkable object creations |
| Researcher Affiliation | Academia | (1) School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China; (2) RIKEN; (3) College of Computer Science, Nankai University, Tianjin, 300350, China; (4) College of Automation Science and Engineering, South China University of Technology, Guangzhou, 510640, China |
| Pseudocode | Yes | Algorithm 1 Novel Object Synthesis |
| Open Source Code | No | The data used in the paper are introduced in the Appendix (unresolved reference sec:TICategories), and we will provide source code when this paper is published. |
| Open Datasets | Yes | Images, selected from various classes in PIE-bench [26], include 20 animal and 10 non-animal categories. Texts were chosen from the 1,000 classes in ImageNet [53] |
| Dataset Splits | No | No specific training, validation, or test dataset split percentages or counts are provided for the constructed dataset. The paper states the overall dataset size but not its partitioning for different phases. |
| Hardware Specification | Yes | Our experiments were conducted using two NVIDIA GeForce RTX 4090 GPUs. |
| Software Dependencies | No | The paper mentions 'SDXLturbo [56]' as the base model but does not specify any other software dependencies with version numbers (e.g., Python, PyTorch, or other libraries). |
| Experiment Setup | Yes | For image editing, we set the source prompt $p_s$ as an empty string 'Null' and the target prompt $p_t$ as the target object class name. During sampling, we used the Ancestral-Euler sampler [28] with four denoising steps. All input images were uniformly scaled to $512 \times 512$ pixels to ensure consistent resolution in all the experiments. ...Ultimately, we set $\lambda$ to 125. ...we set the value of $k$ to 2.3. ...we set the minimum similarity threshold $I_{\text{sim}}^{\min}$ to 0.45 and the maximum similarity threshold $I_{\text{sim}}^{\max}$ to 0.85. |
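
The values in the Experiment Setup row can be gathered into a single configuration object, which may help when attempting a reproduction. The sketch below is only an illustration: the class name, field names, and the `within_similarity_thresholds` helper are hypothetical and do not come from the paper or its code. It also assumes that the 'Null' source prompt means an empty string, and that the two thresholds bound a similarity score on both sides. The roles of λ and k are not described in the table, so they are stored as reported without interpretation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ExperimentSetup:
    """Hyperparameters as reported in the table; all names are hypothetical."""

    target_prompt: str                       # target object class name (p_t)
    source_prompt: str = ""                  # p_s; assumed to be an empty string
    sampler: str = "Ancestral-Euler"         # sampler named in the setup
    denoising_steps: int = 4                 # four denoising steps
    image_size: Tuple[int, int] = (512, 512) # inputs scaled to 512 x 512 pixels
    lam: float = 125.0                       # reported value of lambda
    k: float = 2.3                           # reported value of k
    sim_min: float = 0.45                    # minimum similarity threshold
    sim_max: float = 0.85                    # maximum similarity threshold

    def validate(self) -> None:
        """Basic sanity checks on the stored values."""
        if not self.target_prompt:
            raise ValueError("target_prompt must be a non-empty class name")
        if self.denoising_steps <= 0:
            raise ValueError("denoising_steps must be positive")
        if not (0.0 <= self.sim_min < self.sim_max <= 1.0):
            raise ValueError("expected 0 <= sim_min < sim_max <= 1")

    def within_similarity_thresholds(self, similarity: float) -> bool:
        """Assumed usage: is a similarity score inside [sim_min, sim_max]?"""
        return self.sim_min <= similarity <= self.sim_max


if __name__ == "__main__":
    setup = ExperimentSetup(target_prompt="giraffe")  # example class name
    setup.validate()
    print(setup)
```

Freezing the dataclass keeps the reported values from being changed accidentally during a run; a different setting is created as a new instance instead.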