InstructPix2NeRF: Instructed 3D Portrait Editing from a Single Image

Authors: Jianhui Li, Shilong Liu, Zidong Liu, Yikai Wang, Kaiwen Zheng, Jinghui Xu, Jianmin Li, Jun Zhu

ICLR 2024

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments verify the effectiveness of our method and show its superiority against strong baselines quantitatively and qualitatively. Source code and pretrained models can be found on our project page: https://mybabyyh.github.io/InstructPix2NeRF.
Researcher Affiliation Collaboration Jianhui Li1, Shilong Liu1, Zidong Liu1, Yikai Wang1, Kaiwen Zheng1, Jinghui Xu2, Jianmin Li1, Jun Zhu1,2. 1Dept. of Comp. Sci. & Tech., Institute for AI, BNRist Center, THBI Lab, Tsinghua-Bosch Joint ML Center, Tsinghua University, Beijing, 100084, China. 2Shengshu Technology, Beijing.
Pseudocode No The paper describes the architecture and method steps using text and diagrams but does not include any formal pseudocode or algorithm blocks.
Open Source Code Yes Source code and pretrained models can be found on our project page: https://mybabyyh.github.io/InstructPix2NeRF. To facilitate progress in the field, we will be completely open-sourcing the model, training code, and the data we have curated.
Open Datasets Yes We train our conditional diffusion model on the dataset we prepared from FFHQ (Karras et al., 2019) and use CelebA-HQ (Karras et al., 2018) for evaluation.
Dataset Splits No The paper defines a test set ('The image test dataset is the first 300 images from CelebA-HQ (Karras et al., 2018)') but does not specify training/validation/test splits with counts or percentages for the FFHQ dataset used for training, nor a distinct validation set beyond the test set used for evaluation.
Hardware Specification Yes We set tth = 600, λid = 0.1 and trained the model on a 4-card NVIDIA GeForce RTX 3090 for 6 days with a batch size of 20 on a single card.
Software Dependencies No The paper mentions using pretrained models (e.g., EG3D, PREIM3D, CLIP) and various architectures (Diffusion Transformer, UNet, transformers), but it does not specify any software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions).
Experiment Setup Yes We set tth = 600, λid = 0.1 and trained the model on a 4-card NVIDIA GeForce RTX 3090 for 6 days with a batch size of 20 on a single card. In our paper, we set p1 = 0.05, p2 = 0.05 as hyperparameters.
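The hyperparameters quoted in the rows above can be collected into a single configuration sketch. This is only an illustrative summary of the values reported by the paper; the class and field names (e.g. `TrainConfig`, `t_th`) are hypothetical and do not come from the authors' code, and the role of p1 and p2 is not specified in the quoted text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Illustrative collection of the hyperparameters quoted from the paper."""
    t_th: int = 600              # diffusion timestep threshold (tth in the paper)
    lambda_id: float = 0.1       # identity-loss weight (λid)
    batch_size_per_gpu: int = 20
    num_gpus: int = 4            # 4x NVIDIA GeForce RTX 3090
    training_days: int = 6
    p1: float = 0.05             # hyperparameter p1 (role not stated in the quote)
    p2: float = 0.05             # hyperparameter p2 (role not stated in the quote)


cfg = TrainConfig()
# Effective batch size across all GPUs, if the per-card batch is replicated.
print(cfg.batch_size_per_gpu * cfg.num_gpus)  # → 80
```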