JointNet: Extending Text-to-Image Diffusion for Dense Distribution Modeling

Authors: Jingyang Zhang, Shiwei Li, Yuanxun Lu, Tian Fang, David Neil McKinnon, Yanghai Tsin, Long Quan, Yao Yao

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "5 EXPERIMENTS"; "Quantitatively, we compare the Fréchet inception distance (FID) (Heusel et al., 2017), inception score (IS) (Salimans et al., 2016) and CLIP similarity (Radford et al., 2021) of the generated RGB over a prompt collection of size 30K sampled from the MSCOCO (Lin et al., 2014) validation set. The results are listed in Tab. 1."
Researcher Affiliation | Collaboration | 1Apple, 2The Hong Kong University of Science and Technology, 3Nanjing University
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets | Yes | "We perform training on the COYO-700M dataset (Byeon et al., 2022) containing image-caption pairs as well as various metadata including properties like image size and derived evaluations such as CLIP (Radford et al., 2021) similarity, watermark score and aesthetic score (Schuhmann, 2022)."
Dataset Splits | Yes | "Quantitatively, we compare the Fréchet inception distance (FID) (Heusel et al., 2017), inception score (IS) (Salimans et al., 2016) and CLIP similarity (Radford et al., 2021) of the generated RGB over a prompt collection of size 30K sampled from the MSCOCO (Lin et al., 2014) validation set." and "We prepare 100 text prompts sampled from COCO validation set and let user choose the overall best generated RGBD image..."
Hardware Specification | Yes | "We train the network on 64 NVidia A100 80G GPUs for around 24 hours."
Software Dependencies | No | The paper mentions specific models and tools used (e.g., MiDaS v2, Omnidata, Stable Diffusion, DeepFloyd-IF) but does not provide version numbers for underlying software dependencies like Python, PyTorch, or CUDA.
Experiment Setup | Yes | "The sample resolution is 512x512 and the batch size is 4 on each GPU. The model is trained with a learning rate of 1e-4 for 10000 steps with 1000 warmup steps. We adopt a probability of 15% to drop the text conditioning (Ho & Salimans, 2022) and apply noise offset (Lin et al., 2023) of 0.05. The original parameters in the RGB branch are frozen."
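The Experiment Setup row reports two standard diffusion-training tricks: dropping the text conditioning with 15% probability (for classifier-free guidance, Ho & Salimans, 2022) and adding a noise offset of 0.05 (Lin et al., 2023). The paper releases no code, so the sketch below is a hypothetical PyTorch illustration of those two tricks only; the function names and tensor shapes are illustrative, not the authors' implementation.

```python
import torch


def training_noise(latents: torch.Tensor, noise_offset: float = 0.05) -> torch.Tensor:
    """Gaussian noise plus a per-sample, per-channel mean offset (Lin et al., 2023)."""
    noise = torch.randn_like(latents)
    # The offset shifts each channel's mean, which helps the model learn
    # very dark or very bright images; 0.05 matches the reported setting.
    noise = noise + noise_offset * torch.randn(
        latents.shape[0], latents.shape[1], 1, 1, device=latents.device
    )
    return noise


def maybe_drop_text(text_emb: torch.Tensor, null_emb: torch.Tensor,
                    p_drop: float = 0.15) -> torch.Tensor:
    """Replace text embeddings with the null embedding with probability p_drop.

    Dropping conditioning during training enables classifier-free guidance
    at sampling time (Ho & Salimans, 2022); 15% matches the reported setting.
    """
    keep = torch.rand(text_emb.shape[0], device=text_emb.device) >= p_drop
    mask = keep.view(-1, *([1] * (text_emb.dim() - 1))).to(text_emb.dtype)
    return mask * text_emb + (1 - mask) * null_emb
```

In a training loop these would be applied per batch before the denoiser forward pass, with the RGB-branch parameters frozen as the paper states.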
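The evaluation rows cite the Fréchet inception distance (Heusel et al., 2017), which compares Gaussians fitted to Inception-v3 features of real and generated images: FID = ||mu1 - mu2||^2 + Tr(S1 + S2 - 2(S1 S2)^(1/2)). A minimal NumPy/SciPy sketch of that formula (the feature extraction itself is omitted; `frechet_distance` is a hypothetical helper name):

```python
import numpy as np
from scipy import linalg


def frechet_distance(mu1: np.ndarray, sigma1: np.ndarray,
                     mu2: np.ndarray, sigma2: np.ndarray) -> float:
    """Fréchet distance between two Gaussians (mu1, sigma1) and (mu2, sigma2).

    For FID, mu/sigma are the mean and covariance of Inception-v3 pool
    features computed over the real and generated image sets.
    """
    diff = mu1 - mu2
    # Matrix square root of the covariance product; numerical error can
    # introduce a tiny imaginary component, which we discard.
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    covmean = covmean.real
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```

Identical statistics give a distance of 0; the 30K-prompt MSCOCO protocol in the paper would supply the two feature sets being compared.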