DDMI: Domain-agnostic Latent Diffusion Models for Synthesizing High-Quality Implicit Neural Representations
Authors: Dogyun Park, Sihyeon Kim, Sojin Lee, Hyunwoo J. Kim
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments across four modalities, e.g., 2D images, 3D shapes, Neural Radiance Fields, and videos, with seven benchmark datasets, demonstrate the versatility of DDMI and its superior performance compared to the existing INR generative models. |
| Researcher Affiliation | Academia | Dogyun Park, Sihyeon Kim, Sojin Lee, Hyunwoo J. Kim; Department of Computer Science, Korea University, Seoul, South Korea; {gg933,sh_bs15,sojin_lee,hyunwoojkim}@korea.ac.kr |
| Pseudocode | No | The paper describes the model architecture and training procedure in natural language and diagrams, but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/mlvlab/DDMI. |
| Open Datasets | Yes | For images, we evaluate models on AFHQv2 Cat and Dog (Choi et al., 2020) and the CelebA-HQ dataset (Karras et al., 2018) with a resolution of 256²... For shapes, we adopt the ShapeNet dataset (Chang et al., 2015)... Next, we provide text-guided shape generation results on the Text2Shape (T2S) dataset (Chen et al., 2019)... For videos, we use the Sky Timelapse dataset (Xiong et al., 2018)... Additionally, we provide results on CIFAR10 (Krizhevsky et al., 2009) and LSUN Churches (Yu et al., 2015)... Specifically, we train DDMI with the SRN Cars dataset (Sitzmann et al., 2019). |
| Dataset Splits | No | The paper evaluates on standard benchmarks (CelebA-HQ, AFHQv2, ShapeNet, Text2Shape, Sky Timelapse, CIFAR10, LSUN Churches, and SRN Cars) and implies standard setups, but it does not explicitly state train/validation/test split percentages, sample counts, or splitting methodology in either the main text or the supplementary material, so a reproduction would need to fix its own splits (see the illustrative split sketch after this table). |
| Hardware Specification | Yes | In this work, all experiments are conducted on 8 NVIDIA RTX3090 and 8 V100 GPUs. |
| Software Dependencies | No | The paper mentions various software components and models, such as a U-Net encoder, ConvONet, TimeSformer, PointNet, the AdamW optimizer, a CLIP text encoder, and Inception v3, but it does not provide version numbers for these dependencies, which are necessary for full reproducibility (see the version-logging sketch after this table). |
| Experiment Setup | Yes | The comprehensive lists of hyperparameters used in experiments are provided in Tab. 9. We provide the hyperparameters of training in Tab. 10. |
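
Because the paper does not document its dataset splits, a reproduction has to pin its own. Below is a minimal, hypothetical sketch, not taken from the DDMI paper or repository, of how a deterministic train/validation split could be fixed with PyTorch; the 90/10 ratio and seed 42 are illustrative assumptions.

```python
# Hypothetical sketch: pinning a deterministic train/val split.
# The 90/10 ratio and seed 42 are illustrative assumptions, not
# values reported by the DDMI paper or its repository.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

dataset = datasets.CIFAR10(
    root="./data", train=True, download=True,
    transform=transforms.ToTensor(),
)

n_train = int(0.9 * len(dataset))  # 45,000 of CIFAR10's 50,000 train images
n_val = len(dataset) - n_train     # remaining 5,000 images

# A seeded generator makes the split identical across runs and machines.
train_set, val_set = random_split(
    dataset, [n_train, n_val],
    generator=torch.Generator().manual_seed(42),
)
print(f"train: {len(train_set)}, val: {len(val_set)}")
```

Recording the split sizes and seed alongside results lets later runs verify they are evaluating on the same data.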
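Similarly, since no dependency versions are reported, a reproduction should log the environment it actually ran with. The sketch below covers only PyTorch and CUDA, which are assumptions based on components the paper mentions (e.g., the AdamW optimizer and GPU training); a real run would extend the list to every pinned package.

```python
# Hypothetical sketch: logging the versions a run actually used,
# since the paper reports no dependency versions.
import platform
import torch

print(f"python : {platform.python_version()}")
print(f"torch  : {torch.__version__}")
print(f"cuda   : {torch.version.cuda}")  # CUDA toolkit this torch build targets
if torch.cuda.is_available():
    print(f"gpu    : {torch.cuda.get_device_name(0)}")
```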