Optical Diffusion Models for Image Generation

Authors: Ilker Oguz, Niyazi Dinc, Mustafa Yildirim, Junjie Ke, Innfarn Yoo, Qifei Wang, Feng Yang, Christophe Moser, Demetri Psaltis

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this study, we demonstrate that the propagation of a light beam through a semi-transparent medium can be programmed to implement a denoising diffusion model on image samples. ... The results in Figure 3 are reported with the beam propagation model (Eqn. 3) of the optical system designed to have 300×300 pixels per layer and four modulation layers. ... Furthermore, the evaluation of image generation quality metrics, Inception Score (IS) and Fréchet Inception Distance (FID), which are detailed in Appendix A.6, across different generation timesteps captures the improved realism of images with the optical diffusion procedure.
Researcher Affiliation | Collaboration | 1. École Polytechnique Fédérale de Lausanne; 2. Google Research; 3. Google
Pseudocode | Yes | Algorithm 1: Online Learning Algorithm
Open Source Code | Yes | The source code is available at https://ioguz.github.io/opticaldiffusion/.
Open Datasets | Yes | The results in Figure 3 are reported with the beam propagation model (Eqn. 3) of the optical system designed to have 300×300 pixels per layer and four modulation layers. ... for 3 classes of the MNIST digits [32], Fashion-MNIST [33] and the clock category of the Quick, Draw! datasets [34]. ... We utilized 3 different datasets to demonstrate the proposed approach: the first 3 digits from MNIST-digits; the first 3 classes from Fashion-MNIST; 20,000 "Clock" images from the Quick, Draw! dataset.
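The class filtering and downsampling described in this row can be sketched as follows. This is a minimal NumPy illustration, not the authors' pipeline: the arrays stand in for loaded MNIST images and labels, and nearest-neighbor resizing is an assumption (the paper does not state its resize method).

```python
import numpy as np

# Stand-in for a loaded MNIST-style dataset: 100 grayscale 28x28 images with labels 0-9.
rng = np.random.default_rng(0)
images = rng.random((100, 28, 28))
labels = rng.integers(0, 10, size=100)

# Keep only the first 3 classes (digits 0-2), as the paper does for MNIST.
mask = labels < 3
images, labels = images[mask], labels[mask]

# Nearest-neighbor downsampling from 28x28 to 20x20 (assumed resize method).
idx = np.linspace(0, 27, 20).round().astype(int)
images = images[:, idx][:, :, idx]
```

The same subsetting applies to Fashion-MNIST; for Quick, Draw! the paper instead takes 20,000 images from a single category ("Clock").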
Dataset Splits | No | The paper mentions training on specific datasets (MNIST, Fashion-MNIST, Quick Draw, AFHQ) and tracks training loss (MSE), but it does not provide explicit details on how the datasets were split into training, validation, and test sets (e.g., percentages or sample counts).
Hardware Specification | Yes | After being downsampled to 20×20, the images are used for 250 epochs to train the ODUs, using the Adam optimizer with a learning rate of 0.006, which took 10 hours on an A100 GPU. ... the digital benchmarks are run on an Nvidia L4 GPU, one of the state-of-the-art devices available today. ... the phase-only SLM (Meadowlark HSP 1920-500-1200) ... digital micromirror device (DMD) ... CMOS camera (FLIR BFS-U3-04S2M-CS). ... The DMD unit, Texas Instruments DLP9500, can display 23,148 patterns per second at a resolution of 1920×1080, with an electrical consumption of 4.5 W at board level, including data transfer (i.e. looping back from the detector).
Software Dependencies | No | The paper mentions that the model is developed in a PyTorch environment and uses the Adam optimizer, but it does not specify version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | Following the same experimental settings as the initial DDPM study [2], we set T = 1000 and the β values in a linear range from β1 = 10⁻⁴ to βT = 0.02. ... designed to have 300×300 pixels per layer and four modulation layers. The number of layer sets (M) is 10. ... After being downsampled to 20×20, the images are used for 250 epochs to train the ODUs, using the Adam optimizer with a learning rate of 0.006, which took 10 hours on an A100 GPU.
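The DDPM hyperparameters quoted in this row (T = 1000, linear β schedule from 10⁻⁴ to 0.02) can be made concrete with a short sketch. This follows the standard DDPM formulation of Ho et al. [2], not the authors' optical implementation; `noisy_sample` is an illustrative helper name, not from the paper.

```python
import numpy as np

# Linear DDPM noise schedule as stated: T = 1000, beta_1 = 1e-4, beta_T = 0.02.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)  # cumulative product \bar{alpha}_t for the forward process

def noisy_sample(x0, t, rng):
    """Forward-process noising of a clean sample x0 at timestep t (0-indexed)."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
```

At t near T, `alpha_bars[t]` is close to zero, so the sample is nearly pure Gaussian noise; the reverse (denoising) steps are what the optical system is trained to implement.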