Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Diffusion Bridge AutoEncoders for Unsupervised Representation Learning
Authors: Yeongmin Kim, Kwanghyeon Lee, Minsang Park, Byeonghu Na, Il-chul Moon
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evidence demonstrates the effectiveness of the intended design in DBAE, which notably enhances downstream inference quality, reconstruction, and disentanglement. Additionally, DBAE generates high-fidelity samples in an unconditional generation. 5 EXPERIMENT This section empirically validates the effectiveness of the intended design of the proposed model, DBAE. |
| Researcher Affiliation | Collaboration | Yeongmin Kim¹, Kwanghyeon Lee¹, Minsang Park¹, Byeonghu Na¹, Il-Chul Moon¹˒²; ¹Korea Advanced Institute of Science and Technology (KAIST), ²summary.ai |
| Pseudocode | Yes | Algorithm 1: DBAE Training Algorithm for Reconstruction; Algorithm 2: Reconstruction; Algorithm 3: Latent DPM Training Algorithm; Algorithm 4: Unconditional Generation Algorithm |
| Open Source Code | Yes | Our code is available at https://github.com/aailab-kaist/DBAE. |
| Open Datasets | Yes | We evaluate $\mathrm{Enc}_\phi(x_0)$ trained on CelebA (Liu et al., 2015) and FFHQ (Karras et al., 2019). We train a linear classifier on 1) CelebA with 40 binary labels, measuring accuracy as AP, and 2) LFW (Kumar et al., 2009) for attribute regression... We trained DBAE on FFHQ and evaluated it on CelebA-HQ (Karras et al., 2018). Figure 6 shows the interpolation results on the LSUN Horse, Bedroom (Yu et al., 2015) and FFHQ datasets. |
| Dataset Splits | Yes | We train a linear classifier with parameters $(w, b)$ using data-attribute pairs $(x_0, y)$. We examine the CelebA test dataset. Table 2 reports the averaged reconstruction error over the test dataset $\mathbb{E}_{p_{\text{test}}(x_0)}[d(x_0, \hat{x}_0)]$. We randomly selected 1000 samples from the CelebA training, validation, and test sets to perform the measurement following (Yeats et al., 2022). For Table 4 we measure FID between 50k random samples from the FFHQ dataset and 50k randomly generated samples. |
| Hardware Specification | Yes | Table 7: Computational cost comparison for FFHQ128. Training time is measured in milliseconds per image per NVIDIA A100 (ms/img/A100), and testing time is reported in milliseconds per one sampling step per NVIDIA A100 (ms/one sampling step/A100). Table 15: Regenerated results of Table 2 across multiple hardwares (columns: SSIM (↑), LPIPS (↓), MSE (↓)). Nvidia A100: 0.953, 0.072, 2.49e-3; Intel Gaudi v2: 0.956, 0.073, 2.47e-3. We conducted evaluations across various infrastructures to assess experimental reproducibility. The performance of the trained model (DBAE-d) was evaluated on both the Nvidia A100 and Intel Gaudi v2 chips. |
| Software Dependencies | No | Optimizer: RAdam; Optimizer: AdamW (weight decay = 0.01). The paper mentions specific optimizers, but does not provide version numbers for any key software components or programming languages used (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | C.1 TRAINING CONFIGURATION. Optimization: We follow the optimization argument from DDBM (Zhou et al., 2024) with Variance Preserving (VP) SDE. We utilize the preconditioning and time-weighting proposed in DDBM, with the pred-x parameterization (Karras et al., 2022). Table 5 shows the remaining optimization hyperparameters. Table 5: Network architecture and training configuration of DBAE (values listed in the order CelebA 64 / FFHQ 128 / Horse 128 / Bedroom 128). Base channels: 64 / 128 / 128 / 128; Channel multipliers: [1,2,4,8] / [1,1,2,3,4] / [1,1,2,3,4] / [1,1,2,3,4]; Attention resolution: [16] / [16] / [16] / [16]; Encoder base ch: 64 / 128 / 128 / 128; Enc. attn. resolution: [16] / [16] / [16] / [16]; Encoder ch. mult.: [1,2,4,8,8] / [1,1,2,3,4,4] / [1,1,2,3,4,4] / [1,1,2,3,4,4]; Latent variable z dimension: 32, 256, 512 / 512 / 512 / 512; Vanilla forward SDE: VP / VP / VP / VP; Images trained: 72M, 130M / 130M / 130M / 130M; Batch size: 128 / 128 / 128 / 128; Learning rate: 1e-4 / 1e-4 / 1e-4 / 1e-4; Optimizer: RAdam / RAdam / RAdam / RAdam; Weight decay: 0.0 / 0.0 / 0.0 / 0.0; EMA rate: 0.9999 / 0.9999 / 0.9999 / 0.9999 |
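
The Table 5 values quoted in the Experiment Setup row can be collected into a small configuration object, which makes the implied training length easy to check. The following is a minimal sketch, not the configuration code from the DBAE repository: the class `DBAEConfig`, its field names, and the helpers `optimizer_steps` and `ema_update` are hypothetical names introduced here. It assumes that "Images trained" counts individual images seen and that each optimizer step consumes one batch. For CelebA 64 the table lists several latent dimensions and image counts; the pairing chosen below (512 with 72M) is an arbitrary illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DBAEConfig:
    """Hypothetical container for the Table 5 values (not the repository's class)."""
    dataset: str
    base_channels: int
    channel_mult: Tuple[int, ...]
    encoder_channel_mult: Tuple[int, ...]
    attention_resolutions: Tuple[int, ...] = (16,)
    latent_dim: int = 512
    images_trained: int = 130_000_000
    batch_size: int = 128
    learning_rate: float = 1e-4
    optimizer: str = "RAdam"
    weight_decay: float = 0.0
    ema_rate: float = 0.9999

    def optimizer_steps(self) -> int:
        # Assumption: one optimizer step per batch of `batch_size` images.
        return self.images_trained // self.batch_size


def ema_update(ema_value: float, param_value: float, rate: float) -> float:
    """Standard exponential moving average update for a single scalar weight."""
    return rate * ema_value + (1.0 - rate) * param_value


# FFHQ 128 column of Table 5.
FFHQ128 = DBAEConfig(
    dataset="FFHQ 128",
    base_channels=128,
    channel_mult=(1, 1, 2, 3, 4),
    encoder_channel_mult=(1, 1, 2, 3, 4, 4),
)

# CelebA 64 column; latent_dim and images_trained pick one of the listed options.
CELEBA64 = DBAEConfig(
    dataset="CelebA 64",
    base_channels=64,
    channel_mult=(1, 2, 4, 8),
    encoder_channel_mult=(1, 2, 4, 8, 8),
    latent_dim=512,
    images_trained=72_000_000,
)

if __name__ == "__main__":
    for cfg in (FFHQ128, CELEBA64):
        print(cfg.dataset, cfg.optimizer_steps())
```

Under these assumptions, 130M images at batch size 128 correspond to 1,015,625 optimizer steps, and 72M images to 562,500 steps. The software versions needed to reproduce such a run are not stated in the paper, as the Software Dependencies row notes.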