Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Halton Scheduler for Masked Generative Image Transformer

Authors: Victor Besnier, Mickael Chen, David Hurych, Eduardo Valle, Matthieu Cord

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Evaluation of both class-to-image synthesis on ImageNet and text-to-image generation on the COCO dataset demonstrates that the Halton scheduler outperforms the Confidence scheduler quantitatively by reducing the FID and qualitatively by generating more diverse and more detailed images. Our code is at https://github.com/valeoai/Halton-MaskGIT." ... "This section presents a comprehensive evaluation of our method, focusing on the enhancements brought by our Halton scheduler in image quality and diversity compared to the baseline Confidence scheduler. We present qualitative and quantitative results on two distinct tasks, each using different modalities: class-to-image (subsection 4.1) and text-to-image (subsection 4.2)."
Researcher Affiliation | Collaboration | "Victor Besnier (1), Mickael Chen (2), David Hurych (1), Eduardo Valle (2), Matthieu Cord (2,3). (1) Valeo.ai, Prague; (2) Valeo.ai, Paris; (3) Sorbonne Université, Paris; now at H company, Paris. {firstname}.{lastname}@valeo.com"
Pseudocode | Yes | "Appendix 10, Pseudo-code for Halton Sequence: In algorithm 1, we detail the generation of the Halton sequence, producing a sequence of size n with a base b."
Open Source Code | Yes | "Our code is at https://github.com/valeoai/Halton-MaskGIT."
Open Datasets | Yes | "For our experiments in class-conditional image generation, we used the ImageNet dataset (Deng et al., 2009)... For the text-to-image generation experiments, we employed a combination of real-world datasets, including CC12M (Changpinyo et al., 2021) and a subset of Segment Anything (Kirillov et al., 2023), as well as synthetic datasets such as JourneyDB (Sun et al., 2024a) and DiffusionDB (Wang et al., 2022)."
Dataset Splits | No | The paper mentions training on ImageNet and evaluating on zero-shot COCO, but it does not specify explicit training/validation/test splits (e.g., percentages, exact counts, or predefined split names) for these or the other datasets used (CC12M, Segment Anything, JourneyDB, DiffusionDB) that would be needed to reproduce the data partitioning.
Hardware Specification | No | The paper mentions "Due to GPU memory constraints" and "on our GPU" in the training details, and acknowledges the "EuroHPC Joint Undertaking for awarding us access to Karolina at IT4Innovations, Czech Republic." However, it does not provide specific model numbers for GPUs (e.g., NVIDIA A100) or CPUs, or detailed specifications of the Karolina supercomputer components used for the experiments.
Software Dependencies | No | The paper describes model architectures (e.g., ViT-XL, ViT-L, T5-XL encoder) and general optimization methods (AdamW, cosine LR scheduler, bf16 precision) in Tables 5 and 6. However, it does not provide specific version numbers for key software libraries or frameworks (e.g., PyTorch 1.x, TensorFlow 2.x, CUDA 11.x) that would be required for reproducibility.
Experiment Setup | Yes | "Table 5 provides all the hyperparameters used to train our models across both modalities."

| Hyperparameter | text-to-image | class-to-image |
|---|---|---|
| Training steps | 5 × 10^5 | 2 × 10^6 |
| Batch size | 2048 | 256 |
| Learning rate | 5 × 10^-5 | 1 × 10^-4 |
| Weight decay | 0.05 | 5 × 10^-5 |
| Optimizer | AdamW | AdamW |
| Momentum | β1 = 0.9, β2 = 0.999 | β1 = 0.9, β2 = 0.96 |
| LR scheduler | Cosine | Cosine |
| Warmup steps | 2500 | 2500 |
| Gradient clip norm | 0.25 | 1 |
| EMA | 0.999 | |
| CFG dropout | 0.1 | 0.1 |
| Data aug. | No | Horizontal flip |
| Precision | bf16 | bf16 |
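The Pseudocode row above quotes the paper's Algorithm 1, which generates a Halton sequence of size n in base b. As a point of reference (not the authors' code), the standard radical-inverse construction of such a sequence can be sketched in a few lines of Python:

```python
def halton(b: int, n: int) -> list[float]:
    """First n terms of the Halton sequence in base b (radical inverse of 1..n)."""
    seq = []
    for i in range(1, n + 1):
        f, r = 1.0, 0.0
        # Reflect the base-b digits of i across the radix point.
        while i > 0:
            f /= b
            r += f * (i % b)
            i //= b
        seq.append(r)
    return seq

print(halton(2, 4))  # [0.5, 0.25, 0.75, 0.125]
```

In base 2 the sequence visits 1/2, 1/4, 3/4, 1/8, ..., filling the unit interval quasi-uniformly; the paper exploits this low-discrepancy property to spread unmasked tokens evenly across the image.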
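The table reports a cosine LR scheduler with 2500 warmup steps but does not state the exact formula. A minimal sketch of one common convention, assuming linear warmup to the base rate followed by cosine decay to zero (the class-to-image column's values are used for illustration):

```python
import math

def cosine_lr(step: int, base_lr: float = 1e-4,
              warmup: int = 2500, total: int = 2_000_000) -> float:
    """Learning rate at a given step: linear warmup, then cosine decay to zero."""
    if step < warmup:
        return base_lr * step / warmup  # ramp from 0 to base_lr
    progress = (step - warmup) / (total - warmup)  # in [0, 1]
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(cosine_lr(2500))  # peak rate: 1e-4
```

Other decay floors (e.g., a small minimum LR instead of zero) are equally plausible readings of "Cosine"; the paper's code should be consulted for the exact rule.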