Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data

Authors: Yiren Song, Cheng Liu, Mike Zheng Shou

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	[4 Experiments, 4.1 Experiments Details] Set up. We adopt Flux 1.0 dev [21] as the pre-trained model. The dataset resolution is 1024 × 1024, while condition images are downsampled to 512 × 512 to reduce memory and computation, with high-resolution control achieved via conditional token mapping. [4.3 Quantitative Evaluation] As shown in Table 1, our method achieves the best performance across five style consistency metrics and ranks among the top in content consistency. It also obtains the highest CLIP Score, indicating superior text-image alignment. These results demonstrate that our consistency-aware framework effectively balances stylization fidelity, semantic preservation, and prompt alignment. [4.5 Ablation Study] We conduct ablation experiments on two key design choices: (1) rolling training with multiple style LoRAs and (2) decoupled training of style and consistency. As shown in Fig. 5, when we remove rolling training and instead use a single LoRA trained on mixed-style data, the generated results maintain reasonable content consistency, but show a significant degradation in stylization quality on unseen styles.
Researcher Affiliation	Academia	Yiren Song, Cheng Liu, Mike Zheng Shou (Show Lab, National University of Singapore)
Pseudocode	No	The paper describes the method and architecture through diagrams (e.g., Figure 2) and textual descriptions of the training stages and components, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks with structured steps.
Open Source Code	Yes	Code is released at https://github.com/showlab/OmniConsistency
Open Datasets	No	To effectively support model training, we meticulously constructed a high-quality, multi-source stylization dataset, covering 22 different styles and totaling 2,600 image pairs. Data sources include manually drawn illustrations and GPT-4o-guided [1] generation of highly consistent stylized images. After rigorous manual selection, we obtained a reliable paired dataset suitable for consistency model training. ...and will be publicly released to support future research in stylization and consistency modeling.
Dataset Splits	No	The paper constructs a dataset of 2,600 paired images for training and mentions a benchmark of 100 images for evaluation. However, it does not provide explicit training/test/validation splits (e.g., percentages or exact counts) for the 2,600-image dataset used for model training.
Hardware Specification	No	The training is conducted in two stages: the first stage fine-tunes the style LoRA for 6,000 steps on a single GPU, using a learning rate of 1 × 10⁻⁴ and a batch size of 1. The second stage trains the consistency module from scratch for 9,000 steps on 4 GPUs, with a per-GPU batch size of 1 (total batch size = 4) and the same learning rate.
Software Dependencies	Yes	We adopt Flux 1.0 dev [21] as the pre-trained model.
Experiment Setup	Yes	The training is conducted in two stages: the first stage fine-tunes the style LoRA for 6,000 steps on a single GPU, using a learning rate of 1 × 10⁻⁴ and a batch size of 1. The second stage trains the consistency module from scratch for 9,000 steps on 4 GPUs, with a per-GPU batch size of 1 (total batch size = 4) and the same learning rate. In this stage, every 50 steps, a style LoRA and its corresponding data are loaded from the LoRA bank to encourage multi-style generalization.
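The Research Type excerpt says that 1024 × 1024 targets are paired with 512 × 512 condition images, and that "conditional token mapping" provides high-resolution control. The mapping itself is not described. The sketch below rests on two assumptions. First, the mapping rescales each condition token's (row, column) position id by the resolution ratio (here 2), so that low-resolution condition tokens line up with the target token grid. Second, one token covers a 16 × 16 pixel patch (8× VAE downsampling followed by 2 × 2 packing, as in Flux). `token_grid` and `map_condition_positions` are hypothetical names and are not taken from the OmniConsistency code.

```python
from typing import List, Tuple

# Assumption: one token covers a 16 x 16 pixel patch (8x VAE downsampling, then 2x2 packing).
TOKEN_PIXELS = 16


def token_grid(image_size: int, patch: int = TOKEN_PIXELS) -> List[Tuple[int, int]]:
    """Row-major (row, col) position ids for a square image split into patch x patch tokens."""
    n = image_size // patch
    return [(r, c) for r in range(n) for c in range(n)]


def map_condition_positions(
    cond_size: int, target_size: int, patch: int = TOKEN_PIXELS
) -> List[Tuple[float, float]]:
    """Hypothetical mapping: stretch condition-token positions onto the target grid."""
    scale = target_size / cond_size  # 1024 / 512 = 2.0
    return [(r * scale, c * scale) for r, c in token_grid(cond_size, patch)]


if __name__ == "__main__":
    cond = map_condition_positions(512, 1024)
    target = token_grid(1024)
    print(len(cond), len(target))  # 1024 4096
    print(cond[1], max(r for r, _ in cond))  # (0.0, 2.0) 62.0
```

Under these assumptions, downsampling the condition to 512 × 512 cuts its token count by a factor of 4 (1,024 instead of 4,096 tokens). That reduction is the plausible source of the memory and compute savings the excerpt mentions.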
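The Hardware Specification and Experiment Setup rows report each stage's step count, GPU count, and per-GPU batch size, but they do not name a GPU model. The sketch below derives the global batch size and the number of training samples each stage processes. It assumes plain data parallelism with no gradient accumulation, since none is mentioned. `StageConfig` and its field names are illustrative and are not taken from the released code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageConfig:
    """Hypothetical container for one training stage's reported settings."""
    name: str
    steps: int
    num_gpus: int
    per_gpu_batch: int
    learning_rate: float

    @property
    def global_batch(self) -> int:
        # Assumes plain data parallelism with no gradient accumulation.
        return self.num_gpus * self.per_gpu_batch

    @property
    def samples_seen(self) -> int:
        # Total image-pair samples processed over the whole stage.
        return self.steps * self.global_batch


STAGES = [
    StageConfig("style_lora", steps=6_000, num_gpus=1, per_gpu_batch=1, learning_rate=1e-4),
    StageConfig("consistency_module", steps=9_000, num_gpus=4, per_gpu_batch=1, learning_rate=1e-4),
]

if __name__ == "__main__":
    for stage in STAGES:
        # Prints: style_lora 1 6000, then consistency_module 4 36000
        print(stage.name, stage.global_batch, stage.samples_seen)
```

Relative to the 2,600-pair dataset, the second stage's 36,000 samples come to roughly 14 passes' worth of data, if every pair were equally likely to be drawn.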
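The Experiment Setup row describes rolling training: every 50 steps of the second stage, a style LoRA and its paired data are loaded from the LoRA bank. The source does not say how the next LoRA is chosen. The sketch below assumes a seeded, uniform random choice; a round-robin order would fit the description equally well. `rolling_lora_schedule`, `active_lora`, and the style names are hypothetical.

```python
import random
from typing import Dict, List


def rolling_lora_schedule(
    lora_bank: List[str], total_steps: int, swap_every: int = 50, seed: int = 0
) -> Dict[int, str]:
    """Map each swap step to the style LoRA loaded at that step.

    Assumption: the next LoRA is drawn uniformly at random, seeded for reproducibility.
    """
    rng = random.Random(seed)
    return {step: rng.choice(lora_bank) for step in range(0, total_steps, swap_every)}


def active_lora(schedule: Dict[int, str], step: int, swap_every: int = 50) -> str:
    """Return the LoRA in use at an arbitrary training step."""
    return schedule[(step // swap_every) * swap_every]


if __name__ == "__main__":
    bank = ["style_a", "style_b", "style_c"]  # hypothetical placeholders for the 22 styles
    schedule = rolling_lora_schedule(bank, total_steps=9_000)
    print(len(schedule))  # 180 swaps over 9,000 steps
    print(active_lora(schedule, 120) == schedule[100])  # True
```

The reported details alone do not fix the order in which styles are visited. Fixing the seed and logging the resulting schedule would make that order reproducible.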