Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Reward-Instruct: A Reward-Centric Approach to Fast Photo-Realistic Image Generation
Authors: Yihong Luo, Tianyang Hu, Weijian Luo, Kenji Kawaguchi, Jing Tang
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments on text-to-image generation have demonstrated that Reward-Instruct achieves state-of-the-art results in visual quality and quantitative metrics compared to distillation-reliant methods, while also exhibiting greater robustness to the choice of reward function. |
| Researcher Affiliation | Collaboration | Yihong Luo¹, Tianyang Hu², Weijian Luo³, Kenji Kawaguchi⁴, Jing Tang⁵,¹; ¹HKUST, ²CUHK(SZ), ³Xiaohongshu Inc, ⁴NUS, ⁵HKUST(GZ) |
| Pseudocode | Yes | Algorithm 1 Reward-Instruct ... Listing 1: Torch-style pseudo code of SGLD step with temperature τ. |
| Open Source Code | No | Furthermore, we will release codes if got accepted. |
| Open Datasets | Yes | Training is performed on the JourneyDB dataset [37] using prompts, without requiring images, as our method is image-free. ... Additionally, we use zero-shot FID on the COCO-5k dataset for a more comprehensive evaluation. |
| Dataset Splits | No | Training is performed on the JourneyDB dataset [37] using prompts, without requiring images, as our method is image-free. ... Additionally, we use zero-shot FID on the COCO-5k dataset for a more comprehensive evaluation. |
| Hardware Specification | Yes | The training cost is measured by GPU hours on RTX-4090. |
| Software Dependencies | No | The paper mentions using the AdamW optimizer with specific betas and learning rate, but does not specify software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | We adopt the AdamW optimizer with β1 = 0.9, β2 = 0.95, and the learning rate of 2e-5. We use a batch size of 256. ... Require: Generator fθ, Pre-trained score fψ, Reward models {ri}, desired sampling steps K, total iterations N, learning rate λ. |
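
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration object for anyone attempting a reproduction. The following is a minimal sketch, not the authors' code: the class name `RewardInstructSetup`, its field names, and the `optimizer_kwargs` helper are hypothetical, and values the table does not report (sampling steps K, total iterations N, weight decay) are deliberately left out rather than guessed.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class RewardInstructSetup:
    """Settings quoted in the Experiment Setup row (all names are hypothetical)."""

    optimizer: str = "AdamW"
    betas: Tuple[float, float] = (0.9, 0.95)  # β1, β2 as reported
    learning_rate: float = 2e-5               # reported learning rate
    batch_size: int = 256                     # reported batch size

    def optimizer_kwargs(self) -> Dict[str, object]:
        # Keyword arguments in the shape accepted by
        # torch.optim.AdamW(params, lr=..., betas=...).
        return {"lr": self.learning_rate, "betas": self.betas}


setup = RewardInstructSetup()
print(setup.optimizer_kwargs())
```

Assuming a PyTorch environment, the returned dictionary could be unpacked into `torch.optim.AdamW(model.parameters(), **setup.optimizer_kwargs())`; the software versions needed for that step are not stated in the paper, as the Software Dependencies row notes.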