Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Personalized Visual Content Generation in Conversational Systems

Authors: Xianquan Wang, Zhaocheng Du, Huibo Xu, Shukang Yin, Yupeng Han, Jieming Zhu, Kai Zhang, Qi Liu

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on benchmark conversational datasets including objective metrics and GPT-based evaluations demonstrate that our framework outperforms strong baselines, which highlight its potential to redefine personalization in visual content generation for conversational scenarios like e-commerce and real-world recommendation.
Researcher Affiliation	Collaboration	(1) University of Science and Technology of China; (2) Huawei Noah's Ark Lab
Pseudocode	Yes	D Pseudo Code
Open Source Code	Yes	The code is publicly available at https://github.com/xqwustc/PCG.
Open Datasets	Yes	Following previous works of conversational recommender systems [23, 40], we conduct experiments on two conversational recommendation datasets set in movie scenarios. The two datasets are classic benchmarks for movie conversational recommender systems, containing many high-quality interactions with the systems.
Dataset Splits	Yes	For both datasets, the original data was randomly split into training, validation, and test sets with a ratio of 8:1:1.
Hardware Specification	Yes	using a single A100-80G GPU with a batch size of 1
Software Dependencies	No	As mentioned in Section H, we use Qwen3-8B as the LLM to generate user inclinations and GPT-4o for evaluation. When fine-tuning PCG LoRA based on EasyControl, we strictly follow its recommended settings.
Experiment Setup	Yes	We generate outputs with the following parameters: a maximum of 128 new tokens, sampling enabled with a temperature of 0.7, top-p sampling with a probability of 0.8, top-k sampling with a limit of 20, and a minimum probability of 0.0. When fine-tuning PCG LoRA based on EasyControl, we strictly follow its recommended settings. The overall learning rate is set to 1 × 10⁻⁴ (based on the FLUX.1-dev pre-trained model, using a single A100-80G GPU with a batch size of 1). The two types of LoRAs in the two-stage training share the same learning rate. The optimizer used is AdamW with the parameters: β1 = 0.9, β2 = 0.999, weight decay = 1 × 10⁻⁴. The dimension of the low-rank matrices is set to 128.
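
For readers who want to reuse the reported settings, the values in the Dataset Splits and Experiment Setup rows can be collected as in the minimal sketch below. It is not the authors' code: the class names, field names, the `random_split` helper, and the seed are hypothetical, and the learning rate and weight decay are taken to be 1e-4, which is an assumption about the exponent notation in the extracted text. The paper does not report a random seed, so the split shown here will not reproduce the authors' exact partition.

```python
import random
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class GenerationConfig:
    """Decoding settings reported for the LLM (field names are hypothetical)."""
    max_new_tokens: int = 128
    do_sample: bool = True
    temperature: float = 0.7
    top_p: float = 0.8
    top_k: int = 20
    min_p: float = 0.0


@dataclass(frozen=True)
class LoRATrainingConfig:
    """Fine-tuning settings reported for the LoRA stages (names are hypothetical)."""
    learning_rate: float = 1e-4   # assumed reading of the reported value
    batch_size: int = 1
    optimizer: str = "AdamW"
    beta1: float = 0.9
    beta2: float = 0.999
    weight_decay: float = 1e-4    # assumed reading of the reported value
    lora_rank: int = 128          # dimension of the low-rank matrices


def random_split(items, ratios=(8, 1, 1), seed=0):
    """Shuffle items and split them by the given ratios (seed is an assumption)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


if __name__ == "__main__":
    print(asdict(GenerationConfig()))
    print(asdict(LoRATrainingConfig()))
    train, val, test = random_split(range(1000))
    print(len(train), len(val), len(test))
```

Keeping the settings in frozen dataclasses makes them easy to log alongside results and hard to change by accident during a run.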