Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

UniCTokens: Boosting Personalized Understanding and Generation via Unified Concept Tokens

Authors: Ruichuan An, Sihan Yang, Renrui Zhang, Zijun Shen, Ming Lu, Gaole Dai, Hao Liang, Ziyu Guo, Shilin Yan, Yulin Luo, Bocheng Zou, Chaoqun Yang, Wentao Zhang

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on UnifyBench indicate that UniCTokens shows competitive performance compared to leading methods in concept understanding, concept generation, and achieving state-of-the-art results in personalized attribute-reasoning generation.
Researcher Affiliation | Collaboration | Ruichuan An (1, 2); Sihan Yang (2); Renrui Zhang (3); Zijun Shen (4); Ming Lu (5); Gaole Dai (1); Hao Liang (1); Ziyu Guo (3); Shilin Yan (1); Yulin Luo (1); Bocheng Zou (6); Chaoqun Yang (7); Wentao Zhang (1). Affiliations: (1) Peking University; (2) Xi'an Jiaotong University; (3) CUHK; (4) Intel Labs, China; (5) Nanjing University; (6) University of Wisconsin-Madison; (7) Tsinghua University
Pseudocode | No | The paper describes the methodology in narrative text and mathematical formulas without presenting a distinct pseudocode or algorithm block.
Open Source Code | No | Our code and dataset will be released at: https://github.com/arctanxarc/UniCTokens.
Open Datasets | Yes | Our code and dataset will be released at: https://github.com/arctanxarc/UniCTokens. The dataset's data sources comprise animals and objects obtained from MC-LLaVA [24], Yo'LLaVA [36] and MyVLM [23]
Dataset Splits | No | Each concept is associated with 10~15 images for training and testing.
Hardware Specification | Yes | All experiments are conducted on A800 GPUs.
Software Dependencies | No | All training stages are optimized using AdamW. This is not specific enough to determine software dependencies like library versions.
Experiment Setup | Yes | We set the number of learnable tokens as K = 16, M = 8, and N = 8 respectively. All training stages are optimized using AdamW and each stage is trained for 20 epochs. The batch size is set to 4 for understanding tasks in stage 1, and 1 for both stage 2 and stage 3, as well as for T2I generation.
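
For readers who want the Experiment Setup row in machine-readable form, the values it quotes could be collected roughly as follows. This is a minimal sketch and not the authors' configuration format: the class name, field names, task labels, and the batch_size helper are all hypothetical, the stage count of 3 is inferred from the stages the row names, and the batch-size rule encodes only one reading of the quoted sentence (4 for understanding tasks in stage 1, 1 everywhere else).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedSetup:
    """Values quoted in the Experiment Setup row. All field names are hypothetical."""

    num_tokens_k: int = 16    # K learnable tokens
    num_tokens_m: int = 8     # M learnable tokens
    num_tokens_n: int = 8     # N learnable tokens
    optimizer: str = "AdamW"  # name only; the row quotes no learning rate
    epochs_per_stage: int = 20
    num_stages: int = 3       # assumption: the row names stages 1, 2 and 3

    def batch_size(self, stage: int, task: str) -> int:
        """One reading of the row: 4 for stage-1 understanding tasks, otherwise 1."""
        if stage not in range(1, self.num_stages + 1):
            raise ValueError(f"unknown stage: {stage}")
        if stage == 1 and task == "understanding":
            return 4
        return 1


setup = ReportedSetup()
```

Anything the row does not state, such as learning rate, library versions, or data splits, is deliberately left out rather than guessed.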