Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
UniCTokens: Boosting Personalized Understanding and Generation via Unified Concept Tokens
Authors: Ruichuan An, Sihan Yang, Renrui Zhang, Zijun Shen, Ming Lu, Gaole Dai, Hao Liang, Ziyu Guo, Shilin Yan, Yulin Luo, Bocheng Zou, Chaoqun Yang, Wentao Zhang
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on UnifyBench indicate that UniCTokens shows competitive performance compared to leading methods in concept understanding, concept generation, and achieving state-of-the-art results in personalized attribute-reasoning generation. |
| Researcher Affiliation | Collaboration | Ruichuan An1,2 Sihan Yang2 Renrui Zhang3 Zijun Shen4 Ming Lu5 Gaole Dai1 Hao Liang1 Ziyu Guo3 Shilin Yan1 Yulin Luo1 Bocheng Zou6 Chaoqun Yang7 Wentao Zhang1 1 Peking University 2 Xi'an Jiaotong University 3 CUHK 4 Intel Labs, China 5 Nanjing University 6 University of Wisconsin-Madison 7 Tsinghua University |
| Pseudocode | No | The paper describes the methodology in narrative text and mathematical formulas without presenting a distinct pseudocode or algorithm block. |
| Open Source Code | No | Our code and dataset will be released at: https://github.com/arctanxarc/UniCTokens. |
| Open Datasets | Yes | Our code and dataset will be released at: https://github.com/arctanxarc/UniCTokens. The dataset's data sources comprise animals and objects obtained from MC-LLaVA [24], Yo'LLaVA [36] and MyVLM [23]. |
| Dataset Splits | No | Each concept is associated with 10~15 images for training and testing. |
| Hardware Specification | Yes | All experiments are conducted on A800 GPUs. |
| Software Dependencies | No | All training stages are optimized using AdamW. This is not specific enough to determine software dependencies like library versions. |
| Experiment Setup | Yes | We set the number of learnable tokens as K = 16, M = 8, and N = 8 respectively. All training stages are optimized using AdamW and each stage is trained for 20 epochs. The batch size is set to 4 for understanding tasks in stage 1, and 1 for both stage 2 and stage 3, as well as for T2I generation. |
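
The values quoted in the Experiment Setup row can be collected into a small configuration sketch. This is only an illustration of the reported numbers, not the authors' code: the names `TrainingSetup` and `batch_size_for` and the task labels are hypothetical. The quoted text does not say what K, M, and N each count, and gives no learning rate, so both are left out rather than guessed.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TrainingSetup:
    """Hyperparameters quoted in the Experiment Setup row (hypothetical container)."""

    # Learnable token counts K, M, N as reported; the quoted text does not
    # say which token group each letter refers to.
    num_tokens: dict = field(default_factory=lambda: {"K": 16, "M": 8, "N": 8})
    optimizer: str = "AdamW"   # reported for all training stages
    epochs_per_stage: int = 20  # reported for each stage
    num_stages: int = 3         # stages 1 to 3 are mentioned in the quote


def batch_size_for(stage: int, task: str) -> int:
    """Return the reported batch size.

    Assumed reading of the quote: 4 for understanding tasks in stage 1,
    and 1 everywhere else (stages 2 and 3, and T2I generation).
    The task labels are hypothetical.
    """
    if stage == 1 and task == "understanding":
        return 4
    return 1


setup = TrainingSetup()
total_epochs = setup.epochs_per_stage * setup.num_stages
```

Under this reading, only the stage 1 understanding objective uses a batch size above 1, and training covers 60 epochs across the three stages.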