Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Sparse Meets Dense: Unified Generative Recommendations with Cascaded Sparse-Dense Representations

Authors: Yuhao Yang, Zhi Ji, Zhaopeng Li, Yi Li, Zhonglin Mo, Yue Ding, Kai Chen, Zijian Zhang, Jie Li, Shuanglong Li, Lin Liu

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on public datasets and offline tests validate our method's robustness. Online A/B tests on a real-world advertising platform with over 200 million daily users demonstrate substantial improvements in key metrics, highlighting COBRA's practical advantages.
Researcher Affiliation	Industry	Yuhao Yang, Zhi Ji, Zhaopeng Li, Yi Li, Zhonglin Mo, Yue Ding, Kai Chen, Zijian Zhang, Jie Li, Shuanglong Li, Lin Liu Baidu Inc., Beijing, China EMAIL
Pseudocode	Yes	For a detailed algorithmic description, please refer to the pseudocode provided in Appendix E.
Open Source Code	No	Due to the submission requirements of the commercial company, we are unable to include the code with our submission. However, we provide complete and detailed parameter settings and pseudocode for use by other researchers.
Open Datasets	Yes	In our experiments, we evaluate the performance of COBRA using the Amazon Product Reviews dataset [35, 36].
Dataset Splits	Yes	For evaluation, we adopted the widely-used leave-one-out strategy: the last item in each user's sequence served as the test sample, the second-to-last as the validation sample, and the remaining items as training data. The dataset is divided into two parts: the training set Dtrain and the test set Dtest. The training set consists of user interaction logs collected over the first 60 days, covering recommendation content and user behaviors during this period. The test set is constructed from logs recorded on the day immediately following the training period and serves as a benchmark for model performance evaluation.
Hardware Specification	No	The focus of this study is on theoretical innovation and methodological exploration of the algorithm, rather than specific engineering implementation and resource optimization. We are primarily concerned with the structural design of the model and the logical flow of the algorithm. At this stage, we believe that the innovativeness and effectiveness of the algorithm are the more critical factors to consider.
Software Dependencies	No	In our approach, we adopt a method for generating semantic IDs similar to the one used in [19]. However, unlike [19], which uses a different configuration, we employ a 3-level semantic ID structure, where each level corresponds to a codebook size of 32. These semantic IDs are generated using the T5 model. COBRA is implemented with a lightweight architecture, featuring a 1-layer encoder and a 2-layer decoder.
Experiment Setup	Yes	COBRA is implemented with a lightweight architecture, featuring a 1-layer encoder and a 2-layer decoder. COBRA achieves an optimal balance between recall and diversity at τ = 0.9.
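
The leave-one-out strategy quoted under Dataset Splits can be illustrated with a minimal Python sketch. This is not the authors' code (none was released). The function name, the dictionary-based input format, and the choice to skip users with fewer than three interactions are assumptions made here for illustration only.

```python
from typing import Dict, List, Tuple


def leave_one_out_split(
    sequences: Dict[str, List[str]],
) -> Tuple[Dict[str, List[str]], Dict[str, str], Dict[str, str]]:
    """Split chronologically ordered per-user item sequences.

    Last item -> test, second-to-last -> validation, the rest -> training.
    """
    train: Dict[str, List[str]] = {}
    valid: Dict[str, str] = {}
    test: Dict[str, str] = {}
    for user, items in sequences.items():
        # Assumption: users with fewer than 3 interactions are skipped,
        # since they cannot supply a training, validation, and test item.
        if len(items) < 3:
            continue
        train[user] = items[:-2]
        valid[user] = items[-2]
        test[user] = items[-1]
    return train, valid, test


# Hypothetical toy data: item IDs are placeholders, ordered oldest to newest.
example = {"u1": ["a", "b", "c", "d"], "u2": ["x", "y"]}
train, valid, test = leave_one_out_split(example)
```

In this toy run, user "u1" contributes items "a" and "b" to training, "c" to validation, and "d" to testing, while "u2" is dropped for having too few interactions. The 60-day training window and next-day test set described in the same row refer to a separate, time-based split of interaction logs and are not covered by this sketch.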