Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Curriculum Disentangled Recommendation with Noisy Multi-feedback
Authors: Hong Chen, Yudong Chen, Xin Wang, Ruobing Xie, Rui Wang, Feng Xia, Wenwu Zhu
NeurIPS 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments on several real-world datasets demonstrate that the proposed CDR model can significantly outperform several state-of-the-art methods in terms of recommendation accuracy. |
| Researcher Affiliation | Collaboration | (1) Tsinghua University, (2) WeChat Search Application Department, Tencent. |
| Pseudocode | Yes | Algorithm 1 Adjustable Self-evaluating Curriculum towards A Better Self |
| Open Source Code | Yes | Our code will be released at https://github.com/forchchch/CDR |
| Open Datasets | Yes | We conduct our experiments on four real-world datasets: WeChat5D, MovieLens-1M [44], Amazon Sports [45] and Amazon Beauty [45]. |
| Dataset Splits | Yes | The whole dataset is chronologically divided to the train, valid, and test dataset by the ratio of 8:1:1. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., CPU, GPU models, memory, or cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'Tensorflow' but does not provide specific version numbers for it or any other software libraries or dependencies. |
| Experiment Setup | Yes | We implement our method in Tensorflow and use the Adagrad [51] optimizer for mini-batch gradient descent that is suitable for sparse data, while the size of each mini-batch is 256. All the mentioned transformer encoders are four-head and one-layer. We cap the maximum sequential historical behavior length to 30 for all datasets. We fix µ in the curriculum to 10 and the other hyper-parameters are then tuned using random search. The search space is listed as follows. The number of latent intentions K ∈ {1, 2, …, 8}. The prior confidence for the unclicked data λ ∈ {0.1, 0.2, …, 1.0}. The learning rate ∈ {0.0001, 0.001, 0.01, 0.1, 1.0}. The hidden size of each field of feature ∈ {32, 64, 128, 256}. |
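
The Dataset Splits and Experiment Setup rows can be read as the following minimal Python sketch. It is not taken from the authors' repository. The dictionary keys and the function names `sample_config` and `chronological_split` are hypothetical, uniform sampling is an assumption (the paper only says "random search"), and whether the 8:1:1 split is applied globally or per user is not stated in the rows above; the sketch assumes a global split.

```python
import random

# Search space as listed in the Experiment Setup row.
# Key names are illustrative, not the authors' identifiers.
SEARCH_SPACE = {
    "num_intentions_K": list(range(1, 9)),                       # K in {1, ..., 8}
    "prior_confidence_lambda": [i / 10 for i in range(1, 11)],   # lambda in {0.1, ..., 1.0}
    "learning_rate": [0.0001, 0.001, 0.01, 0.1, 1.0],
    "hidden_size": [32, 64, 128, 256],
}

# Values the row reports as fixed rather than tuned.
FIXED = {
    "mu": 10,
    "batch_size": 256,
    "max_history_len": 30,
    "num_heads": 4,
    "num_layers": 1,
    "optimizer": "adagrad",
}


def sample_config(rng):
    """Draw one configuration uniformly at random (assumed sampling scheme)."""
    config = {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
    config.update(FIXED)
    return config


def chronological_split(interactions, ratios=(8, 1, 1)):
    """Sort (timestamp, record) pairs by time and cut them into train/valid/test.

    Assumes a single global split over all interactions.
    """
    ordered = sorted(interactions, key=lambda pair: pair[0])
    total = sum(ratios)
    n = len(ordered)
    train_end = n * ratios[0] // total
    valid_end = n * (ratios[0] + ratios[1]) // total
    return ordered[:train_end], ordered[train_end:valid_end], ordered[valid_end:]
```

Calling `sample_config(random.Random(seed))` with a recorded seed makes each trial of the search repeatable, which is one way to cover the reproducibility details this table tracks.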