Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

P-Law: Predicting Quantitative Scaling Law with Entropy Guidance in Large Recommendation Models

Authors: Tingjia Shen, Hao Wang, Chuhan Wu, Jin Yao Chin, Wei Guo, Yong Liu, Huifeng Guo, Defu Lian, Ruiming Tang, Enhong Chen

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiment on various datasets demonstrates the effectiveness of Performance Law by displaying exceptional quantitative prediction ability against the original and modified qualitative SL. Additional application experiments on optimal parameter prediction and model expansion potential prediction also demonstrated the broad applicability of the Performance Law.
Researcher Affiliation	Collaboration	(1) State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China; (2) Shenzhen Huawei Technologies Co., Ltd.
Pseudocode	No	The paper describes methods and mathematical formulas but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	Source Code: https://github.com/USTC-StarTeam/P-Law
Open Datasets	Yes	To demonstrate the performance of our proposed approach across various kinds of datasets, we conducted experiments on three publicly available datasets: MovieLens-1M (51) (ML1M), Amazon Books (52) (AMZ-Books), KuaiRand-Pure (53) (KR-Pure) and one private dataset Industrial.
Dataset Splits	Yes	We adopt the leave-one-out strategy for evaluation, following prior research (55; 56; 57). For each sequence, the most recent interaction is used for testing, the second for validation, and the rest for training.
Hardware Specification	No	We utilized 48 industrial GPUs to run this experiment, with the largest experiment taking 24 hours. This truly allowed us to study model performance at extreme data and model scales.
Software Dependencies	No	The paper does not explicitly mention specific software dependencies with version numbers for the authors' implementation.
Experiment Setup	Yes	Regarding model configurations, for the MovieLens-1M, KuaiRand-Pure, and Amazon Books datasets, we configured N ∈ {4, 8, 12, 16, 24, 32} and d_emb ∈ {25, 50, 75, 100}. For the Industrial dataset, we set N ∈ {8, 16, 32, 64} and d_emb ∈ {128, 256, 512, 1024}. From a data perspective, we selected the maximum sequence length for truncation based on the average length of each dataset. In the MovieLens-1M dataset, we selected according to the maximum sequence length S_max ∈ {100, 150, 200}. In the KuaiRand-Pure dataset, we set the maximum sequence length S_max ∈ {25, 50, 100}. Finally, for the Amazon Books and Industrial datasets, we configured the maximum sequence length S_max ∈ {25, 50}.
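
For readers who want to see what the leave-one-out protocol quoted in the Dataset Splits row amounts to, the following is a minimal Python sketch, not the authors' implementation. The function name `leave_one_out_split`, the dictionary-based data layout, and the handling of sequences shorter than three items are all assumptions made for illustration. Truncating the training prefix to its most recent `max_len` items is likewise an assumption; the quoted text states only that a maximum sequence length is used for truncation, not which end is kept.

```python
from typing import Dict, List, Tuple


def leave_one_out_split(
    sequences: Dict[str, List[int]],
    max_len: int,
) -> Tuple[Dict[str, List[int]], Dict[str, int], Dict[str, int]]:
    """Split chronologically ordered interaction sequences.

    For each sequence: last item -> test, second-to-last -> validation,
    everything before that -> training (hypothetical helper, see lead-in).
    """
    if max_len <= 0:
        raise ValueError("max_len must be a positive integer")

    train: Dict[str, List[int]] = {}
    valid: Dict[str, int] = {}
    test: Dict[str, int] = {}

    for user, items in sequences.items():
        if len(items) < 3:
            # Assumption: sequences too short to hold out two items
            # are used for training only.
            train[user] = items[-max_len:]
            continue
        test[user] = items[-1]    # most recent interaction
        valid[user] = items[-2]   # second most recent interaction
        # Assumption: keep only the most recent max_len training items.
        train[user] = items[:-2][-max_len:]

    return train, valid, test


if __name__ == "__main__":
    toy = {"u1": [1, 2, 3, 4, 5], "u2": [7, 8]}
    tr, va, te = leave_one_out_split(toy, max_len=2)
    print(tr, va, te)
```

In this sketch `max_len` plays the role of S_max from the Experiment Setup row, for example 100, 150, or 200 for MovieLens-1M.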