Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
P-Law: Predicting Quantitative Scaling Law with Entropy Guidance in Large Recommendation Models
Authors: Tingjia Shen, Hao Wang, Chuhan Wu, Jin Yao Chin, Wei Guo, Yong Liu, Huifeng Guo, Defu Lian, Ruiming Tang, Enhong Chen
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiment on various datasets demonstrates the effectiveness of Performance Law by displaying exceptional quantitative prediction ability against the original and modified qualitative SL. Additional application experiments on optimal parameter prediction and model expansion potential prediction also demonstrated the broad applicability of the Performance Law. |
| Researcher Affiliation | Collaboration | ¹State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China; ²Shenzhen Huawei Technologies Co., Ltd. |
| Pseudocode | No | The paper describes methods and mathematical formulas but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Source Code: https://github.com/USTC-StarTeam/P-Law |
| Open Datasets | Yes | To demonstrate the performance of our proposed approach across various kinds of datasets, we conducted experiments on three publicly available datasets: MovieLens-1M (51) (ML1M), Amazon Books (52) (AMZ-Books), KuaiRand-Pure (53) (KR-Pure) and one private dataset Industrial. |
| Dataset Splits | Yes | We adopt the leave-one-out strategy for evaluation, following prior research (55; 56; 57). For each sequence, the most recent interaction is used for testing, the second for validation, and the rest for training. |
| Hardware Specification | No | We utilized 48 industrial GPUs to run this experiment, with the largest experiment taking 24 hours. This truly allowed us to study model performance at extreme data and model scales. |
| Software Dependencies | No | The paper does not explicitly mention specific software dependencies with version numbers for the authors' implementation. |
| Experiment Setup | Yes | Regarding model configurations, for the MovieLens-1M, KuaiRand-Pure, and Amazon Books datasets, we configured N ∈ {4, 8, 12, 16, 24, 32} and d_emb ∈ {25, 50, 75, 100}. For the Industrial dataset, we set N ∈ {8, 16, 32, 64} and d_emb ∈ {128, 256, 512, 1024}. From a data perspective, we selected the maximum sequence length for truncation based on the average length of each dataset. In the MovieLens-1M dataset, we selected according to the maximum sequence length S_max ∈ {100, 150, 200}. In the KuaiRand-Pure dataset, we set the maximum sequence length S_max ∈ {25, 50, 100}. Finally, for the Amazon Books and Industrial datasets, we configured the maximum sequence length S_max ∈ {25, 50}. |
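
The leave-one-out strategy quoted under Dataset Splits can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `leave_one_out_split`, the dictionary-of-lists input format, and the handling of sequences shorter than three interactions are all assumptions made for this example.

```python
from typing import Dict, Hashable, List, Tuple


def leave_one_out_split(
    sequences: Dict[Hashable, List[int]],
) -> Tuple[Dict[Hashable, List[int]], Dict[Hashable, int], Dict[Hashable, int]]:
    """Split chronologically ordered interaction sequences.

    For each sequence, the most recent item goes to test, the second
    most recent to validation, and the rest to training.
    """
    train: Dict[Hashable, List[int]] = {}
    valid: Dict[Hashable, int] = {}
    test: Dict[Hashable, int] = {}
    for user, items in sequences.items():
        if len(items) < 3:
            # Assumption for this sketch: sequences too short to split
            # are kept entirely in the training set.
            train[user] = list(items)
            continue
        train[user] = list(items[:-2])  # everything before the last two
        valid[user] = items[-2]         # second most recent interaction
        test[user] = items[-1]          # most recent interaction
    return train, valid, test
```

The sketch assumes each list is already sorted from oldest to newest interaction; the quoted text does not say how short sequences are treated, so that branch is purely illustrative.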
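
The value sets quoted under Experiment Setup can be written down as a small configuration grid. The sketch below is hypothetical: the names `GRIDS` and `enumerate_configs` are invented for illustration, and it assumes every combination of N, d_emb, and S_max is run, which the quoted text does not confirm.

```python
from itertools import product
from typing import Dict, List

# Value sets copied from the quoted setup; key names are illustrative.
_SMALL = {"N": [4, 8, 12, 16, 24, 32], "d_emb": [25, 50, 75, 100]}

GRIDS: Dict[str, Dict[str, List[int]]] = {
    "MovieLens-1M": {**_SMALL, "S_max": [100, 150, 200]},
    "KuaiRand-Pure": {**_SMALL, "S_max": [25, 50, 100]},
    "Amazon Books": {**_SMALL, "S_max": [25, 50]},
    "Industrial": {
        "N": [8, 16, 32, 64],
        "d_emb": [128, 256, 512, 1024],
        "S_max": [25, 50],
    },
}


def enumerate_configs(dataset: str) -> List[Dict[str, int]]:
    """Return every (N, d_emb, S_max) combination for one dataset.

    Assumption: the full Cartesian product is explored.
    """
    grid = GRIDS[dataset]
    keys = list(grid)
    return [
        dict(zip(keys, values))
        for values in product(*(grid[k] for k in keys))
    ]
```

Under the full-grid assumption, `enumerate_configs("MovieLens-1M")` yields 6 × 4 × 3 = 72 configurations and `enumerate_configs("Industrial")` yields 4 × 4 × 2 = 32.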