Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Online Portfolio Selection with ML Predictions

Authors: Ziliang Zhang, Tianming Zhao, Albert Zomaya

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Comprehensive experiments on large-scale equity data strengthen our theory, spanning both synthetic prediction streams and production-grade machine-learning models. 3 Empirical study We begin by assessing RAM on the canonical New York Stock Exchange (NYSE) benchmark, using i.i.d. random rankings to model a fully oblivious and uninformative oracle.
Researcher Affiliation	Academia	Ziliang Zhang School of Computer Science The University of Sydney Camperdown NSW 2050, Australia EMAIL Tianming Zhao School of Computer Science The University of Sydney Camperdown NSW 2050, Australia EMAIL Albert Y. Zomaya School of Computer Science The University of Sydney Camperdown NSW 2050, Australia EMAIL
Pseudocode	Yes	Algorithm 1 RAM: rebalanced arithmetic mean with predictions
Open Source Code	Yes	Source code is available at https://github.com/mroymd/OPML. Both the NYSE and S&P 500 datasets are publicly available; moreover, we supply a one-click Colab notebook that fully replicates all reported experiments.
Open Datasets	Yes	Our primary dataset is the original NYSE(O) collection [27], which contains 36 stocks spanning 22 years (1962–1984) over 5,651 trading days. To capture a broader range of market volatility and ensure more recent coverage, we also consider the extended NYSE(N) dataset [28], encompassing 21 assets from 1962 to 2006 (11,178 trading days). We use the nightly-refreshed S&P 500 historical panel [30] available on Kaggle, containing 501 constituents from 2010 to 2024.
Dataset Splits	Yes	The model is retrained each trading day using a 250-day sliding window, featuring contemporaneous and three lagged returns per asset. A decaying factor with Θ = 0.995^age prioritizes recent observations while discarding stale information. A 60-day hold-out slice inside the same window provides early-stopping signals, eliminating look-ahead bias.
Hardware Specification	Yes	All experiments compute under 6h on one standard CPU.
Software Dependencies	No	The paper mentions 'LightGBM LambdaMART [25]' for forecasting ranks, but does not provide a specific version number for this software or any other key software components used in the experiments.
Experiment Setup	Yes	The model is retrained each trading day using a 250-day sliding window, featuring contemporaneous and three lagged returns per asset. A decaying factor with Θ = 0.995^age prioritizes recent observations while discarding stale information. A 60-day hold-out slice inside the same window provides early-stopping signals, eliminating look-ahead bias.
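
Illustration (not taken from the paper's code): the split described under Dataset Splits and Experiment Setup can be sketched as follows. The window length (250), hold-out length (60), and decay base (0.995) come from the quoted text. The function name `window_split`, the placement of the hold-out slice at the most recent end of the window, and the choice to measure age from the newest day in the window are assumptions made here for illustration, and the authors' implementation may differ.

```python
import numpy as np

WINDOW = 250   # sliding-window length in trading days (from the quoted text)
HOLDOUT = 60   # hold-out slice inside the window (from the quoted text)
DECAY = 0.995  # decay base, weight = DECAY ** age (from the quoted text)


def window_split(t, window=WINDOW, holdout=HOLDOUT, decay=DECAY):
    """Return (fit_idx, holdout_idx, fit_weights) for a model retrained on day t.

    Assumptions (hypothetical, not confirmed by the source):
    - the window covers days t-window .. t-1, so day t itself is never seen;
    - the hold-out slice is the most recent `holdout` days of that window;
    - age is counted in days back from the newest day in the window.
    """
    if t < window:
        raise ValueError("not enough history for a full window")
    days = np.arange(t - window, t)          # strictly before day t
    fit_idx = days[:-holdout]                # older part: used for fitting
    holdout_idx = days[-holdout:]            # newest part: early-stopping signal
    age = (t - 1) - fit_idx                  # 0 would be the newest window day
    fit_weights = decay ** age               # older observations weigh less
    return fit_idx, holdout_idx, fit_weights


if __name__ == "__main__":
    fit_idx, holdout_idx, weights = window_split(1000)
    print(len(fit_idx), len(holdout_idx), weights[0], weights[-1])
```

Because every index returned is strictly smaller than `t`, a model fitted this way uses no information from the day it is predicting, which is the sense in which the quoted setup avoids look-ahead bias.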