Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Function Basis Encoding of Numerical Features in Factorization Machines
Authors: Alex Shtoff, Elie Abboud, Rotem Stram, Oren Somekh
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we back our claims with a set of experiments that include a synthetic experiment, performance evaluation on several data-sets, and an A/B test on a real online advertising system which shows improved performance. |
| Researcher Affiliation | Collaboration | Alex Shtoff, Yahoo; Elie Abboud, Department of Computer Science, University of Haifa; Rotem Stram, Yahoo; Oren Somekh, Yahoo |
| Pseudocode | No | The paper describes methods through textual explanations, mathematical formulations, and a computational graph in Figure 2, but does not present a clearly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | We have made the code to reproduce the experiments available at https://github.com/alexshtf/cont_features_paper. |
| Open Datasets | Yes | We mainly test our approach versus binning on several data-sets with abundant numerical features that have a strong predictive power: the California housing (Pace & Barry, 1997), adult income (Kohavi, 1996), Higgs (Baldi et al., 2014) (we use the 98K version from OpenML (Vanschoren et al., 2014)), and song year prediction (Bertin-Mahieux et al., 2011). For the first two data-sets we used an FFM, whereas for the last two we used an FM, both provided by Yahoo (Yahoo-Inc, 2023)... To further demonstrate the benefits of our approach on a real-world recommendation dataset, we evaluate it with an FwFM on the Criteo display advertising challenge dataset (Criteo, 2014). |
| Dataset Splits | Yes | In addition, 20% of the data was held out for validation, and regression targets were standardized. ...the first 5/7 of the data-set is the training set, the next 1/7 is the validation set for hyper-parameter tuning, and the last 1/7 is the test set. |
| Hardware Specification | No | The paper discusses model training, datasets, and evaluation metrics but does not provide specific details regarding the hardware used for running experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions several software tools and libraries like SciPy, Optuna, and the AdamW optimizer, but it does not specify their version numbers, which are necessary for reproducible software dependencies. |
| Experiment Setup | Yes | For tuning the step-size, batch-size, the number of intervals, and the embedding dimension we use Optuna (Akiba et al., 2019). For binning, we also tuned the choice of uniform or quantile bins. ...We conduct experiments with k = 8, 16, ..., 64 as embedding dimensions, and each experiment is conducted using 50 trials of Optuna (Akiba et al., 2019) with its default configuration to tune the learning rate and the L2 regularization coefficient. The models were trained using the AdamW optimizer (Loshchilov & Hutter, 2019). |
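The chronological 5/7, 1/7, 1/7 split quoted in the Dataset Splits row can be sketched in a few lines. This is a minimal illustration of the described proportions, not the paper's released code; the function name `split_chronological` and the list-based input are assumptions.

```python
def split_chronological(records):
    """Split an ordered dataset into train/validation/test using the
    5/7, 1/7, 1/7 proportions described above (chronological order,
    no shuffling)."""
    n = len(records)
    train_end = (5 * n) // 7
    val_end = (6 * n) // 7
    return records[:train_end], records[train_end:val_end], records[val_end:]

# Example: 14 records -> 10 train, 2 validation, 2 test.
train, val, test = split_chronological(list(range(14)))
```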
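The Experiment Setup row describes a 50-trial Optuna search over the learning rate and L2 regularization coefficient. As a dependency-free stand-in, the same idea can be sketched with random search; the `tune` function, the log-uniform sampling ranges, and the toy objective below are illustrative assumptions, not the authors' pipeline.

```python
import math
import random

def tune(objective, n_trials=50, seed=0):
    """Minimal random-search stand-in for the 50-trial tuning described
    above: sample a learning rate and an L2 coefficient log-uniformly,
    and keep the configuration with the lowest validation loss."""
    rng = random.Random(seed)
    best_params, best_loss = None, math.inf
    for _ in range(n_trials):
        params = {
            "lr": 10 ** rng.uniform(-4, -1),  # learning rate (assumed range)
            "l2": 10 ** rng.uniform(-6, -2),  # L2 coefficient (assumed range)
        }
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss

# Toy objective with a known optimum near lr=1e-2, l2=1e-4.
def toy(p):
    return (math.log10(p["lr"]) + 2) ** 2 + (math.log10(p["l2"]) + 4) ** 2

params, loss = tune(toy)
```

In the paper's actual setup, `objective` would train the factorization machine with AdamW and return the validation metric, and Optuna's default sampler would replace the uniform random draws.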