Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Function Basis Encoding of Numerical Features in Factorization Machines
Authors: Alex Shtoff, Elie Abboud, Rotem Stram, Oren Somekh
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we back our claims with a set of experiments that include a synthetic experiment, performance evaluation on several data-sets, and an A/B test on a real online advertising system which shows improved performance. |
| Researcher Affiliation | Collaboration | Alex Shtoff, Yahoo; Elie Abboud, Department of Computer Science, University of Haifa; Rotem Stram, Yahoo; Oren Somekh, Yahoo |
| Pseudocode | No | The paper describes methods through textual explanations, mathematical formulations, and a computational graph in Figure 2, but does not present a clearly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | We have made the code to reproduce the experiments available at https://github.com/alexshtf/cont_features_paper. |
| Open Datasets | Yes | We mainly test our approach versus binning on several data-sets with abundant numerical features that have a strong predictive power: the California housing (Pace & Barry, 1997), adult income (Kohavi, 1996), Higgs (Baldi et al., 2014) (we use the 98K version from OpenML (Vanschoren et al., 2014)), and song year prediction (Bertin-Mahieux et al., 2011). For the first two data-sets we used an FFM, whereas for the last two we used an FM, both provided by Yahoo (Yahoo-Inc, 2023)... To further demonstrate the benefits of our approach on a real-world recommendation dataset, we evaluate it with an FwFM on the Criteo display advertising challenge dataset (Criteo, 2014). |
| Dataset Splits | Yes | In addition, 20% of the data was held out for validation, and regression targets were standardized. ...the first 5/7 of the data-set is the training set, the next 1/7 is the validation set for hyper-parameter tuning, and the last 1/7 is the test set. |
| Hardware Specification | No | The paper discusses model training, datasets, and evaluation metrics but does not provide specific details regarding the hardware used for running experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions several software tools and libraries like SciPy, Optuna, and the AdamW optimizer, but it does not specify their version numbers, which are necessary for reproducible software dependencies. |
| Experiment Setup | Yes | For tuning the step-size, batch-size, the number of intervals, and the embedding dimension we use Optuna (Akiba et al., 2019). For binning, we also tuned the choice of uniform or quantile bins. ...We conduct experiments with k = 8, 16, ..., 64 as embedding dimensions, and each experiment is conducted using 50 trials of Optuna (Akiba et al., 2019) with its default configuration to tune the learning rate and the L2 regularization coefficient. The models were trained using the AdamW optimizer (Loshchilov & Hutter, 2019). |
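The chronological 5/7, 1/7, 1/7 split quoted in the Dataset Splits row can be sketched in a few lines. This is a minimal illustration of the described proportions, not the paper's released code; the function name `split_chronological` and the list-based input are assumptions.

```python
def split_chronological(records):
    """Split an ordered dataset into train/validation/test using the
    5/7, 1/7, 1/7 proportions described above (chronological order,
    no shuffling)."""
    n = len(records)
    train_end = (5 * n) // 7
    val_end = (6 * n) // 7
    return records[:train_end], records[train_end:val_end], records[val_end:]

# Example: 14 records -> 10 train, 2 validation, 2 test.
train, val, test = split_chronological(list(range(14)))
```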
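The Experiment Setup row describes a 50-trial Optuna search over the learning rate and L2 regularization coefficient. As a dependency-free stand-in, the same idea can be sketched with random search; the `tune` function, the log-uniform sampling ranges, and the toy objective below are illustrative assumptions, not the authors' pipeline.

```python
import math
import random

def tune(objective, n_trials=50, seed=0):
    """Minimal random-search stand-in for the 50-trial tuning described
    above: sample a learning rate and an L2 coefficient log-uniformly,
    and keep the configuration with the lowest validation loss."""
    rng = random.Random(seed)
    best_params, best_loss = None, math.inf
    for _ in range(n_trials):
        params = {
            "lr": 10 ** rng.uniform(-4, -1),  # learning rate (assumed range)
            "l2": 10 ** rng.uniform(-6, -2),  # L2 coefficient (assumed range)
        }
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss

# Toy objective with a known optimum near lr=1e-2, l2=1e-4.
def toy(p):
    return (math.log10(p["lr"]) + 2) ** 2 + (math.log10(p["l2"]) + 4) ** 2

params, loss = tune(toy)
```

In the paper's actual setup, `objective` would train the factorization machine with AdamW and return the validation metric, and Optuna's default sampler would replace the uniform random draws.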