Exponential Machines

Authors: Alexander Novikov, Mikhail Trofimov, Ivan Oseledets

ICLR 2017

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that the model achieves state-of-the-art performance on synthetic data with high-order interactions and that it works on par with high-order factorization machines on the MovieLens 100K recommender system dataset. |
| Researcher Affiliation | Academia | (1) National Research University Higher School of Economics, Moscow, Russia; (2) Institute of Numerical Mathematics, Moscow, Russia; (3) Moscow Institute of Physics and Technology, Moscow, Russia; (4) Skolkovo Institute of Science and Technology, Moscow, Russia |
| Pseudocode | Yes | Algorithm 1: Riemannian optimization |
| Open Source Code | Yes | We release a Python implementation of the proposed algorithm and the code to reproduce the experiments (https://github.com/Bihaqo/exp-machines). |
| Open Datasets | Yes | The UCI Car dataset (Lichman, 2013) is a classification problem with 1728 objects and 21 binary features (after one-hot encoding). The UCI HIV dataset is a binary classification problem with 1625 objects and 160 features. MovieLens 100K is a recommender system dataset with 943 users and 1682 movies (Harper & Konstan, 2015). |
| Dataset Splits | Yes | UCI Car dataset: we randomly split the data into 1382 training and 346 test objects. UCI HIV dataset: we randomly split the data into 1300 training and 325 test objects. Synthetic data: we generated 100 000 train and 100 000 test objects. MovieLens 100K: this results in 21 200 positive samples, half of which were used for training (with an equal number of sampled negative examples) and the rest for testing. |
| Hardware Specification | Yes | Our model obtained 0.784 test AUC with TT-rank equal to 10 in 273 seconds on a Tesla K40 GPU. |
| Software Dependencies | No | The paper mentions using Python, TT-Toolbox, TensorFlow, scikit-learn, fastFM, and the Adam optimizer, but does not provide version numbers for any of these software components. |
| Experiment Setup | Yes | In this and later experiments we tuned the learning rate of both the Riemannian and SGD optimizers with respect to the training loss after 100 iterations by grid search over a logarithmic grid. On the Car and HIV datasets we turned off regularization (λ = 0) and used rank r = 4. Riemannian optimization (learning rate α = 40) ... baseline (learning rate α = 0.03). Riemannian optimization (learning rate α = 800) ... baseline (learning rate α = 0.001). 6th-order FM with the Adam optimizer ... best rank (20) and learning rate (0.003). |
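
The Pseudocode row points to Algorithm 1, the paper's Riemannian optimization procedure, whose core is a step-and-retract loop: take a gradient step, then retract the result back onto the manifold of tensors with fixed TT-rank. Below is a minimal sketch of that pattern on the rank-r matrix manifold (the d = 2 special case of the TT manifold), with a truncated SVD as the retraction. The released code works with full TT tensors and projects the gradient onto the tangent space before stepping, so the objective, names, and constants here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def retract_to_rank_r(W, r):
    """Retraction: map W back onto the manifold of rank-r matrices
    via a truncated SVD (the d = 2 analogue of TT-rounding)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

rng = np.random.default_rng(0)
target = rng.standard_normal((30, 30))  # toy objective: approximate this matrix
r, alpha = 4, 0.1                       # manifold rank and learning rate

# Start from a point on the manifold, then alternate gradient steps
# with retractions, as in the step-and-retract loop of Algorithm 1.
W = retract_to_rank_r(rng.standard_normal((30, 30)), r)
for _ in range(100):
    grad = 2.0 * (W - target)                   # gradient of ||W - target||_F^2
    W = retract_to_rank_r(W - alpha * grad, r)  # step, then retract

print("final loss:", np.linalg.norm(W - target) ** 2)
```

On this toy objective the iterates converge to the best rank-r approximation of `target`; in the paper, the same loop runs over the exponential machine's training loss, with TT-rounding as the retraction.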
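The Experiment Setup row quotes the learning-rate tuning protocol: grid search over a logarithmic grid, scored by the training loss after 100 iterations. Here is a minimal sketch of that protocol, with a toy quadratic standing in for the model's training loss; the objective, grid bounds, and function names are assumptions, not the paper's values.

```python
import numpy as np

def loss_after_100_steps(alpha, target):
    """Run 100 gradient steps on ||w - target||^2 and return the final loss."""
    w = np.zeros_like(target)
    with np.errstate(over="ignore", invalid="ignore"):
        for _ in range(100):
            w = w - alpha * 2.0 * (w - target)  # plain gradient step
        loss = float(np.sum((w - target) ** 2))
    return loss if np.isfinite(loss) else float("inf")  # diverged runs lose

target = np.random.default_rng(0).standard_normal(50)
grid = np.logspace(-3, 1, num=5)  # logarithmic grid: 1e-3, 1e-2, ..., 10
best = min(grid, key=lambda a: loss_after_100_steps(a, target))
print(f"best learning rate on this toy problem: {best:g}")
```

The same pattern applies to both of the paper's optimizers: each grid point gets 100 iterations, and the rate with the lowest training loss is kept.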