Learning Supervised PageRank with Gradient-Based and Gradient-Free Optimization Methods

Authors: Lev Bogolubsky, Pavel Dvurechenskii, Alexander Gasnikov, Gleb Gusev, Yurii Nesterov, Andrei M. Raigorodskii, Aleksey Tikhonov, Maksim Zhukovskii

NeurIPS 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In the experiments, we apply our algorithms to learning Supervised PageRank on a real ranking task. Summing up, both two-level methods, unlike the state-of-the-art [21], have theoretical guarantees on convergence rate, and outperform it in the ranking quality in experiments. In this section, we compare our gradient-free and gradient-based methods with the state-of-the-art gradient-based method [21] on the web page ranking problem. In the next section, we describe the dataset. In Section 6.2, we report the results of the experiments.
Researcher Affiliation | Collaboration | Yandex (1), Moscow State University (2), Buryat State University (8): {bogolubsky, gleb57, raigorodsky, altsoph, zhukmax}@yandex-team.ru. Pavel Dvurechensky (3,4), Alexander Gasnikov (4,5): Weierstrass Institute (3), Institute for Information Transmission Problems RAS (4), Moscow Institute of Physics and Technology (5); pavel.dvurechensky@wias-berlin.de, gasnikov@yandex.ru. Yurii Nesterov (6,7): Center for Operations Research and Econometrics (6), Higher School of Economics (7); yurii.nesterov@uclouvain.be
Pseudocode | Yes | The paper gives two algorithm listings (a minimal Python sketch of both loops follows the table).
Algorithm 1 (Gradient-type method). Input: point x_0 ∈ X, stepsize h > 0, number of steps M. Set k = 0. Repeat: generate ξ_k and calculate the corresponding g_τ(x_k, δ); calculate x_{k+1} = Π_X(x_k − h·g_τ(x_k, δ)), where Π_X(·) is the Euclidean projection onto the set X; set k = k + 1; until k > M. Output: the point ŷ_M = argmin_x {f(x) : x ∈ {x_0, ..., x_M}}.
Algorithm 2 (Adaptive projected gradient algorithm). Input: point x_0 ∈ X, number L_0 > 0. Set k = 0, z = +∞. Repeat: set M_k = L_k, flag = 0; repeat: set δ = ε/(16·M_k); calculate f(x_k, δ) and g(x_k, δ); find w_k = argmin_{x ∈ Q} {⟨g(x_k, δ), x⟩ + M_k·V(x, x_k) + h(x)} and calculate f(w_k, δ); if the inequality f(w_k, δ) ≤ f(x_k, δ) + ⟨g(x_k, δ), w_k − x_k⟩ + (M_k/2)·‖w_k − x_k‖² + ε/(8·M_k) holds, set flag = 1, otherwise set M_k = 2·M_k; until flag = 1; set x_{k+1} = w_k, L_{k+1} = M_k/2; if ‖M_k(x_k − x_{k+1})‖ < z, set z = ‖M_k(x_k − x_{k+1})‖ and K = k; set k = k + 1; until z ≤ ε. Output: the point x_{K+1}.
Open Source Code | No | The paper does not include an unambiguous statement that the authors are releasing the code for the work described in this paper, nor does it provide a direct link to a source-code repository.
Open Datasets | No | All experiments are performed with data of a popular commercial search engine Yandex. We chose a random set of 600 queries Q and collected user sessions started with them.
Dataset Splits | No | We divide our data into two parts. On the first part Q1 (50% of the set of queries Q) we train the parameters, and on the second part Q2 we test the algorithms. (No explicit mention of a separate validation set; only training and testing splits are described.)
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers.
Experiment Setup | Yes | The values of the hyperparameters are the following: the Lipschitz constant L = 10⁻⁴ in GFN (and L_0 = 10⁻⁴ in GBN), the accuracy ε = 10⁻⁶ (in both GBN and GFN), the radius R = 0.99 (in both GBN and GFN). We choose L in GFN to be equal to L_0 (we show how the choice of L influences the output of the gradient-free algorithm; see supplementary materials, Figure 2). Moreover, we evaluate both our gradient-based and gradient-free algorithms for different values of the accuracies. (A hypothetical wiring of these values is sketched after the table.)
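
Below is a minimal Python sketch of the two listings quoted in the Pseudocode row. It is an illustration under simplifying assumptions, not the authors' implementation: the oracles `oracle_f` and `oracle_g`, the projection `project_onto_X`, and the Euclidean-prox simplification V(x, y) = ‖x − y‖²/2 with h(x) = 0 in Algorithm 2 are hypothetical placeholders.

```python
# Sketch of Algorithm 1 (gradient-type method) and Algorithm 2 (adaptive
# projected gradient), assuming a Euclidean prox-structure; placeholder oracles.
import numpy as np

def gradient_type_method(x0, oracle_f, oracle_g, project_onto_X, h, M):
    """Algorithm 1: oracle_g(x) returns the randomized inexact gradient
    g_tau(x, delta); project_onto_X is the Euclidean projection onto X."""
    x = np.asarray(x0, dtype=float)
    best_x, best_f = x, oracle_f(x)
    for _ in range(M):
        x = project_onto_X(x - h * oracle_g(x))   # x_{k+1} = Pi_X(x_k - h * g_tau)
        f_x = oracle_f(x)
        if f_x < best_f:                          # keep the best iterate seen so far
            best_x, best_f = x, f_x
    return best_x                                 # y_hat_M = argmin over the trajectory

def adaptive_projected_gradient(x0, oracle_f, oracle_g, project_onto_X,
                                L0, eps, max_iter=10_000):
    """Algorithm 2 with V(x, y) = ||x - y||^2 / 2 and h(x) = 0 (a simplification),
    so the inner argmin reduces to a projected gradient step with stepsize 1/M_k."""
    x = np.asarray(x0, dtype=float)
    L, z, out = L0, np.inf, np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        M = L
        while True:
            delta = eps / (16.0 * M)              # oracle accuracy delta = eps / (16 M_k)
            f_x, g_x = oracle_f(x, delta), oracle_g(x, delta)
            w = project_onto_X(x - g_x / M)       # argmin of the local quadratic model
            f_w = oracle_f(w, delta)
            model = f_x + g_x @ (w - x) + 0.5 * M * np.dot(w - x, w - x) + eps / (8.0 * M)
            if f_w <= model:                      # sufficient-decrease test passed
                break
            M *= 2.0                              # otherwise double M_k and retry
        x_new, L = w, M / 2.0
        gap = np.linalg.norm(M * (x - x_new))     # ||M_k (x_k - x_{k+1})||
        if gap < z:
            z, out = gap, x_new                   # remember x_{K+1}
        x = x_new
        if z <= eps:                              # stopping rule in the listing
            break
    return out
```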
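
And a hypothetical wiring of the figures quoted in the Dataset Splits and Experiment Setup rows (the 600-query 50/50 split and the values L_0 = 10⁻⁴, ε = 10⁻⁶, R = 0.99). The names `loss`, `loss_grad`, `phi0`, and the ball-shaped feasible set are guesses for illustration only; the paper's feasible set and objective may differ.

```python
# Hypothetical setup mirroring the reported split and hyperparameters;
# loss / loss_grad are placeholders, not the paper's ranking objective.
import numpy as np

rng = np.random.default_rng(0)
queries = rng.permutation(600)             # 600 queries, as in the Open Datasets row
Q1, Q2 = queries[:300], queries[300:]      # 50% train (Q1) / 50% test (Q2)

L0, eps, R = 1e-4, 1e-6, 0.99              # values quoted in the Experiment Setup row
phi0 = np.zeros(10)                        # parameter-vector dimension is a placeholder

def project_onto_ball(x, radius=R):
    """Euclidean projection onto a ball of radius R (assumed feasible set)."""
    n = np.linalg.norm(x)
    return x if n <= radius else x * (radius / n)

# phi_star = adaptive_projected_gradient(phi0, loss, loss_grad, project_onto_ball, L0, eps)
```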