Learning Supervised PageRank with Gradient-Based and Gradient-Free Optimization Methods
Authors: Lev Bogolubsky, Pavel Dvurechenskii, Alexander Gasnikov, Gleb Gusev, Yurii Nesterov, Andrei M. Raigorodskii, Aleksey Tikhonov, Maksim Zhukovskii
NeurIPS 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the experiments, we apply our algorithms to learning Supervised PageRank on a real ranking task. Summing up, both two-level methods, unlike the state-of-the-art [21], have theoretical guarantees on convergence rate, and outperform it in the ranking quality in experiments. In this section, we compare our gradient-free and gradient-based methods with the state-of-the-art gradient-based method [21] on the web page ranking problem. In the next section, we describe the dataset. In Section 6.2, we report the results of the experiments. |
| Researcher Affiliation | Collaboration | Yandex (1), Moscow State University (2), Buryat State University (8): {bogolubsky, gleb57, raigorodsky, altsoph, zhukmax}@yandex-team.ru; Pavel Dvurechensky (3,4), Alexander Gasnikov (4,5): Weierstrass Institute (3), Institute for Information Transmission Problems RAS (4), Moscow Institute of Physics and Technology (5), pavel.dvurechensky@wias-berlin.de, gasnikov@yandex.ru; Yurii Nesterov (6,7): Center for Operations Research and Econometrics (6), Higher School of Economics (7), yurii.nesterov@uclouvain.be |
| Pseudocode | Yes | Algorithm 1 (Gradient-type method). Input: point x0 ∈ X, stepsize h > 0, number of steps M. Set k = 0. Repeat: generate ξk and calculate the corresponding gτ(xk, δ); calculate xk+1 = ΠX(xk − h·gτ(xk, δ)), where ΠX(·) is the Euclidean projection onto the set X; set k = k + 1; until k > M. Output: the point ŷM = arg minx {f(x) : x ∈ {x0, ..., xM}}. Algorithm 2 (Adaptive projected gradient algorithm). Input: point x0 ∈ X, number L0 > 0. Set k = 0, z = +∞. Repeat: set Mk = Lk, flag = 0; repeat: set δ = ε/(16Mk), calculate f(xk, δ) and g(xk, δ), find wk = arg minx∈Q {⟨g(xk, δ), x⟩ + Mk·V(x, xk) + h(x)} and calculate f(wk, δ); if the inequality f(wk, δ) ≤ f(xk, δ) + ⟨g(xk, δ), wk − xk⟩ + (Mk/2)·∥wk − xk∥² + ε/(8Mk) holds, set flag = 1, otherwise set Mk = 2Mk; until flag = 1. Set xk+1 = wk, Lk+1 = Mk/2. If ∥Mk(xk − xk+1)∥ < z, set z = ∥Mk(xk − xk+1)∥ and K = k. Set k = k + 1; until z ≤ ε. Output: the point xK+1. (A hedged Python sketch of both loops follows the table.) |
| Open Source Code | No | The paper does not include an unambiguous statement that the authors are releasing the code for the work described in this paper, nor does it provide a direct link to a source-code repository. |
| Open Datasets | No | All experiments are performed with data of a popular commercial search engine, Yandex. We chose a random set of 600 queries Q and collected user sessions started with them. |
| Dataset Splits | No | We divide our data into two parts. On the first part Q1 (50% of the set of queries Q) we train the parameters and on the second part Q2 we test the algorithms. (No explicit mention of a separate validation set, only training and testing splits are described). |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers. |
| Experiment Setup | Yes | The values of the hyperparameters are the following: the Lipschitz constant L = 10⁻⁴ in GFN (and L0 = 10⁻⁴ in GBN), the accuracy ε = 10⁻⁶ (in both GBN and GFN), and the radius R = 0.99 (in both GBN and GFN). We choose L in GFN to be equal to L0 (the supplementary materials, Figure 2, show how the choice of L influences the output of the gradient-free algorithm). Moreover, we evaluate both our gradient-based and gradient-free algorithms for different values of the accuracy. (These values are reused in the Algorithm 2 sketch after the table.) |
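The Pseudocode row is easiest to follow as code. Below is a minimal sketch of Algorithm 1 (the gradient-type method), assuming hypothetical placeholders `grad_oracle` for the paper's inexact oracle gτ(·, δ) and `project_X` for the Euclidean projection onto X; it illustrates the loop structure only and is not the authors' implementation.

```python
# Minimal sketch of Algorithm 1 (gradient-type method).
# grad_oracle, project_X and f are hypothetical placeholders; the paper defines
# them through the supervised PageRank model, which is not reproduced here.
import numpy as np

def gradient_type_method(x0, grad_oracle, project_X, f, h, M, delta):
    """x0: starting point in X; h: stepsize; M: number of steps; delta: oracle accuracy."""
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(M):
        g = grad_oracle(xs[-1], delta)        # inexact (possibly randomized) gradient g_tau(x_k, delta)
        xs.append(project_X(xs[-1] - h * g))  # projected gradient step x_{k+1} = Pi_X(x_k - h*g)
    # Output: the iterate with the smallest objective value among x_0, ..., x_M
    return min(xs, key=f)
```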
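A similar sketch of Algorithm 2 (the adaptive projected gradient method), wired to the hyperparameters reported in the Experiment Setup row (L0 = 10⁻⁴, ε = 10⁻⁶); here `f_delta`, `g_delta`, and `prox_step` are hypothetical stand-ins for the paper's inexact oracle and for the arg-min subproblem over Q with the Bregman term V and the composite term h.

```python
# Sketch of Algorithm 2 (adaptive projected gradient method).
# f_delta, g_delta and prox_step are hypothetical placeholders for the inexact
# oracle and the prox subproblem over Q; defaults follow the reported setup.
import numpy as np

def adaptive_projected_gradient(x0, f_delta, g_delta, prox_step, L0=1e-4, eps=1e-6):
    x = np.asarray(x0, dtype=float)
    L, z, x_best = L0, np.inf, x.copy()
    while True:
        M = L
        while True:                        # inner loop: grow M until the upper model holds
            delta = eps / (16 * M)         # oracle accuracy tied to the current M
            fx, gx = f_delta(x, delta), g_delta(x, delta)
            w = prox_step(x, gx, M)        # arg min over Q of <g, y> + M*V(y, x) + h(y)
            if f_delta(w, delta) <= (fx + gx @ (w - x)
                                     + 0.5 * M * np.sum((w - x) ** 2) + eps / (8 * M)):
                break
            M *= 2
        x_new, L = w, M / 2
        res = np.linalg.norm(M * (x - x_new))  # norm of the generalized gradient mapping
        if res < z:
            z, x_best = res, x_new             # remember the best iterate (x_{K+1} in the pseudocode)
        x = x_new
        if z <= eps:                           # stopping rule from the pseudocode
            return x_best
```

The inner loop doubles the local Lipschitz estimate Mk until the quadratic upper model holds, while the oracle accuracy δ is tightened as ε/(16Mk), matching the pseudocode above.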