Online Learning to Rank with Features
Authors: Shuai Li, Tor Lattimore, Csaba Szepesvari
ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We run experiments to compare RecurRank with CascadeLinUCB (Li et al., 2016; Zong et al., 2016) and TopRank (Lattimore et al., 2018). Synthetic experiments: We construct environments using the cascade click model (CM) and the position-based click model (PBM) with L = 10^4 items in d = 5 dimensions to be displayed in K = 10 positions. We first randomly draw item vectors L and a weight vector θ in d − 1 dimensions with each entry a standard Gaussian variable, then normalise, add one more dimension with constant 1, and divide by √2 (see the feature-construction sketch after the table). The evolution of the regret as a function of time is shown in Fig. 2(a)(b). |
| Researcher Affiliation | Collaboration | 1The Chinese University of Hong Kong 2DeepMind. Correspondence to: Shuai Li <shuaili@cse.cuhk.edu.hk>, Tor Lattimore <lattimore@google.com>, Csaba Szepesvári <szepi@google.com>. |
| Pseudocode | Yes | Algorithm 1 RecurRank. 1: Input: phase number ℓ, items A = (a1, a2, . . .), and positions K = (k, k + 1, . . . , k + m − 1). 2: Find a G-optimal design π = GOPT(A). 3: Let ∆ℓ = 2^(−ℓ) and T(a) = ⌈d π(a) / (2∆ℓ²) · log |A|⌉; this instance will run Σ_{a∈A} T(a) times. 4: Select each item a ∈ A exactly T(a) times at position k, put available items in {a1, . . . , am} sequentially in positions {k + 1, . . . , k + m − 1}, and receive feedback (synchronised by a global clock). 5: Let D = {(β1, ζ1), . . .} be the multiset of item/click pairs from position k and compute θ̂ = V⁻¹S as in (7), with V = Σ_{(β,ζ)∈D} ββ^⊤ and S = Σ_{(β,ζ)∈D} βζ. 6: Let a(1), a(2), . . . , a(|A|) be an ordering of A such that εi = ⟨θ̂, a(i) − a(i+1)⟩ ≥ 0 for all 1 ≤ i < |A|, and set ε_|A| = 2∆ℓ. 7: Let (u1, . . . , up) = (i ∈ [|A|] : εi ≥ 2∆ℓ), u0 = 0, Ai = (a(u_{i−1}+1), . . . , a(u_i)), and Ki = (k + u_{i−1}, . . . , k + min(m, u_i) − 1). 8: For each i ∈ [p] such that k + u_{i−1} ≤ k + m − 1, call RecurRank(ℓ + 1, Ai, Ki) on separate threads. (A sketch of steps 5–7 appears after the table.) |
| Open Source Code | Yes | The code is provided in the supplementary material. |
| Open Datasets | Yes | MovieLens dataset: We use the 20M MovieLens dataset (Harper & Konstan, 2016), which contains 20 million ratings for 2.7 × 10^4 movies by 1.38 × 10^5 users. |
| Dataset Splits | No | The paper mentions splitting the user set for feature derivation and true weight vector computation but does not explicitly state train/validation/test splits for model training or evaluation. Phrases like 'validation set' or specific percentages for dataset partitioning are not present. |
| Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU types, or memory amounts) used for running experiments are provided in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers are mentioned in the paper. |
| Experiment Setup | Yes | Synthetic experiments: We construct environments using the cascade click model (CM) and the position-based click model (PBM) with L = 10^4 items in d = 5 dimensions to be displayed in K = 10 positions. We first randomly draw item vectors L and a weight vector θ in d − 1 dimensions with each entry a standard Gaussian variable, then normalise, add one more dimension with constant 1, and divide by √2. The position bias for PBM is set as 1, 1/3, . . . , 1/K, which is often adopted in applications (Wang et al., 2018). MovieLens dataset: We extract L = 10^3 movies with the most ratings and 1.1 × 10^3 users who rate the most, and randomly split the user set into two parts, U1 and U2, with |U1| = 100 and |U2| = 10^3. We then use the rating matrix of users in U1 to derive feature vectors with d = 5 for all movies by singular-value decomposition (SVD). The resulting feature vectors L are also processed as in (12). The true weight vector θ is computed by solving the linear system of L w.r.t. the rating matrix of U2. The environment is the document-based click model (DBM) with L and θ, and we set K = 10. (See the sketches after the table.) |
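
To make the synthetic feature construction quoted above concrete, here is a minimal NumPy sketch under the most direct reading of the description (the function name `make_synthetic_features` and its default arguments are ours, not from the paper or its supplementary code): draw (d − 1)-dimensional standard Gaussian vectors, normalise them, append a constant-1 coordinate, and divide by √2.

```python
import numpy as np

def make_synthetic_features(L=10_000, d=5, seed=0):
    """Sketch of the synthetic feature construction described in the paper:
    draw (d-1)-dimensional standard Gaussian vectors, normalise, append a
    constant 1, and divide by sqrt(2). Name and defaults are ours."""
    rng = np.random.default_rng(seed)

    def embed(x):
        x = x / np.linalg.norm(x, axis=-1, keepdims=True)          # unit norm in d-1 dims
        x = np.concatenate([x, np.ones((*x.shape[:-1], 1))], -1)   # append constant 1
        return x / np.sqrt(2)                                      # overall unit norm

    items = embed(rng.standard_normal((L, d - 1)))  # item feature vectors, shape (L, d)
    theta = embed(rng.standard_normal(d - 1))       # unknown weight vector, shape (d,)

    attractions = items @ theta                     # each value lies in [0, 1]
    return items, theta, attractions
```

The division by √2 is what keeps ⟨θ, a⟩ = (⟨xθ, xa⟩ + 1)/2 inside [0, 1], so it can serve directly as a click probability in the CM and PBM environments.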
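
Steps 5–7 of the quoted Algorithm 1 are where RecurRank estimates and partitions. The sketch below is our own simplified rendering of that step under the quoted notation, not the authors' released code: it forms the least-squares estimate θ̂ = V⁻¹S from the item/click pairs observed at the block's first position, orders the remaining items by estimated attraction, and cuts the ordering wherever the gap between consecutive estimates reaches 2∆ℓ; each resulting block would be handed to a fresh RecurRank instance at phase ℓ + 1.

```python
import numpy as np

def estimate_and_split(items, clicks, all_items, delta_ell):
    """Simplified sketch of steps 5-7 of Algorithm 1 (RecurRank) as quoted above.
    items     : (n, d) feature vectors shown at the first position of this block
    clicks    : (n,)   binary click feedback for those rounds
    all_items : (M, d) feature vectors still handled by this instance
    delta_ell : the phase accuracy 2 ** (-ell)"""
    V = items.T @ items                  # V = sum of beta beta^T (assumed invertible here)
    S = items.T @ clicks                 # S = sum of beta * zeta
    theta_hat = np.linalg.solve(V, S)    # theta_hat = V^{-1} S

    scores = all_items @ theta_hat
    order = np.argsort(-scores)          # a(1), ..., a(|A|): decreasing estimated attraction
    gaps = -np.diff(scores[order])       # eps_i = <theta_hat, a(i) - a(i+1)> >= 0

    blocks, current = [], [order[0]]
    for i, idx in enumerate(order[1:]):
        if gaps[i] >= 2 * delta_ell:     # split where the estimated gap is large enough
            blocks.append(current)
            current = []
        current.append(idx)
    blocks.append(current)
    return theta_hat, blocks             # each block spawns a RecurRank(ell + 1, ...) call
```

Only the first-position feedback enters the estimate, mirroring step 5 of the pseudocode; how positions are reassigned to each block follows step 7 and is omitted here.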
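
For the MovieLens setup, the quoted description leaves a few details open (whether the appended constant coordinate is counted inside d = 5, and exactly which target the linear system for θ is solved against), so the following is only a hedged sketch with names of our own choosing: truncated SVD of the U1 rating matrix for movie features, the same normalise/append-1/divide-by-√2 processing, and a least-squares fit of θ against the U2 ratings, averaged per movie here purely for concreteness.

```python
import numpy as np

def derive_movielens_features(R1, R2, d=5):
    """Hedged sketch of the MovieLens preprocessing quoted above (not the authors' code).
    R1 : (|U1|, L) rating matrix of the users used for feature derivation
    R2 : (|U2|, L) rating matrix of the users used to fit the weight vector"""
    # Truncated SVD of the U1 rating matrix -> d-dimensional movie features.
    _, _, Vt = np.linalg.svd(R1, full_matrices=False)
    feats = Vt[:d].T                                   # shape (L, d)

    # Same processing as the synthetic features: normalise, append 1, divide by sqrt(2).
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    feats = np.hstack([feats, np.ones((feats.shape[0], 1))]) / np.sqrt(2)

    # Fit theta by least squares; the per-movie target is an assumption (mean U2 rating).
    targets = R2.mean(axis=0)                          # one scalar target per movie
    theta, *_ = np.linalg.lstsq(feats, targets, rcond=None)
    return feats, theta
```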