Online Learning to Rank with Features

Authors: Shuai Li, Tor Lattimore, Csaba Szepesvári

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We run experiments to compare RecurRank with CascadeLinUCB (Li et al., 2016; Zong et al., 2016) and TopRank (Lattimore et al., 2018). Synthetic experiments: We construct environments using the cascade click model (CM) and the position-based click model (PBM) with L = 10^4 items in d = 5 dimensions to be displayed in K = 10 positions. We first randomly draw the item vectors L and the weight vector θ in d − 1 dimensions with each entry a standard Gaussian variable, then normalise, add one more dimension with constant 1, and divide by √2. The evolution of the regret as a function of time is shown in Fig. 2(a)(b). (A NumPy sketch of this feature construction follows the table.)
Researcher Affiliation | Collaboration | ¹The Chinese University of Hong Kong, ²DeepMind. Correspondence to: Shuai Li <shuaili@cse.cuhk.edu.hk>, Tor Lattimore <lattimore@google.com>, Csaba Szepesvári <szepi@google.com>.
Pseudocode | Yes | Algorithm 1 RecurRank
1: Input: phase number ℓ, items A = (a1, a2, . . .), and positions K = (k, k + 1, . . . , k + m − 1)
2: Find a G-optimal design π = GOPT(A)
3: Let Δℓ = 2^(−ℓ) and T(a) = ⌈d π(a) Δℓ^(−2) log |A|⌉; this instance will run Σ_{a ∈ A} T(a) times
4: Select each item a ∈ A exactly T(a) times at position k, put available items in {a1, . . . , am} sequentially in positions {k + 1, . . . , k + m − 1}, and receive feedback (synchronised by a global clock)
5: Let D = {(β1, ζ1), . . .} be the multiset of item/click pairs collected at position k and compute θ̂ = V^(−1) S with V = Σ_{(β,ζ) ∈ D} ββ^T and S = Σ_{(β,ζ) ∈ D} βζ
6: Let a(1), a(2), . . . , a(|A|) be an ordering of A such that εi = ⟨θ̂, a(i) − a(i+1)⟩ ≥ 0 for all 1 ≤ i < |A|, and set ε_{|A|} = Δℓ
7: Let (u1, . . . , up) = (i ∈ [|A|] : εi ≥ Δℓ), u0 = 0, Ai = (a(u_{i−1}+1), . . . , a(u_i)), and Ki = (k + u_{i−1}, . . . , k + min(m, u_i) − 1)
8: For each i ∈ [p] such that k + u_{i−1} ≤ k + m − 1, call RecurRank(ℓ + 1, Ai, Ki) on separate threads
(A Python sketch of the estimation and splitting steps follows the table.)
Open Source Code | Yes | The code is provided in the supplementary material.
Open Datasets | Yes | MovieLens dataset: We use the 20M MovieLens dataset (Harper & Konstan, 2016), which contains 20 million ratings for 2.7 × 10^4 movies by 1.38 × 10^5 users.
Dataset Splits | No | The paper mentions splitting the user set for feature derivation and true weight vector computation but does not explicitly state train/validation/test splits for model training or evaluation. Phrases like 'validation set' or specific percentages for dataset partitioning are not present.
Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU types, or memory amounts) used for running the experiments are provided in the paper.
Software Dependencies | No | No specific software dependencies with version numbers are mentioned in the paper.
Experiment Setup | Yes | Synthetic experiments: We construct environments using the cascade click model (CM) and the position-based click model (PBM) with L = 10^4 items in d = 5 dimensions to be displayed in K = 10 positions. We first randomly draw the item vectors L and the weight vector θ in d − 1 dimensions with each entry a standard Gaussian variable, then normalise, add one more dimension with constant 1, and divide by √2. The position bias for PBM is set as 1, 1/2, 1/3, . . . , 1/K, which is often adopted in applications (Wang et al., 2018). MovieLens dataset: We extract the L = 10^3 movies with the most ratings and the 1.1 × 10^3 users who rate the most, and randomly split the user set into two parts, U1 and U2, with |U1| = 100 and |U2| = 10^3. We then use the rating matrix of the users in U1 to derive feature vectors with d = 5 for all movies by singular-value decomposition (SVD). The resulting feature vectors L are also processed as in (12). The true weight vector θ is computed by solving the linear system of L w.r.t. the rating matrix of U2. The environment is the document-based click model (DBM) with L and θ, and we set K = 10. (A sketch of the MovieLens preprocessing follows the table.)
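
Below is a minimal NumPy sketch of the synthetic feature construction quoted in the Research Type and Experiment Setup rows. It is our own illustration, not the authors' released code: the function name make_unit_vectors, the seed, and the use of numpy.random.default_rng are our assumptions.

import numpy as np

def make_unit_vectors(n, d, rng):
    # Draw (d-1)-dimensional standard Gaussian vectors, normalise each to unit norm,
    # append a constant-1 coordinate, and divide by sqrt(2) so the result is again unit norm.
    x = rng.standard_normal((n, d - 1))
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    return np.hstack([x, np.ones((n, 1))]) / np.sqrt(2)

rng = np.random.default_rng(0)
L, d, K = 10**4, 5, 10
items = make_unit_vectors(L, d, rng)      # one d-dimensional feature vector per item
theta = make_unit_vectors(1, d, rng)[0]   # the unknown weight vector, built the same way
scores = items @ theta                    # item attractiveness; lies in [0, 1] because
                                          # <x, theta> = (<x', theta'> + 1) / 2 for unit x', theta'

This recipe keeps every score <x, theta> in [0, 1], which is what lets the scores serve as attraction probabilities in the CM and PBM environments.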
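
To make the extracted pseudocode in the Pseudocode row easier to follow, here is a short Python sketch of the least-squares step (line 5) and the ordering and splitting steps (lines 6-7) of Algorithm 1. It is a reading aid under our interpretation of the pseudocode, not the authors' implementation: the G-optimal design, the click collection, and the threaded recursive calls (lines 2, 4 and 8) are omitted, and all function names are ours.

import numpy as np

def estimate_theta(data):
    # Line 5: theta_hat = V^(-1) S with V = sum of beta beta^T and S = sum of zeta * beta,
    # where data is a list of (feature vector beta, click indicator zeta) pairs from position k.
    d = len(data[0][0])
    V = np.zeros((d, d))
    S = np.zeros(d)
    for beta, zeta in data:
        beta = np.asarray(beta, dtype=float)
        V += np.outer(beta, beta)
        S += zeta * beta
    return np.linalg.pinv(V) @ S            # pseudo-inverse in case V is singular

def order_and_split(A, theta_hat, ell):
    # Lines 6-7: order the items by estimated score, compute the gaps eps_i between
    # consecutive items, and cut the ordering at every index where eps_i >= 2^(-ell).
    delta = 2.0 ** (-ell)
    scores = A @ theta_hat
    order = np.argsort(-scores)                                        # a(1), ..., a(|A|), best first
    sorted_scores = scores[order]
    eps = -np.diff(sorted_scores, append=sorted_scores[-1] - delta)    # forces eps_|A| = 2^(-ell)
    cuts = [i + 1 for i, e in enumerate(eps) if e >= delta]            # u_1, ..., u_p (u_p = |A|)
    blocks, start = [], 0
    for u in cuts:
        blocks.append(order[start:u])                                  # item block A_i
        start = u
    return blocks

Each returned block, together with its slice of positions, would then be handed to a fresh RecurRank instance at phase ℓ + 1, as in line 8 of the algorithm.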
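
Finally, a rough sketch of the MovieLens preprocessing described in the Experiment Setup row. This is only our interpretation of the quoted text: R1 and R2 stand for the (users x movies) rating matrices of U1 and U2, the SVD keeps d − 1 directions that are then normalised, padded with a constant 1 and divided by sqrt(2) (our reading of 'processed as in (12)'), and the 'linear system' for θ is solved here in the least-squares sense against the average U2 rating of each movie.

import numpy as np

def movie_features(R1, d=5):
    # Derive movie features from the U1 rating matrix via a truncated SVD.
    # R1 has shape (|U1|, L); the top d-1 right singular directions give the raw features.
    U, s, Vt = np.linalg.svd(R1, full_matrices=False)
    X = (Vt[: d - 1] * s[: d - 1, None]).T                           # shape (L, d-1)
    X /= np.linalg.norm(X, axis=1, keepdims=True)                    # normalise each movie vector
    return np.hstack([X, np.ones((X.shape[0], 1))]) / np.sqrt(2)     # shape (L, d)

def weight_vector(X, R2):
    # Fit theta so that X @ theta approximates the mean U2 rating of each movie.
    y = R2.mean(axis=0)
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

The resulting features and weight vector would then define the document-based click model environment with K = 10 positions.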