Towards Off-Policy Learning for Ranking Policies with Logged Feedback

Authors: Teng Xiao, Suhang Wang

AAAI 2022, pp. 8700-8707 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive offline and online experiments demonstrate the effectiveness of our methods. (Abstract) Experiments on two datasets with three state-of-the-art backbones and the simulated online experiments on the RecSim environment show the effectiveness of our framework. (Section 1, Contributions)
Researcher Affiliation | Academia | Teng Xiao, Suhang Wang, The Pennsylvania State University, {tengxiao, szw494}@psu.edu
Pseudocode | No | The paper describes the proposed algorithms and frameworks using text and mathematical equations but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not state that the source code for the described methodology is released, nor does it provide a link to a code repository.
Open Datasets | Yes | Datasets. We use two large-scale datasets. (1) YooChoose [1]: It contains sequences of user purchases and clicks. We remove the sessions whose length is smaller than 3. (2) RetailRocket [2]: This dataset also contains sequences of user purchases and clicks. We remove items which are interacted less than 3 times. [1] https://recsys.acm.org/recsys15/challenge/. [2] https://www.kaggle.com/retailrocket/ecommerce-dataset. A sketch of this preprocessing is given after the table.
Dataset Splits | Yes | We randomly sample 80% sequences as the training set, 10% as validation, and 10% as the test set. A sketch of this split is given after the table.
Hardware Specification | No | The paper does not provide specific details regarding the hardware used for running the experiments (e.g., GPU models, CPU types, or memory specifications).
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., programming languages, libraries, or frameworks).
Experiment Setup | No | The paper does not provide concrete details of the experimental setup, such as hyperparameter values (e.g., learning rate, batch size, number of epochs) or optimizer settings, for its main experiments. While it discusses the hyperparameters alpha and beta, their specific values for the reported results are not stated.
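The paper states the dataset-filtering rules (drop YooChoose sessions shorter than 3, drop RetailRocket items with fewer than 3 interactions) but not the implementation. Below is a minimal pandas sketch of those rules; the file name and column names (session_id, item_id, etc.) are hypothetical, since the paper does not describe the raw file schema, and both filters are applied to one frame purely for illustration.

```python
import pandas as pd

# Hypothetical schema; the paper does not describe the raw file layout.
events = pd.read_csv(
    "yoochoose-clicks.dat",
    names=["session_id", "timestamp", "item_id", "category"],
)

# YooChoose rule: remove sessions whose length is smaller than 3.
session_len = events.groupby("session_id")["item_id"].transform("size")
events = events[session_len >= 3]

# RetailRocket rule: remove items interacted with fewer than 3 times.
item_count = events.groupby("item_id")["session_id"].transform("size")
events = events[item_count >= 3]
```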
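The 80%/10%/10% random split of sequences is likewise only described in prose. A minimal sketch of one way to reproduce it is shown below; the function name and fixed seed are assumptions, as the paper does not report how the random sampling was seeded.

```python
import numpy as np

def split_sequences(sequences, seed=0):
    """Shuffle sequences and split them 80%/10%/10% into train/val/test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(sequences))
    n_train = int(0.8 * len(sequences))
    n_val = int(0.1 * len(sequences))
    train = [sequences[i] for i in idx[:n_train]]
    val = [sequences[i] for i in idx[n_train:n_train + n_val]]
    test = [sequences[i] for i in idx[n_train + n_val:]]
    return train, val, test

# Example usage with toy item-ID sequences:
train, val, test = split_sequences([[1, 2, 3], [4, 5, 6], [7, 8, 9], [1, 5, 9]] * 25)
```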