Uncertainty-Aware Instance Reweighting for Off-Policy Learning

Authors: Xiaoying Zhang, Junpu Chen, Hongning Wang, Hong Xie, Yang Liu, John C.S. Lui, Hang Li

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiment results on the synthetic and real-world recommendation datasets demonstrate that UIPS significantly improves the quality of the discovered policy when compared against an extensive list of state-of-the-art baselines.
Researcher Affiliation | Collaboration | 1 ByteDance Research, 2 Chongqing University, 3 Tsinghua University, 4 Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, 5 The Chinese University of Hong Kong
Pseudocode | Yes | Algorithm 1 UIPS (found in Appendix 7.1)
Open Source Code | Yes | All data and code can be found in https://github.com/Xiaoyinggit/UIPS.git.
Open Datasets | Yes | We evaluate UIPS on both synthetic data and three real-world datasets with unbiased collection... (1) Yahoo! R3; (2) Coat; (3) KuaiRec [12]... The Wiki10-31K dataset contains approximately 20K samples.
Dataset Splits | Yes | We split the dataset into train, validation and test sets with size 11K:3K:6K (synthetic data) ... a small part of unbiased data split for validation purpose (5% on Yahoo R3 and Coat, and 15% on KuaiRec). (See the split sketch below the table.)
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU or CPU models.
Software Dependencies | No | The paper mentions using neural networks and logistic regression but does not provide specific software dependencies or their version numbers.
Experiment Setup | Yes | The learning rate was searched in {1e-5, 1e-4, 1e-3, 1e-2}; λ, γ, η1 were searched in {0.5, 0.1, 1, 2, 5, 10, 15, 20, 25, 30, 40, 50}; and η2 was searched in {1, 10, 100, 1000}. (See the grid-search sketch below the table.)
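
The 11K:3K:6K synthetic-data split quoted in the Dataset Splits row is easy to reproduce with standard tooling. The sketch below is only an illustration under assumed conventions: the random data, the use of scikit-learn, and the random seed are not taken from the UIPS repository.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Illustrative stand-in for the synthetic bandit data (20K samples, 10 features);
# the real generation procedure is described in the paper, not reproduced here.
rng = np.random.default_rng(0)
X = rng.normal(size=(20_000, 10))
y = rng.integers(0, 2, size=20_000)

# 11K train / 3K validation / 6K test, matching the quoted ratio.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, train_size=11_000, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, train_size=3_000, test_size=6_000, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 11000 3000 6000
```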
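
Likewise, the search grids listed under Experiment Setup map naturally onto a grid search. The sketch below only encodes the grid values quoted above; whether the authors searched the full Cartesian product or tuned each parameter independently is not stated, and the parameter key names are assumptions.

```python
import itertools

# Grids quoted in the "Experiment Setup" row; λ, γ, and η1 share one grid.
search_space = {
    "learning_rate": [1e-5, 1e-4, 1e-3, 1e-2],
    "lambda":        [0.5, 0.1, 1, 2, 5, 10, 15, 20, 25, 30, 40, 50],
    "gamma":         [0.5, 0.1, 1, 2, 5, 10, 15, 20, 25, 30, 40, 50],
    "eta1":          [0.5, 0.1, 1, 2, 5, 10, 15, 20, 25, 30, 40, 50],
    "eta2":          [1, 10, 100, 1000],
}

# Enumerate every configuration in the grid (4 * 12^3 * 4 = 27,648 combinations).
configs = [dict(zip(search_space, values))
           for values in itertools.product(*search_space.values())]
print(len(configs))  # 27648
```

In practice one would evaluate each configuration on the validation split described above and keep the best-performing one.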