Learning from a Learning User for Optimal Recommendations
Authors: Fan Yao, Chuanhao Li, Denis Nekipelov, Hongning Wang, Haifeng Xu
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on synthetic datasets demonstrate the strength of RAES for such a contemporaneous system-user learning problem. In this section, we study the empirical performance of RAES to validate our theoretical analysis by running simulations on synthetic datasets in comparison with several baselines. |
| Researcher Affiliation | Academia | ¹Department of Computer Science, University of Virginia, USA; ²Department of Economics, University of Virginia, USA. Correspondence to: Fan Yao <fy4bc@virginia.edu>, Hongning Wang <hw5x@virginia.edu>, Haifeng Xu <hx4ad@virginia.edu>. |
| Pseudocode | Yes | Algorithm 1 Active Ellipsoid Search on Unit Sphere and Algorithm 2 Noise-robust Active Ellipsoid Search (RAES) |
| Open Source Code | No | The paper does not provide any explicit statement or link for open-source code for the described methodology. |
| Open Datasets | No | The paper mentions using 'synthetic datasets' for experiments but does not provide specific access information (link, DOI, repository, or formal citation) for these datasets. |
| Dataset Splits | No | The paper refers to synthetic datasets and uses a time horizon 'T' for experiments, but it does not provide specific details on training, validation, or test set splits, nor does it specify how the data was partitioned for evaluation. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware used to run its experiments, such as GPU or CPU models, or memory specifications. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, libraries, or solvers used in the experiments. |
| Experiment Setup | Yes | In all experiments, we fix the action set A = B^d_2(0, 1), i.e., D0 = D1 = 1, and δ = 0.1, k = 1.05. We consider a (1, γ)-rational user with γ ∈ {0, 0.2} and prior knowledge matrix V0. The user's decision sequences {β_t^(0)} and {β_t^(1)} are independently drawn from [−t^(−γ), t^(−γ)]. The ground-truth parameter θ is sampled from B^d_2(0, 1), and the reported results are collected from the same problem instance and averaged over 10 independent runs. DBGD's hyper-parameters include the starting point w0 and two learning rates δ, γ that control the step lengths for proposing new points and updating the current point, respectively. In the experiment, these hyper-parameters are set to (w0, δ, γ) = (0, d^(−1/2), …), as recommended in (Yue & Joachims, 2009). |
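The synthetic instance described in the Experiment Setup row can be sketched in a few lines. This is a hypothetical reconstruction, not the authors' code: the uniform-ball sampler and the assumption that the (1, γ)-rational user's noise is drawn uniformly from [−t^(−γ), t^(−γ)] are inferred from the quoted text, and the function and variable names (`sample_unit_ball`, `make_instance`) are illustrative.

```python
import numpy as np

def sample_unit_ball(d, rng):
    """Uniform sample from the d-dimensional unit ball B^d_2(0, 1).

    Draw a Gaussian vector for a uniform direction, then rescale the
    radius by U^(1/d) so volume is distributed uniformly.
    """
    x = rng.standard_normal(d)
    x /= np.linalg.norm(x)
    r = rng.uniform() ** (1.0 / d)
    return r * x

def make_instance(d=5, T=1000, gamma=0.2, seed=0):
    """Build one synthetic problem instance as described in the paper's setup.

    Returns the ground-truth parameter theta and the two independent noise
    sequences {beta_t^(0)}, {beta_t^(1)} for t = 1..T. The noise support
    [-t^(-gamma), t^(-gamma)] is an assumption reconstructed from the
    '(1, gamma)-rational user' description; gamma = 0 gives non-shrinking
    noise in [-1, 1].
    """
    rng = np.random.default_rng(seed)
    theta = sample_unit_ball(d, rng)  # ground-truth preference parameter
    t = np.arange(1, T + 1, dtype=float)
    bound = t ** (-gamma)             # per-round noise magnitude
    beta0 = rng.uniform(-bound, bound)
    beta1 = rng.uniform(-bound, bound)
    return theta, beta0, beta1
```

Averaging over the 10 independent runs mentioned in the row would correspond to calling `make_instance` once for the shared problem instance and re-running the algorithms with different noise seeds.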