Online Learning and Pricing with Reusable Resources: Linear Bandits with Sub-Exponential Rewards

Authors: Huiwen Jia, Cong Shi, Siqian Shen

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We implement four pricing algorithms: BLinUCB and three ε-greedy benchmark policies with ε = 0.3, 0.2, and 0.1, where ε is the probability of conducting exploration. We compare the results of these four pricing policies with the state-independent optimal price (OPT). We present two figures for the results of each instance (see Figure 2): the first row shows the offered price over periods for each algorithm, and the second row depicts the cumulative time-average relaxed regret, i.e., $\big(\sum_{t'=1}^{t} J^{LP}_{t'} - \sum_{t'=1}^{t} J^{\pi}_{t'}\big)/t$.
Researcher Affiliation | Academia | Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, MI 48109.
Pseudocode | Yes | Algorithm 1: Online Batch LinUCB Algorithm (BLinUCB). Algorithm 2: ε-greedy Benchmark.
Open Source Code | No | The paper does not provide any explicit statement about making the source code available, nor any link to a code repository.
Open Datasets | No | The paper describes generating data for its experiments ('The total operation time horizon is 8000 periods and the capacity of the reusable resource is c = 100. We choose the price from a fixed range of [10, 18]... We consider three scenarios...'), but it does not specify the use of any publicly available dataset, nor does it provide access information (e.g., links, DOIs, or citations) for the generated data.
Dataset Splits | No | The paper describes its numerical experiments over a 'total operation time horizon' but does not specify explicit training, validation, or test dataset splits.
Hardware Specification | No | The paper describes the parameters of its numerical experiments (e.g., 'total operation time horizon is 8000 periods', 'capacity of the reusable resource is c = 100'), but it does not provide any specific details about the hardware (e.g., CPU, GPU, memory, or cloud instances) used to run these simulations.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, or solvers and their versions) used for implementation or experimentation.
Experiment Setup | Yes | The total operation time horizon is 8000 periods and the capacity of the reusable resource is c = 100. We choose the price from a fixed range of [10, 18]... We consider a three-dimensional feature vector (p, φ(p), 1)... We consider three scenarios of the arrival rates associated with candidate prices and thus the corresponding system dynamics (three instances correspondingly)... For each instance, we implement four pricing algorithms: BLinUCB and three benchmark policies with ε = 0.3, 0.2, and 0.1...
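The cumulative time-average relaxed regret described in the Research Type row can be computed from two per-period sequences. The sketch below assumes that the LP-relaxation values $J^{LP}_t$ and the policy's realized values $J^{\pi}_t$ are already available as lists. The function name `time_average_relaxed_regret` is hypothetical and not taken from the paper.

```python
from itertools import accumulate


def time_average_relaxed_regret(j_lp, j_pi):
    """Return (sum_{t'<=t} J^LP_{t'} - sum_{t'<=t} J^pi_{t'}) / t for t = 1..T.

    j_lp: per-period LP-relaxation values (assumed precomputed).
    j_pi: per-period values earned by the pricing policy pi.
    """
    if len(j_lp) != len(j_pi):
        raise ValueError("j_lp and j_pi must have the same length")
    # The difference of two cumulative sums equals the cumulative sum of per-period gaps.
    gaps = (lp - pi for lp, pi in zip(j_lp, j_pi))
    return [total / t for t, total in enumerate(accumulate(gaps), start=1)]


# Usage with toy numbers (not data from the paper):
print(time_average_relaxed_regret([10, 10, 10], [4, 8, 9]))  # [6.0, 4.0, 3.0]
```

Summing the per-period gaps avoids keeping two separate running totals and produces the whole regret curve in a single pass.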
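The benchmark policies use a fixed exploration probability ε (0.3, 0.2, or 0.1). The following is a minimal sketch of a generic ε-greedy price-selection step, not a reproduction of the paper's Algorithm 2. It assumes a finite price grid and a hypothetical callable `estimated_revenue` that returns the current revenue estimate for a price. How that estimate is learned is not covered in this summary.

```python
import random


def epsilon_greedy_price(prices, estimated_revenue, epsilon, rng):
    """Choose a price with a generic epsilon-greedy rule.

    prices: candidate prices (a finite grid; the discretization is an assumption).
    estimated_revenue: hypothetical callable mapping a price to its revenue estimate.
    epsilon: exploration probability, e.g. 0.3, 0.2, or 0.1.
    rng: a random.Random instance, so runs are reproducible.
    """
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    if rng.random() < epsilon:
        return rng.choice(prices)  # explore: uniformly random candidate price
    return max(prices, key=estimated_revenue)  # exploit: best price under the current estimate


# Usage with a made-up concave revenue estimate that peaks at p = 14:
rng = random.Random(0)
grid = [10, 12, 14, 16, 18]
picks = [epsilon_greedy_price(grid, lambda p: -(p - 14) ** 2, 0.1, rng) for _ in range(1000)]
print(picks.count(14) / len(picks))  # expected fraction: 0.9 + 0.1 / 5 = 0.92
```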
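The Experiment Setup row describes a linear model over the feature vector (p, φ(p), 1), with prices drawn from [10, 18]. The sketch below applies a textbook LinUCB selection rule to that feature map, under several stated assumptions. φ is taken to be log p purely for illustration. The price range is discretized into a 9-point grid. The linear model fits the observed reward directly. The class name `LinUCBPricer` is hypothetical. This is not BLinUCB, which according to the paper's title is a batched variant designed for reusable resources and sub-exponential rewards.

```python
import numpy as np


def phi(p):
    # Hypothetical nonlinear transform: the summary does not say what phi is.
    return np.log(p)


def features(p):
    """Three-dimensional feature vector (p, phi(p), 1)."""
    return np.array([p, phi(p), 1.0])


class LinUCBPricer:
    """Generic (non-batched) LinUCB over a discrete price grid.

    A simplified stand-in, not the paper's BLinUCB: it ignores batching,
    reusable-resource dynamics, and the sub-exponential reward analysis.
    """

    def __init__(self, prices, alpha=1.0, ridge=1.0):
        self.prices = [float(p) for p in prices]
        self.alpha = alpha          # width of the confidence bonus
        self.A = ridge * np.eye(3)  # regularized Gram matrix: ridge*I + sum of x x^T
        self.b = np.zeros(3)        # sum of reward-weighted feature vectors

    def select(self):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b      # ridge-regression estimate of the linear model

        def ucb(p):
            x = features(p)
            return x @ theta + self.alpha * np.sqrt(x @ A_inv @ x)

        return max(self.prices, key=ucb)  # optimistic (upper-confidence) price choice

    def update(self, price, reward):
        x = features(price)
        self.A += np.outer(x, x)
        self.b += reward * x


# Usage: a 9-point grid over [10, 18]; the grid itself is an assumption.
pricer = LinUCBPricer(np.linspace(10, 18, 9))
p = pricer.select()
pricer.update(p, reward=42.0)  # placeholder reward; a simulator would supply this
```

The bonus term `alpha * sqrt(x^T A^{-1} x)` shrinks as a price is sampled more often. This is how UCB-style methods decide when to explore, in contrast to the fixed exploration probability ε used by the benchmarks.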