Online Learning and Pricing with Reusable Resources: Linear Bandits with Sub-Exponential Rewards
Authors: Huiwen Jia, Cong Shi, Siqian Shen
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We implement four pricing algorithms: BLinUCB and three benchmark policies with epsilon = 0.3, 0.2, and 0.1, i.e., the probability for conducting exploration. We compare the results of the above four pricing policies with state-independent optimal price (OPT). We present two figures for the results of each instance (see Figure 2): the first row shows the offered price over periods of each algorithm and the second row depicts the cumulative time-average relaxed regret, i.e., $\left(\sum_{t'=1}^{t} J^{\mathrm{LP}}_{t'} - \sum_{t'=1}^{t} J^{\pi}_{t'}\right)/t$. |
| Researcher Affiliation | Academia | Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, MI 48109. |
| Pseudocode | Yes | Algorithm 1: Online Batch LinUCB Algorithm (BLinUCB). Algorithm 2: epsilon-greedy Benchmark. |
| Open Source Code | No | The paper does not provide any explicit statements about making the source code available or links to a code repository. |
| Open Datasets | No | The paper describes generating data for its experiments ('The total operation time horizon is 8000 periods and the capacity of the reusable resource is c = 100. We choose the price from a fixed range of [10, 18]... We consider three scenarios...'), but it does not specify the use of any publicly available datasets nor does it provide access information (e.g., links, DOIs, citations) for the generated data. |
| Dataset Splits | No | The paper describes its numerical experiments over a 'total operation time horizon' but does not specify explicit training, validation, or test dataset splits. |
| Hardware Specification | No | The paper describes the parameters of its numerical experiments (e.g., 'total operation time horizon is 8000 periods', 'capacity of the reusable resource is c = 100'), but it does not provide any specific details about the hardware (e.g., CPU, GPU, memory, cloud instances) used to run these simulations. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., specific programming languages, libraries, or solvers and their versions) used for implementation or experimentation. |
| Experiment Setup | Yes | The total operation time horizon is 8000 periods and the capacity of the reusable resource is c = 100. We choose the price from a fixed range of [10, 18]... We consider a three-dimensional feature vector (p, φ(p), 1)... We consider three scenarios of the arrival rates associated with candidate prices and thus the corresponding system dynamics (three instances correspondingly)... For each instance, we implement four pricing algorithms: BLinUCB and three benchmark policies with epsilon = 0.3, 0.2, and 0.1... |
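
Since the paper releases no code, the following is only a minimal sketch of the kind of experiment loop the table describes, not the authors' Algorithm 2. It shows a generic epsilon-greedy rule over a price grid on [10, 18] and the cumulative time-average regret from the Research Type row. The grid spacing, the `toy_reward` demand model, and all function names are assumptions made for illustration; the paper's reusable-resource dynamics and capacity constraint are not modeled.

```python
import random


def time_average_regret(benchmark_rewards, policy_rewards):
    """For each t, return (sum of benchmark rewards - sum of policy rewards up to t) / t."""
    if len(benchmark_rewards) != len(policy_rewards):
        raise ValueError("reward sequences must have equal length")
    gap = 0.0
    curve = []
    for t, (b, r) in enumerate(zip(benchmark_rewards, policy_rewards), start=1):
        gap += b - r
        curve.append(gap / t)
    return curve


def epsilon_greedy_prices(prices, reward_fn, horizon, epsilon, seed=0):
    """Generic epsilon-greedy over a finite price grid (not the paper's Algorithm 2).

    With probability epsilon a price is drawn uniformly at random (exploration);
    otherwise the price with the best empirical mean reward is offered.
    """
    rng = random.Random(seed)
    totals = [0.0] * len(prices)
    counts = [0] * len(prices)
    chosen, rewards = [], []
    for _ in range(horizon):
        untried = [i for i, c in enumerate(counts) if c == 0]
        if untried:
            i = untried[0]  # try every price once before exploiting
        elif rng.random() < epsilon:
            i = rng.randrange(len(prices))
        else:
            i = max(range(len(prices)), key=lambda k: totals[k] / counts[k])
        r = reward_fn(prices[i], rng)
        totals[i] += r
        counts[i] += 1
        chosen.append(prices[i])
        rewards.append(r)
    return chosen, rewards


def toy_reward(price, rng):
    """Hypothetical noisy revenue: linear demand 20 - price, plus Gaussian noise."""
    return price * (20.0 - price) + rng.gauss(0.0, 1.0)


# Settings taken from the table: horizon 8000, prices in [10, 18], epsilon in {0.3, 0.2, 0.1}.
PRICES = [10.0 + k for k in range(9)]  # assumed unit-spaced grid
HORIZON = 8000
results = {
    eps: epsilon_greedy_prices(PRICES, toy_reward, HORIZON, eps, seed=1)
    for eps in (0.3, 0.2, 0.1)
}
```

Passing a per-period benchmark sequence (for example, the LP upper bound the paper uses) together with a policy's `rewards` to `time_average_regret` yields the curve plotted in the second row of Figure 2, under the toy model above rather than the paper's system dynamics.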