Optimizing the coalition gain in Online Auctions with Greedy Structured Bandits

Authors: Dorian Baudry, Hugo Richard, Maria Cherifa, Vianney Perchet, Clément Calauzènes

NeurIPS 2024

Reproducibility assessment. Each entry lists the reproducibility variable, the result, and the LLM response:
Research Type (Experimental): In Appendix D we present a benchmark of LG, GG, UCB, EXP3, and OSUB on synthetic data in terms of the expected regret R(T). This benchmark illustrates the strong performance of LG relative to the other approaches.
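For context on how such a benchmark quantity is obtained, here is a minimal sketch of estimating the expected regret by averaging over repeated runs. The names `run_algorithm` and `mu_star` are hypothetical placeholders, not the authors' actual interface (their code ships in the supplementary zip).

```python
import numpy as np

def expected_regret(run_algorithm, mu_star, T, n_seeds=20):
    """Estimate R(t) by averaging per-seed pseudo-regret curves.

    `run_algorithm(T, seed)` is a hypothetical callable returning the
    length-T reward sequence of one run; `mu_star` is the optimal expected
    per-round reward. Neither name comes from the paper.
    """
    curves = []
    for seed in range(n_seeds):
        rewards = np.asarray(run_algorithm(T, seed=seed))
        # Pseudo-regret curve: optimal cumulative reward minus collected reward.
        curves.append(mu_star * np.arange(1, T + 1) - np.cumsum(rewards))
    return np.mean(curves, axis=0)  # mean across seeds, as reported in the paper
```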
Researcher Affiliation (Collaboration): Dorian Baudry 1,2, Hugo Richard 2, Maria Cherifa 2, Clément Calauzènes 2, Vianney Perchet 2. 1: Department of Statistics, University of Oxford; 2: Inria Fairplay joint team, CREST, ENSAE Paris, Criteo AI Lab.
Pseudocode (Yes): Algorithm 1, Local Greedy (LG). Input: exploration parameter α, neighborhoods (V(n))_{n ∈ [N]} (Definition 1).
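To make the algorithm's shape concrete, here is an illustrative Python sketch of a local greedy scheme over arm neighborhoods. The neighborhoods V(n) = {n-1, n, n+1} and the use of α as a logarithmic forced-exploration rate are our assumptions for illustration; this is not the paper's exact Algorithm 1.

```python
import numpy as np

def local_greedy(T, means, alpha=1.0, rng=None):
    """Illustrative local-greedy bandit over arms 0..N-1.

    Assumptions (ours, for illustration): neighborhoods V(n) = {n-1, n, n+1}
    clipped to the arm range, Bernoulli rewards, and alpha > 0 scaling a
    logarithmic forced-exploration schedule (alpha > 0 also guarantees every
    neighbor is tried before the greedy ratio is evaluated).
    """
    rng = rng or np.random.default_rng()
    N = len(means)
    counts = np.zeros(N)
    sums = np.zeros(N)
    n = N // 2                      # arbitrary starting arm
    for t in range(1, T + 1):
        neigh = [m for m in (n - 1, n, n + 1) if 0 <= m < N]
        under = [m for m in neigh if counts[m] < alpha * np.log(t + 1)]
        if under:                   # forced exploration inside the neighborhood
            arm = min(under, key=lambda m: counts[m])
        else:                       # greedy step restricted to the neighborhood
            arm = max(neigh, key=lambda m: sums[m] / counts[m])
        counts[arm] += 1
        sums[arm] += float(rng.random() < means[arm])   # Bernoulli draw
        n = arm                     # the played arm becomes the new center
    return sums.sum()               # total reward collected over T rounds
```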
Open Source Code (Yes): The paper provides open access to the code (as a zip folder in the supplementary materials), with all the instructions needed to reproduce the experiments.
Open Datasets (No): The paper uses "synthetic data" generated from specified distributions (e.g., B(0.05), Beta(a, b), Exp truncated to (0, 1)) for its simulations, rather than a pre-existing publicly available dataset. Therefore, no access information for a public dataset is provided or applicable.
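For reference, each of the three distribution families named above can be sampled directly with NumPy/SciPy. The Beta parameters and the unit rate of the truncated exponential are illustrative guesses, as the entry above does not pin them down.

```python
import numpy as np
from scipy.stats import truncexpon

rng = np.random.default_rng(0)
size = 10_000

bernoulli = (rng.random(size) < 0.05).astype(float)  # B(0.05): values in {0, 1}
beta = rng.beta(2.0, 5.0, size=size)                 # Beta(a, b); a=2, b=5 illustrative
# Exponential truncated to (0, 1); with scale=1.0 (rate 1, our assumption),
# truncexpon's shape parameter b is the truncation point in scale units.
trunc_exp = truncexpon.rvs(b=1.0, scale=1.0, size=size, random_state=rng)
```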
Dataset Splits (No): The paper describes generating synthetic data and running simulations to compute regret. It does not mention any explicit training, validation, or test splits, which is typical for bandit-algorithm evaluations where data is simulated online.
Hardware Specification (Yes): All the experiments were conducted on a single standard laptop, with an execution time shorter than 24 hours.
Software Dependencies (No): All the code for the experiments is written in Python, using Matplotlib [18] for plotting, NumPy [15] for numerical computing, and SciPy [31] for scientific and technical computing. While these specific libraries are mentioned and cited, their version numbers are not provided, which is necessary for reproducibility.
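Since the exact versions are the missing piece, anyone re-running the supplementary code could log them alongside the results, for example:

```python
import sys
import matplotlib
import numpy
import scipy

# Record the environment the experiments actually ran in.
print("Python     ", sys.version.split()[0])
print("NumPy      ", numpy.__version__)
print("SciPy      ", scipy.__version__)
print("Matplotlib ", matplotlib.__version__)
```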
Experiment Setup (Yes): In a first simulation, we consider a coalition of size N = 100 and a competition of size p = 4. At each timestep t, the algorithm decides on a number of bidders n_t to send to the auction, and the values of all bidders (coalition and competition), v ∈ {0, 1}^{n_t + p}, are sampled according to B(0.05). With probability n_t / (n_t + p), the reward v_(1) - v_(2) is received and v_(1) is observed, where v_(1) and v_(2) denote the highest and second-highest values. The pseudo-regret at time t is then computed as the gap between the optimal expected cumulative reward and the sum of rewards obtained up to time t. The simulation is repeated 20 times with different random seeds, and the mean value across seeds is reported as the expected regret R(t) in Figure 3.
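To make the environment concrete, the following sketch implements our reading of one round of the simulation described above. It is not the authors' code, and the tie-breaking is folded into the stated n_t / (n_t + p) win probability.

```python
import numpy as np

def auction_round(n_t, p=4, q=0.05, rng=None):
    """One round of the synthetic auction described above (our reading).

    n_t coalition bidders and p competitors draw i.i.d. B(q) values; with
    probability n_t / (n_t + p) the coalition wins, receives v_(1) - v_(2),
    and observes v_(1).
    """
    rng = rng or np.random.default_rng()
    v = (rng.random(n_t + p) < q).astype(float)   # v in {0,1}^(n_t+p), i.i.d. B(q)
    v1, v2 = np.sort(v)[-2:][::-1]                # two largest order statistics
    if rng.random() < n_t / (n_t + p):            # coalition holds the top value
        return v1 - v2, v1                        # (reward, observation)
    return 0.0, None                              # competition wins this round
```

Note that with Bernoulli values, v_(1) - v_(2) equals 1 exactly when a single bidder out of the n_t + p has value 1; sending more bidders raises the win probability but also the chance that two coalition values collide, which is what makes the choice of n_t non-trivial.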