Joint Online Learning and Decision-making via Dual Mirror Descent
Authors: Alfonso Lobos, Paul Grigas, Zheng Wen
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that our algorithm is effective and robust as compared to a heuristic approach in a bidding and allocation problem with no generic parameter learning in online advertising. Additionally, we study the effects of different generic parameter learning strategies in a linear contextual bandits problem with bounds on the number of actions taken. This section describes the two experiments performed. |
| Researcher Affiliation | Collaboration | 1University of California, Berkeley 2DeepMind, Mountain View, California. |
| Pseudocode | Yes | Algorithm 1 (Generic Online Learning and Decision-making Algorithmic Scheme), Algorithm 2 (Online Learning and Decision-making via Dual Mirror Descent), and Algorithm 3 (Solving ϕ(·; ·, ·)) are all presented in the paper. |
| Open Source Code | Yes | The code for the experiments is available at https://tinyurl.com/br3dzeak. |
| Open Datasets | Yes | This experiment uses data from the online advertising company Criteo (Diemert et al., 2017). |
| Dataset Splits | Yes | The logs from the first three weeks were used as training data, the next two days as validation, and the last week as test. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, memory, or cloud instance types used for running experiments. |
| Software Dependencies | No | The paper mentions training a neural network for conversion prediction from Pan et al. (2018) and using 'traditional subgradient descent' but does not specify version numbers for any software libraries, frameworks, or programming languages used. |
| Experiment Setup | Yes | We used 100 simulations for statistical significance and use traditional subgradient descent in Step 7 of Algorithm 2. In all simulations T = 21073 (more experimental details in the supplement). We tried eight combinations of (d, n), run Algorithm 2 using T = 1000, 5000, 10000, use ρ = 4, and run 100 simulations of each experiment setting. Each simulation uses a unique seed to create θ and the mean matrix W by sampling i.i.d. Uniform(−0.5, 0.5) random variables. A hedged sketch of this setup appears below the table. |
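
The repository linked above contains the authors' implementation. For orientation only, the following is a minimal sketch of the contextual bandits simulation setup described in the Experiment Setup row: per-simulation seeding, sampling θ and the mean matrix W with i.i.d. Uniform(−0.5, 0.5) entries, and a generic projected subgradient step on the dual variable standing in for Step 7 of Algorithm 2. The dimensions d and n, the reward model, the noise, the step size, and the budget interpretation of ρ are assumptions, not the paper's exact specification.

```python
import numpy as np

# Hypothetical sketch of the contextual bandits simulation described in the
# "Experiment Setup" row above. The reward model, noise, step size, and the
# budget interpretation of rho are illustrative assumptions only.

def run_simulation(seed, d, n, T, rho=4.0, step_size=0.01):
    rng = np.random.default_rng(seed)            # unique seed per simulation
    theta = rng.uniform(-0.5, 0.5, size=d)       # i.i.d. Uniform(-0.5, 0.5) entries
    W = rng.uniform(-0.5, 0.5, size=(n, d))      # mean matrix W, same sampling

    lam = 0.0                                    # dual price on the action budget
    total_reward = 0.0
    for t in range(T):
        contexts = W + 0.1 * rng.standard_normal((n, d))  # noisy contexts (assumed)
        scores = contexts @ theta                # estimated rewards for the n actions
        chosen = scores - lam > 0.0              # primal decision against the dual price
        total_reward += scores[chosen].sum()
        # Generic projected subgradient step on the dual variable, standing in
        # for Step 7 of Algorithm 2 ("traditional subgradient descent").
        subgrad = rho - chosen.sum()             # per-period budget of rho actions minus usage
        lam = max(0.0, lam - step_size * subgrad)
    return total_reward

if __name__ == "__main__":
    for T in (1000, 5000, 10000):
        rewards = [run_simulation(seed, d=5, n=10, T=T) for seed in range(100)]
        print(f"T={T}: mean reward over 100 simulations = {np.mean(rewards):.2f}")
```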