Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Joint Online Learning and Decision-making via Dual Mirror Descent
Authors: Alfonso Lobos, Paul Grigas, Zheng Wen
ICML 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that our algorithm is effective and robust as compared to a heuristic approach in a bidding and allocation problem with no generic parameter learning in online advertising. Additionally, we study the effects of different generic parameter learning strategies in a linear contextual bandits problem with bounds on the number of actions taken. This section describes the two experiments performed. |
| Researcher Affiliation | Collaboration | ¹University of California, Berkeley; ²Google DeepMind, Mountain View, California. |
| Pseudocode | Yes | Algorithm 1 Generic Online Learning and Decision-making Algorithmic Scheme, Algorithm 2 Online Learning and Decision-making via Dual Mirror Descent, and Algorithm 3 Solving ϕ( ; , ) are all presented in the paper. |
| Open Source Code | Yes | The experiments code is located at https://tinyurl.com/br3dzeak. |
| Open Datasets | Yes | This experiment uses data from the online advertising company Criteo (Diemert et al., 2017). |
| Dataset Splits | Yes | The logs from the first three weeks were used as training data, the next two days as validation, and the last week as test. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, memory, or cloud instance types used for running experiments. |
| Software Dependencies | No | The paper mentions training a neural network for conversion prediction from Pan et al. (2018) and using 'traditional subgradient descent' but does not specify version numbers for any software libraries, frameworks, or programming languages used. |
| Experiment Setup | Yes | "We used 100 simulations for statistical significance and use traditional subgradient descent on Step 7 of Algorithm 2. In all simulations T = 21073 (more experimental details in the supplement)." and "We tried eight combinations of d × n, run Algorithm 2 using T = 1000, 5000, 10000, use ρ = 4, and run 100 simulations of each experiment setting. Each simulation uses a unique seed to create θ and the mean matrix W by sampling i.i.d. Uniform(−0.5, 0.5) random variables." |
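
The seeded sampling described in the Experiment Setup row can be illustrated with a minimal sketch. It is not the authors' code: the function names, the shapes of θ (length d) and W (n rows of length d), the seed scheme, and the example values of d and n are all assumptions made for illustration, and the sampling interval is read as (-0.5, 0.5). Only the horizons T, the value ρ = 4, and the count of 100 simulations come from the quoted text.

```python
import random

# Values quoted in the Experiment Setup row.
HORIZONS = (1000, 5000, 10000)   # T
RHO = 4                          # rho
NUM_SIMULATIONS = 100            # simulations per experiment setting


def simulation_seeds(num_simulations=NUM_SIMULATIONS, base_seed=0):
    """Return one distinct seed per simulation (hypothetical seed scheme)."""
    return list(range(base_seed, base_seed + num_simulations))


def sample_instance(seed, d, n):
    """Draw theta and the mean matrix W i.i.d. from Uniform(-0.5, 0.5).

    The shapes are assumptions: theta has d entries, W has n rows of d entries.
    """
    rng = random.Random(seed)  # a private generator keeps each simulation reproducible
    theta = [rng.uniform(-0.5, 0.5) for _ in range(d)]
    W = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(n)]
    return theta, W


if __name__ == "__main__":
    d, n = 5, 3  # hypothetical sizes; the eight d x n combinations are not listed here
    for seed in simulation_seeds(num_simulations=2):
        theta, W = sample_instance(seed, d, n)
        print(seed, len(theta), len(W), len(W[0]))
```

Giving each simulation its own generator, rather than reseeding a global one, means any single run can be regenerated from its seed without replaying the runs before it.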