Sustainable Online Reinforcement Learning for Auto-bidding
Authors: Zhiyu Mou, Yusen Huo, Rongquan Bai, Mingzhou Xie, Chuan Yu, Jian Xu, Bo Zheng
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The paper states that "extensive simulated and real-world experiments validate the superiority of our approach over the state-of-the-art auto-bidding algorithm" and that "we conduct both simulated and real-world experiments to validate the effectiveness of our approach." |
| Researcher Affiliation | Collaboration | Zhiyu Mou (1,2), Yusen Huo (1), Rongquan Bai (1), Mingzhou Xie (1), Chuan Yu (1), Jian Xu (1), Bo Zheng (1). (1) Alibaba Group, Beijing, China; (2) Department of Automation, Tsinghua University, Beijing, China. |
| Pseudocode | No | The paper includes diagrams (Figure 1, Figure 2) and describes algorithms in text, but it does not contain a formally labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | The codes of simulated experiments are available at https://github.com/nobodymx/SORL-for-Auto-bidding. |
| Open Datasets | No | The paper uses proprietary Taobao/Alibaba data for the real-world experiments and a "manually built" simulated RAS/VAS. Although the checklist states that data is released in the supplementary material, the paper text itself provides no public link, DOI, repository, or formal citation for accessing these datasets, so they are not publicly available from the paper alone. |
| Dataset Splits | No | The paper describes A/B tests and training under different random seeds, but it does not specify explicit train/validation/test dataset splits with percentages or sample counts for reproduction. Evaluation is done through A/B tests in live environments rather than fixed dataset splits. |
| Hardware Specification | No | The checklist claims that compute-resource information is included, but no specific GPU models, CPU types, memory details, or cloud computing instances are given in the main text or the provided appendices. |
| Software Dependencies | No | The paper does not list specific version numbers for software dependencies such as Python, PyTorch, or other libraries used in its implementation. |
| Experiment Setup | Yes | The hyperparameters for all exploration policies are σ = 1, λ = 0.1. The safety threshold is set as s = 5%·V(µ_s). "We utilize 10,000 advertisers to collect data from the RAS... and compare the auto-bidding policies in 4 iterations on 1,500 advertisers using A/B tests. We leverage the V-CQL, CQL, BCQ and USCB to train auto-bidding policies under 100 different random seeds in the simulated experiment." |
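As a purely illustrative sketch (not the paper's implementation), the reported hyperparameters could plug into a Gaussian exploration policy whose perturbed action is clipped to a safety band; the function and variable names here are hypothetical, and the exact role of σ, λ, and the safety threshold s in SORL may differ from this reading:

```python
import numpy as np

SIGMA = 1.0    # exploration noise std, as reported (sigma = 1)
LAMBDA = 0.1   # noise scaling hyperparameter, as reported (lambda = 0.1)

def explore_action(base_action: float, rng: np.random.Generator,
                   safety_threshold: float) -> float:
    """Hypothetical Gaussian exploration around a base bidding action.

    The perturbed action is clipped so it never leaves a band of width
    `safety_threshold` around the base action (a stand-in for the paper's
    safety constraint; the actual SORL mechanism is not reproduced here).
    """
    noise = rng.normal(0.0, SIGMA)
    perturbed = base_action + LAMBDA * noise
    low, high = base_action - safety_threshold, base_action + safety_threshold
    return float(np.clip(perturbed, low, high))

# Example: explore around a base action of 1.0 with a 5% safety band.
rng = np.random.default_rng(0)
action = explore_action(1.0, rng, safety_threshold=0.05)
```

With this clipping design, any single exploration step deviates from the base action by at most the safety threshold, which is one simple way to keep online exploration within a bounded-risk region.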