Sustainable Online Reinforcement Learning for Auto-bidding

Authors: Zhiyu Mou, Yusen Huo, Rongquan Bai, Mingzhou Xie, Chuan Yu, Jian Xu, Bo Zheng

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive simulated and real-world experiments validate the superiority of our approach over the state-of-the-art auto-bidding algorithm. We conduct both simulated and real-world experiments to validate the effectiveness of our approach.
Researcher Affiliation | Collaboration | Zhiyu Mou (1, 2), Yusen Huo (1), Rongquan Bai (1), Mingzhou Xie (1), Chuan Yu (1), Jian Xu (1), Bo Zheng (1). (1) Alibaba Group, Beijing, China; (2) Department of Automation, Tsinghua University, Beijing, China.
Pseudocode | No | The paper includes diagrams (Figure 1, Figure 2) and describes algorithms in text, but it does not contain a formally labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code | Yes | The codes of simulated experiments are available at https://github.com/nobodymx/SORL-for-Auto-bidding.
Open Datasets | No | The paper uses proprietary data from Taobao/Alibaba for real-world experiments and a 'manually built' simulated RAS/VAS. The checklist says data is released in the supplementary material, but the paper text gives no public link, DOI, repository, or formal citation for these datasets, so they are classified as not publicly available from the paper alone.
Dataset Splits | No | The paper describes A/B tests and training under different random seeds, but it does not specify explicit train/validation/test splits with percentages or sample counts. Evaluation is done through A/B tests in live environments rather than on fixed dataset splits.
Hardware Specification | No | The paper's checklist states that resource information is included, but neither the main text nor the provided appendices specify GPU models, CPU types, memory, or cloud computing instances used for the experiments.
Software Dependencies | No | The paper does not list version numbers for software dependencies such as Python, PyTorch, or other libraries used in its implementation.
Experiment Setup | Yes | The hyperparameters for all exploration policies are σ = 1, λ = 0.1. The safety threshold is set as s = 5% V(µs). We utilize 10,000 advertisers to collect data from the RAS... and compare the auto-bidding policies in 4 iterations on 1,500 advertisers using A/B tests. We leverage the V-CQL, CQL, BCQ and USCB to train auto-bidding policies under 100 different random seeds in the simulated experiment.
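
The reported setup values can be gathered into one configuration object for anyone attempting a reproduction. The sketch below is illustrative only and is not taken from the authors' repository: the names ExperimentSetup, safety_threshold, run_over_seeds, and dummy_train_and_score are hypothetical, the reading of the safety threshold as 5% of the safe policy's value V(µs) is an assumption based on the quoted text, and the scoring function is a random stand-in rather than a real training run.

```python
import random
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class ExperimentSetup:
    """Values quoted in the Experiment Setup row; field names are hypothetical."""
    sigma: float = 1.0             # exploration hyperparameter sigma
    lam: float = 0.1               # exploration hyperparameter lambda
    safety_fraction: float = 0.05  # the "5%" in the safety threshold
    n_seeds: int = 100             # random seeds in the simulated experiment
    algorithms: Tuple[str, ...] = ("V-CQL", "CQL", "BCQ", "USCB")


def safety_threshold(safe_policy_value: float, setup: ExperimentSetup) -> float:
    """Assumed reading: the threshold is 5% of the safe policy's value V(mu_s)."""
    return setup.safety_fraction * safe_policy_value


def run_over_seeds(
    train_and_score: Callable[[str, int], float],
    setup: ExperimentSetup,
) -> Dict[str, Tuple[float, float]]:
    """Score each algorithm once per seed and report (mean, sample std)."""
    summary = {}
    for algorithm in setup.algorithms:
        scores = [train_and_score(algorithm, seed) for seed in range(setup.n_seeds)]
        summary[algorithm] = (mean(scores), stdev(scores))
    return summary


def dummy_train_and_score(algorithm: str, seed: int) -> float:
    """Placeholder for a real training run; returns a seeded random score."""
    rng = random.Random(f"{algorithm}-{seed}")
    return rng.gauss(0.0, 1.0)


if __name__ == "__main__":
    setup = ExperimentSetup()
    print("threshold for V(mu_s) = 200:", safety_threshold(200.0, setup))
    for name, (avg, spread) in run_over_seeds(dummy_train_and_score, setup).items():
        print(f"{name}: mean={avg:.3f} std={spread:.3f}")
```

Replacing dummy_train_and_score with a function that actually trains and evaluates a policy would give per-algorithm statistics over the 100 seeds. Reporting the spread alongside the mean makes seed-to-seed variance visible.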