Balancing Risk and Reward: A Batched-Bandit Strategy for Automated Phased Release

Authors: Yufan Li, Jialiang Mao, Iavor Bojinov

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Section 4, 'Numerical and empirical experiments'
Researcher Affiliation | Collaboration | Yufan Li (1), Jialiang Mao (2), Iavor Bojinov (3); (1) Harvard University, (2) LinkedIn Corporation, (3) Harvard Business School
Pseudocode | Yes | Algorithm 1: Output ramp size adaptively
Open Source Code | No | The paper does not provide a direct link to source code or an explicit statement about its release.
Open Datasets | No | The paper uses a 'semi-real LinkedIn ramp schedule comparison' where data 'is simulated from (4) using stage-wise $\mu_{\text{true}}(w)$, $\sigma(w)^2$, $w = 0, 1$ (both unobserved)' and states 'Due to privacy constraints, the individual-level data is not available'. No concrete access information for a publicly available dataset is provided.
Dataset Splits | No | The paper describes a sequential A/B testing approach and simulations, but does not provide details on training/validation/test dataset splits such as percentages or sample counts.
Hardware Specification | No | The paper mentions that '$N_t$ are incoming population size reduced by $10^4$ factor for tractability on a personal computer', but does not provide specific hardware details such as CPU/GPU models or memory amounts used for running experiments.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, or frameworks).
Experiment Setup | Yes | 'For each scenario, we set $T = 10$ with $N_t = 500$, $\forall t$ and we choose non-informative prior $\mu_0(w) = 0$, $\sigma_0(w)^2 = 100$, $w = 0, 1$. We assume model variance is known; however, using (6) to estimate the variance gives similar results.' and 'We set $B = -500$ to produce (h), although the model is not budget-aware.' and 'Under the legends $(B, \delta)$, we set $b_t = B$, $\delta_t = 1 - (1 - \delta)^{1/T}$, $\forall t$. We also use (i) ration budget to denote $(B, \delta) = (-500, 0.01)$, $b_t = -400$, $t \le 5$, $b_t = -500$, $t > 5$ and $\delta_t = 1 - (1 - \delta)^{1/T}$, $\forall t$; (ii) ration tolerance to denote $(B, \delta) = (-500, 0.01)$, $b_t = -500$, $\forall t$ and $\delta_t = 0.0001$, $t \le 5$, $\delta_t = 0.0019$, $t > 5$'
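
The per-stage tolerance in the Experiment Setup row can be read as delta_t = 1 - (1 - delta)^(1/T). The sketch below checks the arithmetic behind that reading for T = 10 and delta = 0.01, and compares it with the 'ration tolerance' values (0.0001 for t <= 5, 0.0019 for t > 5). It assumes that stage-wise tolerances combine multiplicatively into an overall tolerance of 1 - prod(1 - delta_t); that combination rule and the function names are assumptions of this illustration, not something the summary states.

```python
def uniform_stage_tolerance(delta: float, T: int) -> float:
    """Per-stage tolerance that compounds to `delta` over T stages."""
    return 1.0 - (1.0 - delta) ** (1.0 / T)


def overall_tolerance(stage_tolerances) -> float:
    """Overall tolerance implied by per-stage values: 1 - prod(1 - d_t).

    Assumption of this sketch: stages combine multiplicatively.
    """
    no_violation = 1.0
    for d in stage_tolerances:
        no_violation *= 1.0 - d
    return 1.0 - no_violation


T, delta = 10, 0.01

# Uniform schedule: the same tolerance at every stage.
uniform = [uniform_stage_tolerance(delta, T)] * T

# 'Ration tolerance' schedule from the setup: tight early, looser late.
rationed = [0.0001] * 5 + [0.0019] * 5

print(f"uniform per-stage tolerance: {uniform[0]:.6f}")
print(f"uniform overall tolerance:   {overall_tolerance(uniform):.6f}")
print(f"rationed overall tolerance:  {overall_tolerance(rationed):.6f}")
```

Under this reading, the uniform schedule recovers delta = 0.01 up to floating-point error, and the rationed schedule stays just under 0.01 while spending most of the tolerance in the later stages.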