Balancing Risk and Reward: A Batched-Bandit Strategy for Automated Phased Release
Authors: Yufan Li, Jialiang Mao, Iavor Bojinov
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 4: Numerical and empirical experiments |
| Researcher Affiliation | Collaboration | Yufan Li¹, Jialiang Mao², Iavor Bojinov³; ¹Harvard University, ²LinkedIn Corporation, ³Harvard Business School |
| Pseudocode | Yes | Algorithm 1: Output ramp size adaptively |
| Open Source Code | No | The paper does not provide a direct link to source code or an explicit statement about its release. |
| Open Datasets | No | The paper uses a 'semi-real LinkedIn ramp schedule comparison' where data 'is simulated from (4) using stage-wise $\mu_{\text{true}}(w)$, $\sigma(w)^2$, $w = 0, 1$ (both unobserved)' and states 'Due to privacy constraints, the individual-level data is not available'. No concrete access information for a publicly available dataset is provided. |
| Dataset Splits | No | The paper describes a sequential A/B testing approach and simulations, but does not provide details on specific training/validation/test dataset splits like percentages or sample counts. |
| Hardware Specification | No | The paper mentions that '$N_t$ are incoming population size reduced by $10^4$ factor for tractability on a personal computer', but does not provide specific hardware details such as CPU/GPU models or memory amounts used for running experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, or frameworks). |
| Experiment Setup | Yes | 'For each scenario, we set $T = 10$ with $N_t = 500, \forall t$ and we choose non-informative prior $\mu_0(w) = 0$, $\sigma_0(w)^2 = 100$, $w = 0, 1$. We assume model variance is known; however, using (6) to estimate the variance gives similar results.' and 'We set $B = 500$ to produce (h), although the model is not budget-aware.' and 'Under the legends $(B, \delta)$, we set $b_t = B$, $\Delta_t = 1 - (1 - \delta)^{1/T}, \forall t$. We also use (i) ration budget to denote $(B, \delta) = (500, 0.01)$, $b_t = 400, t \le 5$, $b_t = 500, t > 5$ and $\Delta_t = 1 - (1 - \delta)^{1/T}, \forall t$; (ii) ration tolerance to denote $(B, \delta) = (500, 0.01)$, $b_t = 500, \forall t$ and $\Delta_t = 0.0001, t \le 5$, $\Delta_t = 0.0019, t > 5$' |
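
The tolerance settings quoted in the Experiment Setup row can be sanity-checked with a short calculation. The sketch below is not the authors' code. It assumes that per-stage tolerances combine into an overall tolerance as one minus the product of the per-stage complements, which is what the form 1 − (1 − δ)^(1/T) suggests. All function and variable names are hypothetical.

```python
import math

T = 10        # number of ramp stages, as quoted in the setup
delta = 0.01  # overall tolerance, as quoted in the setup


def uniform_stage_tolerance(delta: float, T: int) -> float:
    """Per-stage tolerance 1 - (1 - delta)^(1/T), identical for every stage."""
    return 1.0 - (1.0 - delta) ** (1.0 / T)


def overall_tolerance(stage_tolerances) -> float:
    """Assumption (not from the source): stages combine as 1 - prod(1 - tol_t)."""
    return 1.0 - math.prod(1.0 - tol for tol in stage_tolerances)


# Default schedule: the same tolerance at every stage.
uniform = [uniform_stage_tolerance(delta, T)] * T

# "Ration tolerance" schedule: 0.0001 for t <= 5, then 0.0019 for t > 5.
rationed_tolerance = [0.0001 if t <= 5 else 0.0019 for t in range(1, T + 1)]

# "Ration budget" schedule: stage budget 400 for t <= 5, then 500 for t > 5.
rationed_budget = [400 if t <= 5 else 500 for t in range(1, T + 1)]

print("uniform per-stage tolerance:", uniform[0])
print("overall (uniform):", overall_tolerance(uniform))
print("overall (ration tolerance):", overall_tolerance(rationed_tolerance))
print("stage budgets (ration budget):", rationed_budget)
```

Under this assumption the uniform schedule recovers δ = 0.01 up to floating-point error, and the rationed tolerances combine to just under 0.01. Both variants therefore keep roughly the same overall tolerance and differ only in how it is spread across stages.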