Model Selection for Production System via Automated Online Experiments

Authors: Zhenwen Dai, Praveen Chandar, Ghazal Fazelnia, Benjamin Carterette, Mounia Lalmas

NeurIPS 2020

Reproducibility assessment. Each entry below gives the variable, the result, and the supporting evidence quoted from the paper.

Research Type: Experimental
Evidence: "Using simulations based on real data, we demonstrate the effectiveness of our method on two different tasks." and "We demonstrate the performance of AOE on automating online experiments for model selection. We construct two simulators based on real data to perform the evaluation, since evaluation on a production system is not reproducible. We compare AOE with five baseline methods..."

Researcher Affiliation: Industry
Evidence: Zhenwen Dai (Spotify, zhenwend@spotify.com), Praveen Ravichandran (Spotify, praveenr@spotify.com), Ghazal Fazelnia (Spotify, ghazalf@spotify.com), Ben Carterette (Spotify, benjaminc@spotify.com), Mounia Lalmas-Roelleke (Spotify, mounial@spotify.com)

Pseudocode: Yes
Evidence: Algorithm 1, "model selection with automated online experiments (AOE)". A hedged sketch of the loop follows.
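
Algorithm 1 appears in the paper only as pseudocode. The following is a minimal Python sketch of the loop it describes, under stated assumptions: the callables run_online_experiment, fit_surrogate, and acquisition are hypothetical placeholders standing in for the paper's online A/B test, GP surrogate fit, and acquisition function, not the authors' API.

```python
import numpy as np

def aoe_select(candidates, run_online_experiment, fit_surrogate, acquisition,
               n_rounds=20, rng=None):
    """Sketch of AOE: deploy a model, observe its online metric, refit a
    surrogate over all deployments so far, and choose the next model to
    deploy by maximizing an acquisition function (e.g. EI) over candidates."""
    rng = rng or np.random.default_rng()
    deployed, outcomes = [], []
    idx = int(rng.integers(len(candidates)))  # first deployment is random
    for _ in range(n_rounds):
        outcome = run_online_experiment(candidates[idx])  # observed metric
        deployed.append(candidates[idx])
        outcomes.append(outcome)
        surrogate = fit_surrogate(np.asarray(deployed), np.asarray(outcomes))
        idx = int(np.argmax(acquisition(surrogate, candidates)))
    return deployed[int(np.argmax(outcomes))]  # best model observed online
```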

Open Source Code: No
Evidence: The paper uses and cites "GPyOpt: A Bayesian optimization framework in Python. http://github.com/SheffieldML/GPyOpt", a third-party tool, but does not provide access to its own source code for the AOE methodology.

Open Datasets: Yes
Evidence: "We use the 'letter' dataset from the UCI repository [41]" and "We use the MovieLens 100k data [43] to construct the simulator for online experiments." A loading sketch follows.
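
Both datasets are public. A hedged loading sketch: the OpenML mirror of the UCI "letter" dataset and Surprise's built-in MovieLens 100k loader are assumptions about how a reproducer might fetch the data, not the authors' pipeline.

```python
from sklearn.datasets import fetch_openml  # UCI 'letter' is mirrored on OpenML
from surprise import Dataset               # Surprise bundles MovieLens 100k

letter = fetch_openml(name="letter", version=1, as_frame=False)
X, y = letter.data, letter.target          # 20,000 samples, 16 features

ml100k = Dataset.load_builtin("ml-100k")   # downloads the data on first use
```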

Dataset Splits: No
Evidence: The paper mentions training data for its experiments ("randomly take 200 data points for training" and "randomly take 20% data for training") but does not specify the distinct training, validation, and test splits needed for reproducibility.
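
Only the two quoted split rules can be reconstructed; a sketch of exactly what is stated follows. Anything beyond the quoted sizes, including the random seed, is unspecified in the paper and assumed here.

```python
import numpy as np

def random_train_split(X, y, n_train=None, frac=None, seed=0):
    """Split off a random training set either by count (e.g. 200 points for
    the 'letter' task) or by fraction (e.g. 20% for MovieLens). The seed is
    an assumption; the paper does not report one."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    k = n_train if n_train is not None else int(frac * len(X))
    return (X[idx[:k]], y[idx[:k]]), (X[idx[k:]], y[idx[k:]])
```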

Hardware Specification: No
Evidence: The paper does not provide any details about the hardware used to run the experiments.

Software Dependencies: No
Evidence: The paper mentions GPyOpt [42] and the Surprise package [44] but does not specify their version numbers, which are required for reproducible software dependencies.
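
Since no versions are pinned, a reproducer would have to record their own environment. A small sketch; the distribution names (e.g. scikit-surprise as the PyPI name for Surprise) are assumptions.

```python
from importlib.metadata import version, PackageNotFoundError

# Record the versions a rerun actually used, since the paper pins nothing.
for pkg in ("GPy", "GPyOpt", "scikit-surprise", "numpy"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")
```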

Experiment Setup: Yes
Evidence: "The candidate model set is generated on a 100x100 grid in the space of the two parameters in log scale. In order to compare with the OPE-based baselines, all the decisions a are augmented with an ϵ-greedy step with ϵ = 0.05. We use a GP binary classifier with a Matérn 3/2 kernel as the surrogate model, using 2000 inducing points. We use EI as the acquisition function implemented in GPyOpt [42]. Each run consists of 20 sequential online experiments with the first deployed model randomly picked."
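
A sketch of the quoted setup, assuming GPy is installed. The hyperparameter grid bounds below are illustrative placeholders, as the paper's parameter ranges are not quoted here.

```python
import numpy as np
import GPy

# Candidate model set: a 100x100 grid over two hyperparameters in log scale.
# The (1e-3, 1e1) bounds are assumptions, not values from the paper.
p1 = np.logspace(-3, 1, 100)
p2 = np.logspace(-3, 1, 100)
candidates = np.array([(a, b) for a in p1 for b in p2])  # 10,000 models

# Epsilon-greedy augmentation of a deployed model's decisions, epsilon = 0.05.
def epsilon_greedy(decision, n_actions, rng, eps=0.05):
    return int(rng.integers(n_actions)) if rng.random() < eps else decision

# Surrogate: GP binary classifier with a Matern 3/2 kernel. The paper uses a
# sparse variant with 2000 inducing points, e.g. (given observed X, y):
kernel = GPy.kern.Matern32(input_dim=2)
# model = GPy.models.SparseGPClassification(X, y, kernel=kernel, num_inducing=2000)
```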