Model Selection for Production System via Automated Online Experiments
Authors: Zhenwen Dai, Praveen Chandar, Ghazal Fazelnia, Benjamin Carterette, Mounia Lalmas
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Using simulations based on real data, we demonstrate the effectiveness of our method on two different tasks." and "We demonstrate the performance of AOE on automating online experiments for model selection. We construct two simulators based on real data to perform the evaluation since evaluation on a production system is not reproducible. We compare AOE with five baseline methods..." |
| Researcher Affiliation | Industry | Zhenwen Dai, Spotify, zhenwend@spotify.com; Praveen Ravichandran, Spotify, praveenr@spotify.com; Ghazal Fazelnia, Spotify, ghazalf@spotify.com; Ben Carterette, Spotify, benjaminc@spotify.com; Mounia Lalmas-Roelleke, Spotify, mounial@spotify.com |
| Pseudocode | Yes | Algorithm 1: model selection with automated online experiments (AOE) |
| Open Source Code | No | The paper uses and cites 'GPyOpt: A Bayesian optimization framework in Python. http://github.com/SheffieldML/GPyOpt', which is a third-party tool, but does not provide access to its own source code for the AOE methodology. |
| Open Datasets | Yes | "We use the 'letter' dataset from UCI repository [41]" and "We use the MovieLens 100k data [43] to construct the simulator for online experiments." |
| Dataset Splits | No | The paper mentions training data for its experiments ('randomly take 200 data points for training' and 'randomly take 20% data for training') but does not specify distinct training, validation, and test dataset splits needed for reproducibility. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments. |
| Software Dependencies | No | The paper mentions 'GPyOpt' and the 'Surprise package [44]' but does not specify their version numbers, which are required for reproducible software dependency information. |
| Experiment Setup | Yes | The candidate model set is generated on a 100x100 grid in the space of the two parameters in log scale. To compare with the OPE-based baselines, all decisions a are augmented with an ϵ-greedy step with ϵ = 0.05. A GP binary classifier with a Matérn 3/2 kernel serves as the surrogate model, using 2000 inducing points. EI is used as the acquisition function, as implemented in GPyOpt [42]. Each run consists of 20 sequential online experiments with the first deployed model randomly picked. |
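The Experiment Setup row can be illustrated schematically. Below is a minimal, hypothetical Python sketch of the 100x100 log-scale candidate grid and the ϵ-greedy decision step with ϵ = 0.05; the grid bounds and the candidate scores are placeholders and not taken from the paper, where the scores would come from an EI acquisition function over a GP binary-classifier surrogate (via GPyOpt).

```python
import numpy as np

rng = np.random.default_rng(0)

# 100x100 candidate grid over two hyperparameters in log scale.
# The bounds (1e-3 to 1e1) are illustrative assumptions, not from the paper.
p1 = np.logspace(-3, 1, 100)
p2 = np.logspace(-3, 1, 100)
grid = np.array([(a, b) for a in p1 for b in p2])  # 10000 candidate models


def epsilon_greedy(scores, eps=0.05, rng=rng):
    """Pick the top-scoring candidate with prob. 1 - eps, a random one otherwise."""
    if rng.random() < eps:
        return int(rng.integers(len(scores)))
    return int(np.argmax(scores))


# Placeholder acquisition scores; in the paper these are EI values computed
# from the GP surrogate, here just random numbers for illustration.
scores = rng.random(len(grid))
idx = epsilon_greedy(scores, eps=0.05)
next_model = grid[idx]  # model deployed in the next online experiment
```

Each of the 20 sequential experiments in a run would repeat this select-deploy-update loop, refitting the surrogate on the newly observed outcome before scoring the grid again.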