Model Selection for Production System via Automated Online Experiments
Authors: Zhenwen Dai, Praveen Chandar, Ghazal Fazelnia, Benjamin Carterette, Mounia Lalmas
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Using simulations based on real data, we demonstrate the effectiveness of our method on two different tasks." and "We demonstrate the performance of AOE on automating online experiments for model selection. We construct two simulators based on real data to perform the evaluation since evaluation on a production system is not reproducible. We compare AOE with five baseline methods..." |
| Researcher Affiliation | Industry | Zhenwen Dai, Spotify, zhenwend@spotify.com; Praveen Ravichandran, Spotify, praveenr@spotify.com; Ghazal Fazelnia, Spotify, ghazalf@spotify.com; Ben Carterette, Spotify, benjaminc@spotify.com; Mounia Lalmas-Roelleke, Spotify, mounial@spotify.com |
| Pseudocode | Yes | Algorithm 1: model selection with automated online experiments (AOE) |
| Open Source Code | No | The paper uses and cites 'GPyOpt: A Bayesian optimization framework in Python. http://github.com/SheffieldML/GPyOpt', which is a third-party tool, but does not provide access to its own source code for the AOE methodology. |
| Open Datasets | Yes | "We use the 'letter' dataset from UCI repository [41]" and "We use the MovieLens 100k data [43] to construct the simulator for online experiments." |
| Dataset Splits | No | The paper mentions training data for its experiments ('randomly take 200 data points for training' and 'randomly take 20% data for training') but does not specify distinct training, validation, and test dataset splits needed for reproducibility. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments. |
| Software Dependencies | No | The paper mentions 'GPyOpt' and the 'Surprise package [44]' but does not specify their version numbers, which are required for reproducible software dependency information. |
| Experiment Setup | Yes | The candidate model set is generated on a 100x100 grid in the space of the two parameters in log scale. To compare with the OPE-based baselines, all decisions a are augmented with an ϵ-greedy step with ϵ = 0.05. A GP binary classifier with a Matérn 3/2 kernel serves as the surrogate model, using 2000 inducing points. EI is used as the acquisition function, as implemented in GPyOpt [42]. Each run consists of 20 sequential online experiments with the first deployed model randomly picked. |
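The Experiment Setup row can be illustrated schematically. Below is a minimal, hypothetical Python sketch of the 100x100 log-scale candidate grid and the ϵ-greedy decision step with ϵ = 0.05; the grid bounds and the candidate scores are placeholders and not taken from the paper, where the scores would come from an EI acquisition function over a GP binary-classifier surrogate (via GPyOpt).

```python
import numpy as np

rng = np.random.default_rng(0)

# 100x100 candidate grid over two hyperparameters in log scale.
# The bounds (1e-3 to 1e1) are illustrative assumptions, not from the paper.
p1 = np.logspace(-3, 1, 100)
p2 = np.logspace(-3, 1, 100)
grid = np.array([(a, b) for a in p1 for b in p2])  # 10000 candidate models


def epsilon_greedy(scores, eps=0.05, rng=rng):
    """Pick the top-scoring candidate with prob. 1 - eps, a random one otherwise."""
    if rng.random() < eps:
        return int(rng.integers(len(scores)))
    return int(np.argmax(scores))


# Placeholder acquisition scores; in the paper these are EI values computed
# from the GP surrogate, here just random numbers for illustration.
scores = rng.random(len(grid))
idx = epsilon_greedy(scores, eps=0.05)
next_model = grid[idx]  # model deployed in the next online experiment
```

Each of the 20 sequential experiments in a run would repeat this select-deploy-update loop, refitting the surrogate on the newly observed outcome before scoring the grid again.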