Benchopt: Reproducible, efficient and collaborative optimization benchmarks

Authors: Thomas Moreau, Mathurin Massias, Alexandre Gramfort, Pierre Ablin, Pierre-Antoine Bannier, Benjamin Charlier, Mathieu Dagréou, Tom Dupré la Tour, Ghislain Durif, Cassio F. Dantas, Quentin Klopfenstein, Johan Larsson, En Lai, Tanguy Lefort, Benoît Malézieux, Badr Moufad, Binh T. Nguyen, Alain Rakotomamonjy, Zaccharie Ramzi, Joseph Salmon, Samuel Vaiter

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We propose Benchopt, a collaborative framework to automate, reproduce and publish optimization benchmarks in machine learning across programming languages and hardware architectures. Benchopt simplifies benchmarking for the community by providing an off-the-shelf tool for running, sharing and extending experiments. To demonstrate its broad usability, we showcase benchmarks on three standard learning tasks: ℓ2-regularized logistic regression, Lasso, and ResNet18 training for image classification.
Researcher Affiliation | Collaboration | 1 Université Paris-Saclay, Inria, CEA, 91120 Palaiseau, France; 2 Univ Lyon, Inria, CNRS, ENS de Lyon, UCB Lyon 1, LIP UMR 5668, F-69342, Lyon, France; ...; 11 Criteo AI Lab, Paris, France
Pseudocode | No | The paper describes the conceptual structure of Benchopt components (Objective, Datasets, Solvers) but does not provide pseudocode or a clearly labeled algorithm block; an illustrative sketch of such a component is given below, after this table.
Open Source Code | Yes | By the open source and collaborative design of Benchopt (BSD 3-clause license), we aim to open the way towards community-endorsed and peer-reviewed benchmarks that will improve the tracking of progress in optimization for ML. The code for the benchmark is available at https://github.com/benchopt/benchmark_logreg_l2/.
Open Datasets | Yes | To demonstrate its broad usability, we showcase benchmarks on three standard learning tasks: ℓ2-regularized logistic regression, Lasso, and ResNet18 training for image classification. We provide a cross-dataset comparison on SVHN (Netzer et al., 2011), MNIST (LeCun et al., 2010) and CIFAR-10 (Krizhevsky, 2009).
Dataset Splits | Yes | A recurrent criticism of such benchmarks is that only the best test error is reported. In Figure 6, we measure the effect of using a train-validation-test split, by keeping a fraction of the training set as a validation set. The splits we use are detailed in Table F.1.
Hardware Specification | Yes | All presented benchmarks are run on 10 cores of an Intel Xeon Gold 6248 CPU @ 2.50GHz and NVIDIA V100 GPUs (16GB).
Software Dependencies | Yes | Benchopt is written in Python, but Solvers run with implementations in different languages (e.g., R and Julia, as in Section 4) and frameworks (e.g., PyTorch and TensorFlow, as in Section 5). PyTorch Lightning (pytorch-lightning): version 0.7.6.
Experiment Setup | Yes | We also consider different scenarios for the objective function: (i) scaling (or not) the features... (ii) fitting (or not) an unregularized intercept term... (iii) working (or not) with sparse features, which prevent explicit centering during preprocessing to keep memory usage limited. We detail the remaining hyperparameters in Table F.2, and discuss their selection as well as their sensitivity in Appendix F.
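To give a concrete reference point for the Pseudocode row above, the following is a minimal, hypothetical sketch of what a Benchopt Solver component for the ℓ2-regularized logistic regression benchmark could look like. It is not taken from the paper; the class and method names follow the public Benchopt API (BaseSolver with set_objective, run and get_result), but the exact signatures and the parameters passed by the Objective (here assumed to be X, y and lmbd) vary across Benchopt versions and benchmarks.

```python
# Hypothetical sketch of a Benchopt Solver for l2-regularized logistic
# regression, solved by plain gradient descent. Method names follow the
# public Benchopt API (BaseSolver / set_objective / run / get_result);
# exact signatures may differ between Benchopt versions.
import numpy as np
from benchopt import BaseSolver


class Solver(BaseSolver):
    name = "GD"  # gradient descent baseline

    def set_objective(self, X, y, lmbd):
        # Assumed parameters passed by the benchmark's Objective: design
        # matrix X, labels y in {-1, 1}, and regularization strength lmbd.
        self.X, self.y, self.lmbd = X, y, lmbd

    def run(self, n_iter):
        # Minimize sum_i log(1 + exp(-y_i x_i^T w)) + lmbd / 2 * ||w||^2
        # with fixed step size 1 / L, where L bounds the gradient's
        # Lipschitz constant.
        X, y, lmbd = self.X, self.y, self.lmbd
        L = np.linalg.norm(X, ord=2) ** 2 / 4 + lmbd
        w = np.zeros(X.shape[1])
        for _ in range(n_iter):
            grad = -X.T @ (y / (1 + np.exp(y * (X @ w)))) + lmbd * w
            w -= grad / L
        self.w = w

    def get_result(self):
        # The returned iterate is evaluated by the Objective at each stop.
        return self.w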