Efficient Online ML API Selection for Multi-Label Classification Tasks

Authors: Lingjiao Chen, Matei Zaharia, James Zou

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We run systematic experiments using ML APIs from Google, Microsoft, Amazon, IBM, Tencent, and other providers for tasks including multi-label image classification, scene text recognition, and named entity recognition. Across these tasks, FrugalMCT can achieve over 90% cost reduction while matching the accuracy of the best single API, or up to 8% better accuracy while matching the best API's cost.
Researcher Affiliation | Academia | 1Department of Computer Sciences, Stanford University, Stanford, USA; 2Department of Biomedical Data Science, Stanford University, Stanford, USA. Correspondence to: Lingjiao Chen <lingjiao@stanford.edu>.
Pseudocode | Yes | Algorithm 1 FrugalMCT Online API Selection Algorithm. Input: c, b, {x^Tr_1, x^Tr_2, ..., x^Tr_{N^Tr}}, {x_1, x_2, ..., x_N}. Output: FrugalMCT online API selector s^o(·).
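To make the quoted input/output signature concrete, here is a minimal Python sketch of an online selection loop in the spirit of Algorithm 1. It is a simplification under stated assumptions: `estimate_acc`, `prices`, and the greedy per-sample budget spreading are illustrative stand-ins for the paper's trained accuracy estimator and its ILP-based selection strategy, not the actual method.

```python
# Minimal sketch of a FrugalMCT-style online API selection loop.
# `estimate_acc(x, api)` and `prices` are hypothetical stand-ins; the
# paper computes the real strategy by solving an integer linear program.

def online_api_selection(samples, apis, prices, budget, estimate_acc):
    """Pick one API per sample while keeping average spend near budget b."""
    n = len(samples)
    spent, choices = 0.0, []
    for t, x in enumerate(samples, start=1):
        # spread the remaining budget over the remaining samples
        per_sample = (budget * n - spent) / (n - t + 1)
        affordable = [a for a in apis if prices[a] <= per_sample]
        if not affordable:  # nothing fits: fall back to the cheapest API
            affordable = [min(apis, key=lambda a: prices[a])]
        best = max(affordable, key=lambda a: estimate_acc(x, a))
        choices.append(best)
        spent += prices[best]
    return choices
```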
Open Source Code | Yes | As a dataset contribution, we have also released our dataset of 295,212 samples annotated by commercial multi-label APIs, the largest dataset and resource for studying multi-label ML prediction APIs. The dataset and our code can be accessed from https://github.com/lchen001/FrugalMCT.
Open Datasets | Yes | For MIC, we use three popular datasets including PASCAL (Everingham et al., 2015), MIR (Huiskes & Lew, 2008) and COCO (Lin et al., 2014). For STR, we use three large-scale Chinese text recognition datasets, MTWI (He et al., 2018), ReCTS (Zhang et al., 2019) and LSVT (Sun et al., 2019). The other datasets, CONLL (Sang & Meulder, 2003), ZHNER (ZHN) and GMB (Bos, 2013), are used for the NER task.
Dataset Splits | Yes | More precisely, for each possible base service, we train a FrugalMCT strategy and evaluate its performance on a validation dataset, and pick the base service corresponding to the highest performance. To obtain a more accurate strategy, we can adopt a search algorithm to select the best δ value based on evaluating the performance on a validation dataset.
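The quoted validation-driven search (train one strategy per candidate base service or δ value, score it on the held-out split, keep the best) can be expressed as a short helper; `train_fn`, `eval_fn`, and the candidate set below are hypothetical placeholders, not the paper's API.

```python
# Hypothetical helper for the validation-driven search described above:
# fit one strategy per candidate and keep the candidate whose strategy
# scores highest on the validation split.

def pick_best_candidate(candidates, train_fn, eval_fn, val_data):
    """train_fn(c) -> strategy; eval_fn(strategy, data) -> accuracy."""
    best_c, best_acc = None, float("-inf")
    for c in candidates:
        acc = eval_fn(train_fn(c), val_data)
        if acc > best_acc:
            best_c, best_acc = c, acc
    return best_c, best_acc
```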
Hardware Specification Yes All experiments were run on a machine with 8 Intel Xeon Platinum 2.5 GHz cores, 32 GB RAM, and 500GB disk with Ubuntu 16.04 LTS as the OS. For multi-label image classification, the Git Hub model (SSD) takes 6s to classify each image, resulting in an equivalent cost of $0.0015 per image... We evaluate the inference time of all Git Hub models on an Amazon EC2 p2.x instance, which is $0.90 per hour.
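The quoted equivalent cost follows from a simple unit conversion; the snippet below just reproduces that arithmetic (6 s per image on an instance billed at $0.90 per hour):

```python
# Reproducing the equivalent-cost arithmetic quoted above.
seconds_per_image = 6
dollars_per_hour = 0.90
cost_per_image = seconds_per_image / 3600 * dollars_per_hour
print(f"${cost_per_image:.4f} per image")  # -> $0.0015 per image
```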
Software Dependencies | No | The paper states "Our code is implemented in Python 3.7." and mentions several ILP solvers (CBC, MOSEK, GUROBI), but it provides a version number only for Python and none for any libraries, frameworks, or the named solvers.
Experiment Setup | Yes | We set budget b = 6, the price of Everypixel, the cheapest commercial API (except the open source model from GitHub)... The label combiner requires two parameters: the combining weight w ∈ [0, 1] and the quality score threshold θ ∈ [0, 1]... M = 10 is sufficient to obtain a good combiner... A naive approach is to set a small constant value, say, δ = 0.01... CS = {−10, −9, −8, ..., 0, 1, 2, ..., 10} is sufficient to obtain a highly accurate solution.
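For intuition on the two combiner parameters quoted above, here is a hedged sketch of a weighted label combiner: blend each label's quality scores from two APIs with weight w, then keep labels whose combined score reaches the threshold θ. The function name, score dictionaries, and defaults are illustrative assumptions, not the paper's implementation.

```python
# Assumed sketch of a two-parameter label combiner: names and defaults
# are illustrative, not taken from the FrugalMCT codebase.

def combine_labels(scores_a, scores_b, w=0.5, theta=0.5):
    """scores_a / scores_b map label -> quality score in [0, 1]."""
    labels = set(scores_a) | set(scores_b)
    return {
        lbl
        for lbl in labels
        if w * scores_a.get(lbl, 0.0) + (1 - w) * scores_b.get(lbl, 0.0) >= theta
    }
```

With w = 1 this reduces to thresholding the first API's scores alone; intermediate values of w trade the two sources off against each other.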