Efficiently Controlling Multiple Risks with Pareto Testing
Authors: Bracha Laufer-Goldshtein, Adam Fisch, Regina Barzilay, Tommi S. Jaakkola
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of our approach to reliably accelerate the execution of large-scale Transformer models in natural language processing (NLP) applications. ... (Section 7, Experiments) Experimental setup. We test our method over five text classification tasks of varied difficulty levels: IMDB (Maas et al., 2011), AG News (Zhang et al., 2015), QNLI (Rajpurkar et al., 2016), QQP, MNLI (Williams et al., 2018). |
| Researcher Affiliation | Academia | Bracha Laufer-Goldshtein, Adam Fisch, Regina Barzilay & Tommi Jaakkola CSAIL, MIT, {lauferb,fisch,regina,tommi}@csail.mit.edu |
| Pseudocode | Yes | Algorithm 1 Pareto Testing Definitions: f is a configurable model with n thresholds λ = (λ1, . . . , λn). ... Algorithm F.1 Recover Pareto Optimal Set Definitions: ... Algorithm F.2 Learn then Test (Single Objective) Definitions: ... Algorithm F.3 3D Graph Testing Definitions: ... Algorithm F.4 Shortest-Path Testing Definitions: ... |
| Open Source Code | Yes | Code. Our code will be made available at https://github.com/bracha-laufer/pareto-testing. |
| Open Datasets | Yes | We test our method over five text classification tasks of varied difficulty levels: IMDB (Maas et al., 2011), AG News (Zhang et al., 2015), QNLI (Rajpurkar et al., 2016), QQP, MNLI (Williams et al., 2018). |
| Dataset Splits | Yes | Table C.1 (Dataset Details): IMDB: sentiment analysis on movie reviews; \|Y\| = 2; Train 20K; Val. 5K; Test 10K; Cal. (out of Test) 5K; full-model accuracy 94%. ... Algorithm 1 Pareto Testing Definitions: ... Dcal = Dopt ∪ Dtesting is a calibration set of size m, split into optimization and (statistical) testing sets of sizes m1 and m2, respectively. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running its experiments, such as GPU models (e.g., NVIDIA A100, Tesla V100) or CPU specifications (e.g., Intel Core i7). |
| Software Dependencies | No | The paper mentions using a 'BERT-base model' and discusses deep learning concepts, but it does not specify software dependencies with version numbers (e.g., 'PyTorch 1.x', 'Python 3.x'). |
| Experiment Setup | Yes | Experimental setup. We test our method over five text classification tasks... We use a BERT-base model (Devlin et al., 2018) with K = 12 layers and W = 12 heads per layer... Prediction Heads. Each prediction head is a 2-layer feed-forward neural network with 32-dimensional hidden states and ReLU activation... Token importance predictors. Each token importance predictor is a 2-layer feed-forward neural network with 32-dimensional hidden states and ReLU activation. ... Training. The core model is first finetuned on each task. |
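The Pseudocode and Dataset Splits rows above describe the paper's core recipe: the calibration set is split into an optimization part (used to order candidate configurations along the Pareto front) and a statistical testing part (used for fixed-sequence hypothesis testing). The sketch below illustrates that second stage only. It is a minimal illustration, not the authors' implementation: the Hoeffding-style p-value, the `ordered_configs` / `loss_fn` names, and the single-risk setting are all simplifying assumptions made here for clarity (the paper handles multiple risks and uses the Hoeffding-Bentkus bound).

```python
import math

def hoeffding_pvalue(losses, alpha):
    """One-sided p-value for H0: E[loss] > alpha, assuming losses in [0, 1].

    Uses Hoeffding's inequality: P(mean <= r_hat) <= exp(-2n(alpha - r_hat)^2)
    under H0, so small values are evidence that the true risk is <= alpha.
    """
    n = len(losses)
    r_hat = sum(losses) / n
    return math.exp(-2.0 * n * max(alpha - r_hat, 0.0) ** 2)

def pareto_testing(ordered_configs, loss_fn, d_testing, alpha, delta):
    """Fixed-sequence testing over a pre-computed (Pareto) ordering.

    Walks the ordering produced on the optimization split and keeps
    rejecting H0 (risk > alpha) while p <= delta; stops at the first
    failure, which preserves the family-wise error rate at level delta.
    Returns the configurations validated as risk-controlling.
    """
    valid = []
    for cfg in ordered_configs:
        p = hoeffding_pvalue(loss_fn(cfg, d_testing), alpha)
        if p > delta:
            break  # first non-rejection ends the sequence
        valid.append(cfg)
    return valid
```

With, say, 100 calibration losses per configuration, a configuration with empirical risk 0.05 is rejected at alpha = 0.2 (p = exp(-4.5) ≈ 0.011), while one with empirical risk 0.5 yields p = 1 and stops the traversal.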