Rethinking Fano’s Inequality in Ensemble Learning
Authors: Terufumi Morishita, Gaku Morio, Shota Horiguchi, Hiroaki Ozaki, Nobuo Nukaga
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Further, we empirically validate and demonstrate the proposed theory through extensive experiments on actual systems. ... We validate the framework through extensive experiments on DNN ensemble systems. |
| Researcher Affiliation | Industry | Terufumi Morishita¹, Gaku Morio*¹, Shota Horiguchi*¹, Hiroaki Ozaki¹, Nobuo Nukaga¹ ... ¹Hitachi, Ltd. Research and Development Group, Kokubunji, Tokyo, Japan. |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | We release our code as open source. Available at: https://github.com/hitachi-nlp/ensemble-metrics |
| Open Datasets | Yes | We used eight classification tasks with moderately-sized datasets for computational reasons: BoolQ (Clark et al., 2019), CoLA (Dolan & Brockett, 2005), CosmosQA (Khot et al., 2018), MNLI (Williams et al., 2018), MRPC (Dolan & Brockett, 2005), SciTail (Khot et al., 2018), SST (Socher et al., 2013), and QQP. ... Table E.10: Tasks used in this study. The majority of tasks are from the GLUE benchmark (Wang et al., 2018) and the SuperGLUE benchmark (Wang et al., 2019). All datasets are publicly available. (A data-loading sketch follows the table.) |
| Dataset Splits | Yes | In order to train the meta-estimator of Stacking, we must take a cross-validation-based dataset splitting strategy (Wolpert, 1992)... In this study, we used n = 5... In this study, we used l = 4. ... Validation sets were used only during the preliminary experiments to adjust some hyperparameters (shown below). (A splitting sketch follows the table.) |
| Hardware Specification | Yes | A single run of experiments required about 200 GPUs (V100) × 1 day. ... Computational resources of the AI Bridging Cloud Infrastructure (ABCI) provided by the National Institute of Advanced Industrial Science and Technology (AIST) were used. |
| Software Dependencies | Yes | We implemented the fine-tuning of DNNs described here using the jiant library (Phang et al., 2020) (v2.2.0), which in turn utilizes Hugging Face's Transformers library (Wolf et al., 2020). ... We implemented the model combination methods in Table 2 using scikit-learn. For the training of Stacking meta-estimators, we used the hyperparameters shown in Table E.9. ... Most of the hyperparameters are set as default values of scikit-learn (version 0.22.2). (A model-combination sketch follows the table.) |
| Experiment Setup | Yes | We used the hyperparameters shown in Table E.8 to fine-tune all of the DNN types. ... Table E.8: Hyperparameters used for fine-tuning of DNNs: learning rate 3e-5 ([1e-5, 1e-4] for the random sampling of Random-HyP); optimizer Adam (Kingma & Ba, 2015) (ϵ = 1e-8) with linear warmup (data-size proportion = 0.1), as described in (Devlin et al., 2019); gradient clipping 1.0; gradient accumulation steps 1; epochs 5; dropout: DNN-specific values (following jiant (Phang et al., 2020)); training batch size 16; inference batch size 32; number of softmax layers 1. ... Table E.9: Meta-estimator hyperparameters. (An optimizer-setup sketch follows the table.) |
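The datasets row above notes that all eight tasks are publicly available. As a hedged illustration only (the paper loads tasks through jiant, not through the Hugging Face `datasets` library, so this loader choice is our assumption), one of the listed GLUE tasks can be fetched in a few lines:

```python
# Illustrative only: fetch one of the listed GLUE tasks (MRPC) with the Hugging Face
# `datasets` library. This is an assumption for demonstration purposes; the authors'
# pipeline goes through jiant.
from datasets import load_dataset

mrpc = load_dataset("glue", "mrpc")              # DatasetDict with train/validation/test splits
print(mrpc["train"].num_rows, mrpc["validation"].num_rows)
print(mrpc["train"][0])                          # sentence pair plus paraphrase label
```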
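The dataset-splits row quotes the cross-validation-based splitting strategy (Wolpert, 1992) with n = 5 folds used to train the Stacking meta-estimator. Below is a minimal sketch of that idea, with simple scikit-learn classifiers and synthetic data standing in for the paper's fine-tuned DNNs and tasks:

```python
# Minimal sketch of cross-validation-based splitting for a Stacking meta-estimator
# (Wolpert, 1992) with n = 5 folds. The base learners and data here are stand-ins,
# not the paper's fine-tuned DNNs.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

base_models = [
    RandomForestClassifier(n_estimators=100, random_state=0),
    LogisticRegression(max_iter=1000),
]

# Out-of-fold class probabilities: every training example is predicted by a base
# model that never saw it, which is exactly what the n-fold splitting guarantees.
meta_features = np.hstack([
    cross_val_predict(m, X, y, cv=5, method="predict_proba") for m in base_models
])

# The meta-estimator is then fit on these out-of-fold predictions.
meta_estimator = LogisticRegression(max_iter=1000).fit(meta_features, y)
```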
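The software-dependencies row states that the model-combination methods of the paper's Table 2 were implemented with scikit-learn. As a hedged stand-in (the specific estimators and data below are ours, not the paper's methods), this sketch combines three toy classifiers by soft voting, i.e., averaging their predicted class probabilities:

```python
# Hedged illustration of scikit-learn-based model combination: soft voting over three
# stand-in classifiers on synthetic data. Not the authors' Table 2 methods.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("nb", GaussianNB()),
    ],
    voting="soft",   # average predicted class probabilities across members
)
ensemble.fit(X_tr, y_tr)
print("ensemble accuracy:", ensemble.score(X_te, y_te))
```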
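Finally, the experiment-setup row lists the fine-tuning hyperparameters of Table E.8. The sketch below wires those values (learning rate 3e-5, Adam with ϵ = 1e-8, linear warmup over 10% of steps, gradient clipping 1.0, 5 epochs) into a generic PyTorch/Transformers optimizer setup; `model` and `train_loader` are hypothetical placeholders, and the surrounding loop is ours rather than the authors' jiant configuration:

```python
# Hedged sketch: Table E.8 values plugged into a generic PyTorch/Transformers setup.
# `model` and `train_loader` are placeholders; the paper's runs are configured via jiant.
import torch
from transformers import get_linear_schedule_with_warmup

def build_optimizer_and_scheduler(model, train_loader, epochs=5, lr=3e-5,
                                  warmup_proportion=0.1):
    total_steps = len(train_loader) * epochs
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, eps=1e-8)
    scheduler = get_linear_schedule_with_warmup(
        optimizer,
        num_warmup_steps=int(warmup_proportion * total_steps),
        num_training_steps=total_steps,
    )
    return optimizer, scheduler

# Inside the training loop, gradients are clipped to 1.0 before each optimizer step:
#   torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
#   optimizer.step(); scheduler.step(); optimizer.zero_grad()
```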