Efficient Lifelong Model Evaluation in an Era of Rapid Progress
Authors: Ameya Prabhu, Vishaal Udandarao, Philip Torr, Matthias Bethge, Adel Bibi, Samuel Albanie
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive empirical evaluations across 31,000 models demonstrate that S&S achieves highly efficient approximate accuracy measurement, reducing compute cost from 180 GPU days to 5 GPU hours (~1000× reduction) on a single A100 GPU, with low approximation error and memory cost of <100MB. |
| Researcher Affiliation | Academia | Tübingen AI Center, University of Tübingen; University of Cambridge; University of Oxford |
| Pseudocode | Yes | Here, we provide Pythonic pseudo-code for the constituent algorithms of Sort & Search, which we described in detail in Section 3. |
| Open Source Code | Yes | https://github.com/bethgelab/sort-and-search |
| Open Datasets | Yes | For Lifelong-CIFAR10, we use 31,250 CIFAR-10 pre-trained models from the NATS-Bench topology search space [25]. For Lifelong-ImageNet, we use 167 ImageNet-1K and ImageNet-21K pre-trained models, sourced primarily from timm [98] and imagenet-testbed [84]. |
| Dataset Splits | No | The paper describes splits for evaluating its framework's performance (e.g., a 'Sample Addition Split' and a 'Model Evaluation Split'), but it does not provide traditional training/validation/test dataset splits for model training, as it primarily evaluates pre-trained models. |
| Hardware Specification | Yes | reducing compute cost from 180 GPU days to 5 GPU hours (~1000× reduction) on a single A100 GPU |
| Software Dependencies | No | The paper mentions software like 'timm [98]' and 'Pythonic Pseudo-code', but it does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | We run our S&S over 13 different sampling budgets: {8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768} on both Lifelong-ImageNet and Lifelong-CIFAR10. |
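To make the efficiency claim concrete, the sketch below illustrates the core idea behind a Sort & Search-style estimator: rank test samples from easy to hard using the correctness records of previously evaluated models, probe a new model on only a small sampling budget, and fit a single difficulty threshold that predicts its full-set accuracy. This is a minimal illustrative simplification, not the paper's released implementation; the function names (`rank_samples`, `fit_threshold`, `estimate_accuracy`) are ours, and the paper's actual algorithms (Section 3 and its pseudo-code) handle more cases than this sketch.

```python
import numpy as np

def rank_samples(correctness):
    """Order test samples from easiest to hardest, where a sample's
    difficulty is the fraction of previously evaluated models that got
    it right. `correctness` is an (n_models, n_samples) 0/1 matrix."""
    return np.argsort(-correctness.mean(axis=0), kind="stable")

def fit_threshold(y_sorted):
    """Given observed 0/1 correctness `y_sorted` in easy-to-hard order,
    find the cut k minimizing disagreement with the assumption that the
    model is correct on every sample before the cut and wrong after it."""
    ones_prefix = np.concatenate([[0], np.cumsum(y_sorted)])
    ks = np.arange(len(y_sorted) + 1)
    # errors(k) = zeros among the first k + ones among the rest
    errors = (ks - ones_prefix) + (ones_prefix[-1] - ones_prefix)
    return int(np.argmin(errors))

def estimate_accuracy(new_model_eval, order, budget):
    """Evaluate a new model on `budget` evenly spaced samples along the
    difficulty ordering, then estimate full-set accuracy from the cut."""
    idx = np.linspace(0, len(order) - 1, budget).astype(int)
    y = new_model_eval(order[idx])  # 0/1 correctness on sampled points
    return fit_threshold(y) / budget
```

With a budget of, say, 128 samples per new model instead of the full 32,768-sample pool, the cost of each evaluation drops by the same orders of magnitude the table reports, at the price of an approximation error governed by how well the fixed difficulty ordering transfers to the new model.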