Early Time Classification with Accumulated Accuracy Gap Control
Authors: Liran Ringel, Regev Cohen, Daniel Freedman, Michael Elad, Yaniv Romano
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments demonstrate the effectiveness, applicability, and usefulness of our method. We show that our proposed early stopping mechanism reduces up to 94% of timesteps used for classification while achieving rigorous accuracy gap control. |
| Researcher Affiliation | Collaboration | 1Department of Computer Science, Technion - Israel Institute of Technology, Haifa, Israel 2Verily AI, Israel 3Department of Electrical and Computer Engineering, Technion - Israel Institute of Technology, Haifa, Israel. Correspondence to: Liran Ringel <liranringel@cs.technion.ac.il>. |
| Pseudocode | Yes | Algorithm 1 Candidate Screening (Stage 1), Algorithm 2 Testing (Stage 2), Algorithm A.3 Fixed sequence testing for marginal risk control (a generic fixed-sequence-testing sketch appears after the table) |
| Open Source Code | Yes | A software package implementing the proposed methods is publicly available at GitHub.1 (...) 1https://github.com/liranringel/etc |
| Open Datasets | Yes | We test the applicability of our methods on five datasets: Tiselac (Ienco, 2017), Electric Devices (Chen et al., 2015), Pen Digits (Alpaydin & Alimoglu, 1998), Crop (Tan et al., 2017), and Walking Sitting Standing (Reyes-Ortiz et al., 2012). These datasets are publicly available via the aeon toolkit. (...) The QuALITY dataset (Pang et al., 2022) (...) The QuAIL dataset (Rogers et al., 2020). |
| Dataset Splits | Yes | For the calibration of the early stopping rule, we employ 3073 labeled samples to form Dcal while reserving the remaining 1536 samples for testing. (...) To implement and evaluate our methods, we partition each dataset into four distinct sets: 80% of the samples are allocated for model fitting, while the remaining samples are equally divided to form Dcal-1, Dcal-2, and Dtest. (...) We allocate 1/8 of the training samples to a validation set and optimize the model on the remaining 7/8 of the samples. Training continues until there is no improvement in the loss on the validation set for 30 epochs. The model with the best validation set loss is then saved. (A partitioning sketch follows the table.) |
| Hardware Specification | No | The paper mentions using an 'LSTM model' and notes that models like 'Vicuna-13B' and 'Llama 2 70B' were used, accessible via Hugging Face. However, it does not specify the underlying hardware (e.g., GPU models, CPU types, memory) used to run or train these models for their experiments. |
| Software Dependencies | No | The paper mentions using the 'vLLM framework' and 'Adam' optimizer, along with 'LSTM' models. However, it does not provide specific version numbers for these or other software dependencies (e.g., Python, PyTorch, TensorFlow, specific libraries) required for reproducibility. |
| Experiment Setup | Yes | In all experiments, we set the target accuracy gap level to α = 10%, with δ = 1% and = 0.01. (...) We used a standard LSTM for feature extraction with one recurrent layer with a hidden size of 32, except for Walking Sitting Standing where we used 2 recurrent layers, each with a hidden size of 256. (...) We set the hyperparameter γ to 0.2 in all experiments. (...) The optimizer used to minimize the objective function is Adam (Kingma & Ba, 2014), with a learning rate of 0.001, and a batch size of 64. (...) Training continues until there is no improvement in the loss on the validation set for 30 epochs. (A model and training sketch follows the table.) |
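The "Pseudocode" row cites Algorithm A.3, fixed sequence testing for marginal risk control. The snippet below is a minimal, generic sketch of fixed sequence testing with a Hoeffding-style p-value, not a reproduction of the paper's exact algorithm; the function name, the array layout, and the choice of Hoeffding's bound are assumptions made for illustration.

```python
import numpy as np

def fixed_sequence_test(losses_per_candidate, alpha=0.10, delta=0.01):
    """Generic fixed-sequence-testing sketch (illustrative only, not the
    paper's Algorithm A.3). Candidates are ordered a priori; for each one,
    the null hypothesis "risk(candidate) > alpha" is tested at level delta
    with a Hoeffding bound, and testing stops at the first non-rejection.
    All candidates rejected before that point are returned as valid.
    losses_per_candidate: array of shape (num_candidates, n) with
    per-sample losses in [0, 1] on a held-out calibration set.
    """
    n = losses_per_candidate.shape[1]  # calibration sample size
    valid = []
    for j in range(losses_per_candidate.shape[0]):
        empirical_risk = losses_per_candidate[j].mean()
        # Hoeffding p-value for H0: true risk > alpha (equals 1 if risk >= alpha)
        p_value = np.exp(-2.0 * n * max(alpha - empirical_risk, 0.0) ** 2)
        if p_value <= delta:
            valid.append(j)  # H0 rejected: candidate certified as risk-controlling
        else:
            break            # stop at the first candidate that is not rejected
    return valid
```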
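The "Dataset Splits" row quotes an 80/20 partition, with the held-out 20% divided equally into Dcal-1, Dcal-2, and Dtest, and 1/8 of the fitting samples reserved for validation. A minimal sketch of such a partition is given below; the function name and the use of a random permutation are assumptions, not the repository's actual implementation.

```python
import numpy as np

def split_dataset(n_samples, seed=0):
    """Hypothetical partitioning sketch following the quoted splits:
    80% for model fitting, the remaining 20% divided equally into
    Dcal-1, Dcal-2, and Dtest, and 1/8 of the fitting samples held out
    as a validation set for early stopping.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)

    n_fit = int(0.8 * n_samples)
    fit_idx, rest = idx[:n_fit], idx[n_fit:]

    # Remaining 20% split equally into the two calibration sets and the test set.
    dcal1_idx, dcal2_idx, dtest_idx = np.array_split(rest, 3)

    # 1/8 of the fitting samples form the validation set; the model is
    # optimized on the remaining 7/8.
    n_val = n_fit // 8
    val_idx, train_idx = fit_idx[:n_val], fit_idx[n_val:]

    return train_idx, val_idx, dcal1_idx, dcal2_idx, dtest_idx
```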
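The "Experiment Setup" row describes the backbone and optimizer: a one-layer LSTM with hidden size 32 (two layers of size 256 for Walking Sitting Standing), Adam with learning rate 0.001, batch size 64, and early stopping after 30 epochs without validation improvement. The PyTorch sketch below assembles these quoted settings; the classifier head, class/variable names, and placeholder dimensions are assumptions for illustration and do not reproduce the authors' code.

```python
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    """Minimal sketch of the LSTM backbone described in the setup: one
    recurrent layer with hidden size 32 (2 layers of size 256 for
    Walking Sitting Standing), followed by a per-timestep linear head.
    The head and its name are assumptions made for illustration.
    """
    def __init__(self, input_size, num_classes, hidden_size=32, num_layers=1):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.head = nn.Linear(hidden_size, num_classes)

    def forward(self, x):            # x: (batch, time, features)
        features, _ = self.lstm(x)   # per-timestep hidden states
        return self.head(features)   # per-timestep class logits

# Optimizer settings quoted from the experiment setup; sizes are placeholders.
model = LSTMClassifier(input_size=10, num_classes=9)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# Training would use a batch size of 64 and stop once the validation loss
# has not improved for 30 epochs (early stopping with patience 30).
```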