Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
PALBERT: Teaching ALBERT to Ponder
Authors: Nikita Balagansky, Daniil Gavrilov
NeurIPS 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimented with PALBERT and PRoBERTa on the GLUE Benchmark datasets (Wang et al., 2018). The ablation study showed that PALBERT produced significantly better results than the original PonderNet architecture adapted for ALBERT fine-tuning. |
| Researcher Affiliation | Industry | Nikita Balagansky, Daniil Gavrilov (Tinkoff) EMAIL, EMAIL |
| Pseudocode | No | The paper describes the proposed methods in text but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Source code: https://github.com/tinkoff-ai/palbert |
| Open Datasets | Yes | We experimented with PALBERT and PRoBERTa on the GLUE Benchmark datasets (Wang et al., 2018). |
| Dataset Splits | Yes | For evaluation, we performed a grid hyperparameter search on an appropriate metric score on the dev split for each dataset. We trained each model 5 times with the best hyperparameters and reported the mean and std values. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running experiments. |
| Software Dependencies | No | The paper mentions using 'Adam optimizer' and 'BERT/ALBERT/RoBERTa' but does not specify version numbers for any software dependencies like deep learning frameworks (e.g., PyTorch, TensorFlow) or Python. |
| Experiment Setup | Yes | For evaluation, we performed a grid hyperparameter search on an appropriate metric score on the dev split for each dataset. ... We used Adam optimizer (Kingma and Ba, 2015) for all experiments, a fixed q = 0.5 on models with the Q-exit criterion, as well as a fixed classifier dropout value equal to 0.1 (Srivastava et al., 2014), and λ = 0.1. ... Table 4: Hyperparameter search ranges used in all of our experiments. Learning rate: [1e-5, 2e-5, 3e-5, 5e-5]; Batch size: [16, 32, 128]; Lambda learning rate: [1e-5, 2e-5, 3e-5]; β: [0.5] |
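The grid search described in the Experiment Setup row can be sketched as a simple enumeration of the Table 4 ranges. This is a minimal illustration, not the authors' actual training code: the dictionary keys, the `grid` helper, and the config structure are assumptions; only the numeric ranges are taken from the quoted table.

```python
from itertools import product

# Search ranges quoted from Table 4 of the paper; all other values
# (q = 0.5, classifier dropout = 0.1, lambda = 0.1) are held fixed
# per the Experiment Setup quote above.
SEARCH_SPACE = {
    "learning_rate": [1e-5, 2e-5, 3e-5, 5e-5],
    "batch_size": [16, 32, 128],
    "lambda_learning_rate": [1e-5, 2e-5, 3e-5],
    "beta": [0.5],
}

def grid(space):
    """Yield every hyperparameter combination as a dict (hypothetical helper)."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

configs = list(grid(SEARCH_SPACE))
# 4 learning rates x 3 batch sizes x 3 lambda LRs x 1 beta = 36 configurations
print(len(configs))
```

Each of the 36 configurations would then be trained and scored on the dev split, with the best one retrained 5 times to report mean and std, as the quote describes.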