Editable Neural Networks
Authors: Anton Sinitsin, Vsevolod Plokhotnyuk, Dmitry Pyrkin, Sergei Popov, Artem Babenko
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate the effectiveness of this method on large-scale image classification and machine translation tasks. |
| Researcher Affiliation | Collaboration | Anton Sinitsin (Yandex; National Research University Higher School of Economics) ant.sinitsin@gmail.com; Vsevolod Plokhotnyuk (National Research University Higher School of Economics) vsevolod-pl@yandex.ru; Dmitry Pyrkin (National Research University Higher School of Economics) alagaster@yandex.ru; Sergei Popov (Yandex) sapopov@yandex-team.ru; Artem Babenko (Yandex; National Research University Higher School of Economics) artem.babenko@phystech.edu |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source code is available online at https://github.com/xtinkt/editable |
| Open Datasets | Yes | First, we experiment on image classification with the small CIFAR-10 dataset with standard train/test splits (Krizhevsky et al.). ... Here we experiment with the ILSVRC image classification task (Deng et al. (2009)). ... We consider the IWSLT 2014 German-English translation task with the standard training/test splits (Cettolo et al. (2015)). |
| Dataset Splits | Yes | First, we experiment on image classification with the small CIFAR-10 dataset with standard train/test splits (Krizhevsky et al.). ... We measure the drawdown on the full ILSVRC validation set of 50,000 images. ... We consider the IWSLT 2014 German-English translation task with the standard training/test splits (Cettolo et al. (2015)). |
| Hardware Specification | Yes | In all cases Editable Fine-Tuning took under 48 hours on a single GeForce 1080 Ti GPU while a single edit requires less than 150 ms. |
| Software Dependencies | Yes | We use the Transformer configuration transformer_iwslt_de_en from Fairseq v0.8.0 (Ott et al. (2019)) |
| Experiment Setup | Yes | All models trained on this dataset follow the ResNet-18 (He et al. (2015)) architecture and use the Adam optimizer (Kingma & Ba (2014)) with default hyperparameters. ... We set the learning rate to 10^-5 for the pre-existing layers and 10^-3 for the extra block. ... We use the SGD optimizer with momentum µ = 0.9. ... We train the Transformer (Vaswani et al. (2017)) model similar to the transformer-base configuration, optimized for the IWSLT De-En task. (A minimal optimizer sketch follows the table.) |
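
The "Experiment Setup" row quotes per-layer learning rates (10^-5 for the pre-existing layers, 10^-3 for the extra block) alongside SGD with momentum 0.9. The sketch below shows how such a setup is typically expressed in PyTorch with parameter groups. It is an illustration under stated assumptions, not the authors' implementation (their code is linked in the "Open Source Code" row): the architecture of the extra block and its pairing with the SGD settings are assumptions for the sake of a runnable example.

```python
# Minimal sketch (not the authors' code): per-parameter-group learning rates
# matching the quoted setup, i.e. 1e-5 for pre-existing layers and 1e-3 for
# an extra block, using SGD with momentum 0.9.
import torch
import torchvision

# Pre-existing layers: an ImageNet-pretrained ResNet-18 backbone.
backbone = torchvision.models.resnet18(pretrained=True)

# Hypothetical extra block; the quoted text does not specify its layout.
extra_block = torch.nn.Sequential(
    torch.nn.Linear(1000, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 1000),
)

optimizer = torch.optim.SGD(
    [
        {"params": backbone.parameters(), "lr": 1e-5},     # pre-existing layers
        {"params": extra_block.parameters(), "lr": 1e-3},  # extra block
    ],
    momentum=0.9,
)
```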