Editable Neural Networks

Authors: Anton Sinitsin, Vsevolod Plokhotnyuk, Dmitry Pyrkin, Sergei Popov, Artem Babenko

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically demonstrate the effectiveness of this method on large-scale image classification and machine translation tasks.
Researcher Affiliation | Collaboration | Anton Sinitsin (Yandex; National Research University Higher School of Economics), ant.sinitsin@gmail.com; Vsevolod Plokhotnyuk (National Research University Higher School of Economics), vsevolod-pl@yandex.ru; Dmitry Pyrkin (National Research University Higher School of Economics), alagaster@yandex.ru; Sergei Popov (Yandex), sapopov@yandex-team.ru; Artem Babenko (Yandex; National Research University Higher School of Economics), artem.babenko@phystech.edu
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | The source code is available online at https://github.com/xtinkt/editable
Open Datasets | Yes | First, we experiment on image classification with the small CIFAR-10 dataset with standard train/test splits (Krizhevsky et al.). ... Here we experiment with the ILSVRC image classification task (Deng et al. (2009)). ... We consider the IWSLT 2014 German-English translation task with the standard training/test splits (Cettolo et al. (2015)).
Dataset Splits | Yes | First, we experiment on image classification with the small CIFAR-10 dataset with standard train/test splits (Krizhevsky et al.). ... We measure the drawdown on the full ILSVRC validation set of 50,000 images. ... We consider the IWSLT 2014 German-English translation task with the standard training/test splits (Cettolo et al. (2015)).
Hardware Specification | Yes | In all cases Editable Fine-Tuning took under 48 hours on a single GeForce 1080 Ti GPU while a single edit requires less than 150 ms.
Software Dependencies | Yes | We use Transformer configuration transformer_iwslt_de_en from Fairseq v0.8.0 (Ott et al. (2019))
Experiment Setup | Yes | All models trained on this dataset follow the ResNet-18 (He et al. (2015)) architecture and use the Adam optimizer (Kingma & Ba (2014)) with default hyperparameters. ... We set the learning rate to 10^-5 for the pre-existing layers and 10^-3 for the extra block. ... We use the SGD optimizer with momentum µ=0.9. ... We train the Transformer (Vaswani et al. (2017)) model similar to transformer-base configuration, optimized for IWSLT De-En task.
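
The Experiment Setup row quotes a ResNet-18 trained with Adam at default hyperparameters, a learning rate of 10^-5 for the pre-existing layers versus 10^-3 for an extra block, and SGD with momentum 0.9. The snippet below is a minimal PyTorch sketch of that per-parameter-group learning-rate split; the backbone/extra-block construction and all names are assumptions for illustration, not the authors' released code.

```python
# Hedged sketch of per-parameter-group learning rates in PyTorch.
# Only the two learning rates (1e-5 vs. 1e-3) and the optimizers (Adam, and
# SGD with momentum 0.9) come from the quoted setup; the model split below
# and the name `extra_block` are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision.models import resnet18

backbone = resnet18()                       # stands in for the "pre-existing" layers (assumed)
extra_block = nn.Linear(1000, 1000)         # hypothetical extra block appended to the network

# Lower learning rate for the pre-existing layers, higher for the new block.
optimizer = torch.optim.Adam(
    [
        {"params": backbone.parameters(), "lr": 1e-5},
        {"params": extra_block.parameters(), "lr": 1e-3},
    ]
)

# The quoted setup also mentions SGD with momentum 0.9; the learning rate
# here is a placeholder, not a value reported in the paper.
sgd = torch.optim.SGD(backbone.parameters(), lr=0.1, momentum=0.9)
```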
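
The Software Dependencies row names the transformer_iwslt_de_en configuration from Fairseq v0.8.0. The sketch below shows one way such a run is typically launched through Fairseq's documented CLI; only the --arch value is taken from the paper, while the data path and the remaining flags are illustrative assumptions (borrowed from Fairseq's own IWSLT examples), not the authors' exact settings.

```python
# Hedged sketch: invokes the fairseq-train CLI via subprocess.
# Assumes Fairseq is installed and that the IWSLT'14 De-En data has already been
# preprocessed into data-bin/iwslt14.tokenized.de-en (a hypothetical path).
import subprocess

subprocess.run(
    [
        "fairseq-train", "data-bin/iwslt14.tokenized.de-en",  # assumed data directory
        "--arch", "transformer_iwslt_de_en",                  # configuration quoted in the paper
        "--optimizer", "adam",                                # remaining flags are illustrative
        "--lr", "5e-4",
        "--lr-scheduler", "inverse_sqrt",
        "--warmup-updates", "4000",
        "--criterion", "label_smoothed_cross_entropy",
        "--label-smoothing", "0.1",
        "--max-tokens", "4096",
    ],
    check=True,
)
```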