Distributed Fine-tuning of Language Models on Private Data

Authors: Vadim Popov, Mikhail Kudinov, Irina Piontkovskaya, Petr Vytovtov, Alex Nevidomsky

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We study approaches to distributed fine-tuning of a general model on user private data with the additional requirements of maintaining the quality on the general data and minimization of communication costs. We propose a novel technique that significantly improves prediction quality on users' language compared to a general model and outperforms gradient compression methods in terms of communication efficiency. The proposed procedure is fast and leads to an almost 70% perplexity reduction and 8.7 percentage point improvement in keystroke saving rate on informal English texts. Finally, we propose an experimental framework for evaluating differential privacy of distributed training of language models and show that our approach has good privacy guarantees. Table 1 summarizes our experiments with on-device model update algorithms. (A hedged sketch of one distributed update round is given after the table.)
Researcher Affiliation | Industry | Vadim Popov, Mikhail Kudinov, Irina Piontkovskaya, Petr Vytovtov & Alex Nevidomsky, Samsung R&D Institute Russia, Moscow, Russia. v.popov@samsung.com, m.kudinov@samsung.com, p.irina@samsung.com, p.vytovtov@partner.samsung.com, a.nevidomsky@samsung.com
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. Figure 1 provides an "Overview of the approach", but it is a diagram rather than pseudocode.
Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository.
Open Datasets | No | The paper mentions "Twitter and Wikipedia corpora for the user and standard English corpora correspondingly" and states that "The standard English train dataset contained approximately 30M tokens. The user train dataset contained approximately 1.7M tokens." While these are publicly known datasets, the paper does not provide specific links, DOIs, or citations to the exact versions or subsets used that would allow concrete access for reproduction.
Dataset Splits | Yes | The hyperparameters of the model were initially tuned on the Standard English validation set of 3.8M tokens. Updated models were tested on subsets of the Twitter and Wikipedia corpora containing 200k and 170k tokens correspondingly.
Hardware Specification | Yes | Each model was trained on a mobile phone with a quad-core mobile CPU with a clock frequency 2.31 GHz.
Software Dependencies | No | The paper mentions the "LSTM architecture from Zaremba et al. (2014)" and refers to neural network components, but it does not specify software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup | Yes | The hyperparameters of the model were initially tuned on the Standard English validation set of 3.8M tokens. For our experiments we used LSTM architecture from Zaremba et al. (2014) with 2x650 LSTM layers, a vocabulary size of 30k, dropout 0.5, minibatch size 20, BPTT steps 35. We used a minibatch size 10, number of BPTT steps 20, learning rate 0.75 and 1 epoch. (A hedged configuration sketch is given after the table.)
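
The Experiment Setup row pins down a concrete configuration: the Zaremba et al. (2014) LSTM language model with 2x650 LSTM layers, a 30k vocabulary and dropout 0.5, plus on-device fine-tuning settings of minibatch size 10, 20 BPTT steps, learning rate 0.75 and 1 epoch. Below is a minimal sketch of that configuration, assuming PyTorch; the class and variable names, the batch-first layout and the use of plain SGD are illustrative assumptions, not the authors' code (none is released, per the Open Source Code row).

```python
# Hedged sketch (PyTorch assumed): an LSTM language model matching the
# hyperparameters quoted in the Experiment Setup row. Not the authors' code.
import torch
import torch.nn as nn

VOCAB_SIZE = 30_000  # "a vocabulary size of 30k"
HIDDEN = 650         # "2x650 LSTM layers"
NUM_LAYERS = 2
DROPOUT = 0.5        # "dropout 0.5"

class LSTMLanguageModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, HIDDEN)
        self.drop = nn.Dropout(DROPOUT)
        self.lstm = nn.LSTM(HIDDEN, HIDDEN, num_layers=NUM_LAYERS,
                            dropout=DROPOUT, batch_first=True)
        self.decoder = nn.Linear(HIDDEN, VOCAB_SIZE)

    def forward(self, tokens, hidden=None):
        # tokens: (batch, bptt_steps) integer ids; returns per-step logits.
        x = self.drop(self.embed(tokens))
        out, hidden = self.lstm(x, hidden)
        return self.decoder(self.drop(out)), hidden

# On-device fine-tuning settings quoted in the same row; the choice of
# plain SGD as the optimizer is an assumption.
BATCH_SIZE, BPTT_STEPS, LR, EPOCHS = 10, 20, 0.75, 1
model = LSTMLanguageModel()
optimizer = torch.optim.SGD(model.parameters(), lr=LR)
criterion = nn.CrossEntropyLoss()
```

Perplexity on the general and user test sets is then the exponential of the average cross-entropy given by this criterion.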
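
The Research Type row describes the method as distributed fine-tuning of a general model on private user data under a communication budget, with updates computed on devices. As an illustration only, the sketch below (same hypothetical PyTorch setting as above) shows one server-coordinated round in which each device fine-tunes a copy of the general model on its own text and the server averages the returned weights; the plain weight averaging, the sampling of users and the function names are assumptions made for this sketch, not a restatement of the paper's exact on-device update or aggregation algorithm.

```python
# Hedged sketch: one round of server-coordinated on-device fine-tuning.
# The simple weight averaging below is an illustrative assumption and is
# not claimed to be the update rule analysed in the paper.
import copy
import torch

def local_finetune(general_model, user_batches, lr=0.75, epochs=1):
    """Fine-tune a copy of the general model on one user's private batches
    of (tokens, targets); only the resulting weights leave the device."""
    model = copy.deepcopy(general_model)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    criterion = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for tokens, targets in user_batches:
            optimizer.zero_grad()
            logits, _ = model(tokens)  # assumes forward returns (logits, hidden)
            loss = criterion(logits.reshape(-1, logits.size(-1)),
                             targets.reshape(-1))
            loss.backward()
            optimizer.step()
    return model.state_dict()

def server_round(general_model, users_batches):
    """Collect locally fine-tuned weights from a sample of users and
    average them into the next version of the general model."""
    client_states = [local_finetune(general_model, b) for b in users_batches]
    averaged = {}
    for name, tensor in client_states[0].items():
        if tensor.is_floating_point():
            averaged[name] = torch.stack([s[name] for s in client_states]).mean(dim=0)
        else:
            averaged[name] = tensor  # non-float buffers copied from the first client
    general_model.load_state_dict(averaged)
    return general_model
```

In this sketch the communication cost per round is one download and one upload of the full weight set per participating device; the abstract's claim is that the proposed procedure outperforms gradient compression methods in communication efficiency.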