Distributed Fine-tuning of Language Models on Private Data
Authors: Vadim Popov, Mikhail Kudinov, Irina Piontkovskaya, Petr Vytovtov, Alex Nevidomsky
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We study approaches to distributed fine-tuning of a general model on user private data with the additional requirements of maintaining the quality on the general data and minimization of communication costs. We propose a novel technique that significantly improves prediction quality on users' language compared to a general model and outperforms gradient compression methods in terms of communication efficiency. The proposed procedure is fast and leads to an almost 70% perplexity reduction and 8.7 percentage point improvement in keystroke saving rate on informal English texts. Finally, we propose an experimental framework for evaluating differential privacy of distributed training of language models and show that our approach has good privacy guarantees. Table 1 summarizes our experiments with on-device model update algorithms. |
| Researcher Affiliation | Industry | Vadim Popov, Mikhail Kudinov, Irina Piontkovskaya, Petr Vytovtov & Alex Nevidomsky, Samsung R&D Institute Russia, Moscow, Russia. v.popov@samsung.com, m.kudinov@samsung.com, p.irina@samsung.com, p.vytovtov@partner.samsung.com, a.nevidomsky@samsung.com |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. Figure 1 provides an "Overview of the approach" which is a diagram. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository. |
| Open Datasets | No | The paper mentions "Twitter and Wikipedia corpora for the user and standard English corpora correspondingly." and "The standard English train dataset contained approximately 30M tokens. The user train dataset contained approximately 1.7M tokens." While these are publicly known datasets, the paper does not provide specific links, DOIs, or citations to the exact versions or subsets used that would allow concrete access for reproduction. |
| Dataset Splits | Yes | The hyperparameters of the model were initially tuned on the Standard English validation set of 3.8M tokens. Updated models were tested on subsets of the Twitter and Wikipedia corpora containing 200k and 170k tokens correspondingly. |
| Hardware Specification | Yes | Each model was trained on a mobile phone with a quad-core mobile CPU with a clock frequency 2.31 GHz. |
| Software Dependencies | No | The paper mentions "LSTM architecture from Zaremba et al. (2014)" and refers to neural network components, but it does not specify software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | The hyperparameters of the model were initially tuned on the Standard English validation set of 3.8M tokens. For our experiments we used LSTM architecture from Zaremba et al. (2014) with 2x650 LSTM layers, a vocabulary size of 30k, dropout 0.5, minibatch size 20, BPTT steps 35. For the fine-tuning updates we used a minibatch size of 10, 20 BPTT steps, a learning rate of 0.75 and 1 epoch (sketched below). |
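
The following is a minimal sketch of the setup quoted in the Experiment Setup row. The numeric hyperparameters (2x650 LSTM layers, 30k vocabulary, dropout 0.5, and the two minibatch/BPTT configurations) come from the paper; the framework (PyTorch), the plain-SGD optimizer, and the helper names `LSTMLanguageModel` and `finetune_one_epoch` are illustrative assumptions, since the paper does not specify its software stack.

```python
# Sketch (assuming PyTorch) of a Zaremba et al. (2014)-style LSTM language model
# with the hyperparameters quoted in the "Experiment Setup" row above.

import torch
import torch.nn as nn

VOCAB_SIZE = 30_000   # vocabulary size reported in the paper
HIDDEN_SIZE = 650     # "2x650 LSTM layers"
NUM_LAYERS = 2
DROPOUT = 0.5


class LSTMLanguageModel(nn.Module):
    """Word-level LM: embedding -> 2x650 LSTM -> softmax over the 30k vocabulary."""

    def __init__(self):
        super().__init__()
        self.drop = nn.Dropout(DROPOUT)
        self.embed = nn.Embedding(VOCAB_SIZE, HIDDEN_SIZE)
        self.lstm = nn.LSTM(HIDDEN_SIZE, HIDDEN_SIZE, NUM_LAYERS,
                            dropout=DROPOUT, batch_first=True)
        self.decoder = nn.Linear(HIDDEN_SIZE, VOCAB_SIZE)

    def forward(self, tokens, hidden=None):
        # tokens: (batch, bptt_steps) integer word ids
        emb = self.drop(self.embed(tokens))
        out, hidden = self.lstm(emb, hidden)
        logits = self.decoder(self.drop(out))
        return logits, hidden


# Hyperparameters quoted above; the split between the general-model training and
# the fine-tuning update follows the order in which the paper reports them.
GENERAL_CONFIG = dict(batch_size=20, bptt_steps=35)
FINETUNE_CONFIG = dict(batch_size=10, bptt_steps=20, lr=0.75, epochs=1)


def finetune_one_epoch(model, batches, lr=FINETUNE_CONFIG["lr"]):
    """One pass over user data; plain SGD is an assumption, not stated in the paper."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    for inputs, targets in batches:            # each tensor: (batch, bptt_steps)
        optimizer.zero_grad()
        logits, _ = model(inputs)
        loss = criterion(logits.reshape(-1, VOCAB_SIZE), targets.reshape(-1))
        loss.backward()
        optimizer.step()
    return model
```

With these settings, reproducing the reported fine-tuning pass amounts to a single epoch over the user corpus at learning rate 0.75; any data loading, tokenization, and the on-device runtime used for the timing measurements in the Hardware Specification row are outside the scope of this sketch.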