Distributed Fine-tuning of Language Models on Private Data

Authors: Vadim Popov, Mikhail Kudinov, Irina Piontkovskaya, Petr Vytovtov, Alex Nevidomsky

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We study approaches to distributed fine-tuning of a general model on user private data with the additional requirements of maintaining the quality on the general data and minimization of communication costs. We propose a novel technique that significantly improves prediction quality on users' language compared to a general model and outperforms gradient compression methods in terms of communication efficiency. The proposed procedure is fast and leads to an almost 70% perplexity reduction and 8.7 percentage point improvement in keystroke saving rate on informal English texts. Finally, we propose an experimental framework for evaluating differential privacy of distributed training of language models and show that our approach has good privacy guarantees. Table 1 summarizes our experiments with on-device model update algorithms. (A hedged sketch of one distributed update round is given after the table.)
Researcher Affiliation | Industry | Vadim Popov, Mikhail Kudinov, Irina Piontkovskaya, Petr Vytovtov & Alex Nevidomsky, Samsung R&D Institute Russia, Moscow, Russia. v.popov@samsung.com, m.kudinov@samsung.com, p.irina@samsung.com, p.vytovtov@partner.samsung.com, a.nevidomsky@samsung.com
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. Figure 1 provides an "Overview of the approach", but it is a diagram rather than pseudocode.
Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository.
Open Datasets | No | The paper mentions "Twitter and Wikipedia corpora for the user and standard English corpora correspondingly" and states that "The standard English train dataset contained approximately 30M tokens. The user train dataset contained approximately 1.7M tokens." While these are publicly known datasets, the paper does not provide specific links, DOIs, or citations to the exact versions or subsets used that would allow concrete access for reproduction.
Dataset Splits | Yes | The hyperparameters of the model were initially tuned on the Standard English validation set of 3.8M tokens. Updated models were tested on subsets of the Twitter and Wikipedia corpora containing 200k and 170k tokens correspondingly.
Hardware Specification | Yes | Each model was trained on a mobile phone with a quad-core mobile CPU with a clock frequency 2.31 GHz.
Software Dependencies | No | The paper mentions the "LSTM architecture from Zaremba et al. (2014)" and refers to neural network components, but it does not specify software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup | Yes | The hyperparameters of the model were initially tuned on the Standard English validation set of 3.8M tokens. For our experiments we used LSTM architecture from Zaremba et al. (2014) with 2x650 LSTM layers, a vocabulary size of 30k, dropout 0.5, minibatch size 20, BPTT steps 35. We used a minibatch size 10, number of BPTT steps 20, learning rate 0.75 and 1 epoch. (A hedged configuration sketch is given after the table.)
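
The Experiment Setup row pins down a concrete configuration: the Zaremba et al. (2014) LSTM language model with 2x650 LSTM layers, a 30k vocabulary and dropout 0.5, plus on-device fine-tuning settings of minibatch size 10, 20 BPTT steps, learning rate 0.75 and 1 epoch. Below is a minimal sketch of that configuration, assuming PyTorch; the class and variable names, the batch-first layout and the use of plain SGD are illustrative assumptions, not the authors' code (none is released, per the Open Source Code row).

```python
# Hedged sketch (PyTorch assumed): an LSTM language model matching the
# hyperparameters quoted in the Experiment Setup row. Not the authors' code.
import torch
import torch.nn as nn

VOCAB_SIZE = 30_000  # "a vocabulary size of 30k"
HIDDEN = 650         # "2x650 LSTM layers"
NUM_LAYERS = 2
DROPOUT = 0.5        # "dropout 0.5"

class LSTMLanguageModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, HIDDEN)
        self.drop = nn.Dropout(DROPOUT)
        self.lstm = nn.LSTM(HIDDEN, HIDDEN, num_layers=NUM_LAYERS,
                            dropout=DROPOUT, batch_first=True)
        self.decoder = nn.Linear(HIDDEN, VOCAB_SIZE)

    def forward(self, tokens, hidden=None):
        # tokens: (batch, bptt_steps) integer ids; returns per-step logits.
        x = self.drop(self.embed(tokens))
        out, hidden = self.lstm(x, hidden)
        return self.decoder(self.drop(out)), hidden

# On-device fine-tuning settings quoted in the same row; the choice of
# plain SGD as the optimizer is an assumption.
BATCH_SIZE, BPTT_STEPS, LR, EPOCHS = 10, 20, 0.75, 1
model = LSTMLanguageModel()
optimizer = torch.optim.SGD(model.parameters(), lr=LR)
criterion = nn.CrossEntropyLoss()
```

Perplexity on the general and user test sets is then the exponential of the average cross-entropy given by this criterion.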
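
The Research Type row describes the method as distributed fine-tuning of a general model on private user data under a communication budget, with updates computed on devices. As an illustration only, the sketch below (same hypothetical PyTorch setting as above) shows one server-coordinated round in which each device fine-tunes a copy of the general model on its own text and the server averages the returned weights; the plain weight averaging, the sampling of users and the function names are assumptions made for this sketch, not a restatement of the paper's exact on-device update or aggregation algorithm.

```python
# Hedged sketch: one round of server-coordinated on-device fine-tuning.
# The simple weight averaging below is an illustrative assumption and is
# not claimed to be the update rule analysed in the paper.
import copy
import torch

def local_finetune(general_model, user_batches, lr=0.75, epochs=1):
    """Fine-tune a copy of the general model on one user's private batches
    of (tokens, targets); only the resulting weights leave the device."""
    model = copy.deepcopy(general_model)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    criterion = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for tokens, targets in user_batches:
            optimizer.zero_grad()
            logits, _ = model(tokens)  # assumes forward returns (logits, hidden)
            loss = criterion(logits.reshape(-1, logits.size(-1)),
                             targets.reshape(-1))
            loss.backward()
            optimizer.step()
    return model.state_dict()

def server_round(general_model, users_batches):
    """Collect locally fine-tuned weights from a sample of users and
    average them into the next version of the general model."""
    client_states = [local_finetune(general_model, b) for b in users_batches]
    averaged = {}
    for name, tensor in client_states[0].items():
        if tensor.is_floating_point():
            averaged[name] = torch.stack([s[name] for s in client_states]).mean(dim=0)
        else:
            averaged[name] = tensor  # non-float buffers copied from the first client
    general_model.load_state_dict(averaged)
    return general_model
```

In this sketch the communication cost per round is one download and one upload of the full weight set per participating device; the abstract's claim is that the proposed procedure outperforms gradient compression methods in communication efficiency.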