Optimizing Data Usage via Differentiable Rewards

Authors: Xinyi Wang, Hieu Pham, Paul Michel, Antonios Anastasopoulos, Jaime Carbonell, Graham Neubig

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate two concrete instantiations of the DDS framework, one for a more general case of image classification, and the other for a more specific case of neural machine translation (NMT). For image classification, we test on both CIFAR-10 and ImageNet. For NMT, we focus on a multilingual setting, where we optimize data usage from a multilingual corpus to improve the performance on a particular language. For these two very different and realistic tasks, we find the DDS framework brings significant improvements over diverse baselines for all settings.
Researcher Affiliation | Collaboration | 1 Language Technology Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA; 2 Google Research, Brain Team, Mountain View, CA 94043, USA.
Pseudocode | Yes | Alg. 1 presents the pseudo code for the training process on classification tasks, using the notation introduced in §2. (...) The pseudo code of the training process is in Alg. 2. (A minimal sketch of such a training loop is given after the table.)
Open Source Code | No | Code will be released soon.
Open Datasets | Yes | For image classification, we use CIFAR-10 (Krizhevsky, 2009) and ImageNet (Russakovsky et al., 2015). For multilingual NMT, we use the 58-language-to-English TED dataset (Qi et al., 2018).
Dataset Splits | Yes | For image classification, we hold out 10% of the training data as Ddev; while for multilingual NMT, we simply use the dev set of the LRL as Ddev. (A small split helper is sketched after the table.)
Hardware Specification | No | The paper states, 'The authors would like to thank Amazon for providing GPU credits,' but does not specify any particular GPU models, CPU types, or other hardware components used for running experiments.
Software Dependencies | No | The paper mentions specific optimizers and techniques like 'Adam optimizer' and 'batch normalization (Ioffe & Szegedy, 2015),' but it does not specify any software dependencies (e.g., Python, PyTorch, TensorFlow) with their version numbers.
Experiment Setup | Yes | For the NMT model, we use Adam optimizer with learning rate of 0.001. For the distribution parameter ψ, we use Adam optimizer with learning rate of 0.0001. (...) We train all models for 20 epochs without any learning rate decay. (...) The batch sizes for CIFAR-10 and for ImageNet are 128 and 4096, running for 200K steps and 40K steps, respectively. (These hyperparameters are collected into a single reference structure after the table.)
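
The Pseudocode row refers to Alg. 1 (classification) and Alg. 2 (NMT) in the paper. As a rough illustration only, below is a minimal PyTorch-style sketch of one DDS-like training step, assuming a scorer network that emits one logit per example in a batch and a REINFORCE-style scorer update whose reward is the alignment between the training gradient and the dev-set gradient. All names (dds_step, scorer, model_opt, scorer_opt) are ours, and the per-example reward bookkeeping of the paper's algorithms is collapsed into a single batch-level reward.

import torch
import torch.nn.functional as F

def dds_step(model, scorer, train_batch, dev_batch, model_opt, scorer_opt):
    """One simplified DDS-style update: train the model on scorer-weighted data,
    then reward the scorer by how well the training gradient aligns with the
    dev-set gradient (a sketch, not the paper's exact Alg. 1/Alg. 2)."""
    x, y = train_batch
    x_dev, y_dev = dev_batch

    # 1) Score the training examples; a softmax over the batch plays the role of p(x; psi).
    log_p = torch.log_softmax(scorer(x), dim=0)   # assumes scorer(x) has shape [batch_size]
    weights = log_p.exp().detach()                # no gradient into the scorer here

    # 2) Update the model on the weighted training loss.
    per_example_loss = F.cross_entropy(model(x), y, reduction="none")
    train_loss = (weights * per_example_loss).sum()
    model_opt.zero_grad()
    train_loss.backward()
    train_grads = [p.grad.detach().clone() for p in model.parameters() if p.grad is not None]
    model_opt.step()

    # 3) Gradient of the dev loss at the updated model parameters.
    dev_loss = F.cross_entropy(model(x_dev), y_dev)
    model_opt.zero_grad()
    dev_loss.backward()
    dev_grads = [p.grad.detach().clone() for p in model.parameters() if p.grad is not None]

    # 4) Batch-level reward: dot product between training and dev gradients.
    reward = sum((g_tr * g_dev).sum() for g_tr, g_dev in zip(train_grads, dev_grads))

    # 5) REINFORCE-style update of the scorer (distribution parameters psi).
    scorer_loss = -reward.detach() * (weights * log_p).sum()
    scorer_opt.zero_grad()
    scorer_loss.backward()
    scorer_opt.step()
    return train_loss.item(), dev_loss.item()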
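
For the image-classification split described in the Dataset Splits row (10% of the training data held out as Ddev), a minimal helper along these lines would suffice; the function name, random seed, and use of index lists are assumptions, not details from the paper.

import random

def hold_out_dev(num_train_examples, dev_fraction=0.1, seed=0):
    """Return (train_indices, dev_indices) with `dev_fraction` of the data held out as Ddev."""
    indices = list(range(num_train_examples))
    random.Random(seed).shuffle(indices)
    cut = int(dev_fraction * num_train_examples)
    return indices[cut:], indices[:cut]

For multilingual NMT no such split is needed, since the dev set of the low-resource language (LRL) serves as Ddev directly.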
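
Finally, the hyperparameters quoted in the Experiment Setup row, gathered into one reference structure. The dictionary layout and key names are ours; the values come from the paper's text, and placing the 20-epoch, no-decay schedule under the NMT entry reflects our reading of the excerpt.

EXPERIMENT_SETUP = {
    "nmt": {
        "model_optimizer": ("Adam", {"lr": 1e-3}),
        "psi_optimizer": ("Adam", {"lr": 1e-4}),   # distribution parameters psi
        "epochs": 20,
        "lr_decay": None,                          # "without any learning rate decay"
    },
    "image_classification": {
        "cifar10": {"batch_size": 128, "train_steps": 200_000},
        "imagenet": {"batch_size": 4096, "train_steps": 40_000},
    },
}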