Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Deep Frank-Wolfe For Neural Network Optimization
Authors: Leonard Berrada, Andrew Zisserman, M. Pawan Kumar
ICLR 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present experiments on the CIFAR and SNLI data sets, where we demonstrate the significant superiority of our method over Adam, Adagrad, as well as the recently proposed BPGrad and AMSGrad. |
| Researcher Affiliation | Academia | Leonard Berrada1, Andrew Zisserman1 and M. Pawan Kumar1,2 1Department of Engineering Science University of Oxford 2Alan Turing Institute EMAIL |
| Pseudocode | Yes | The main steps of DFW are shown in Algorithm 1. |
| Open Source Code | Yes | The code is publicly available at https://github.com/oval-group/dfw. |
| Open Datasets | Yes | We present experiments on the CIFAR and SNLI data sets... The CIFAR-10/100 data sets contain 60,000 RGB natural images of size 32×32 with 10/100 classes (Krizhevsky, 2009). The Stanford Natural Language Inference (SNLI) data set is a large corpus of 570k pairs of sentences (Bowman et al., 2015). |
| Dataset Splits | Yes | We split the training set into 45,000 training samples and 5,000 validation samples, and use 10,000 samples for testing. |
| Hardware Specification | Yes | All models are trained on a single Nvidia Titan Xp card. |
| Software Dependencies | No | Our experiments are implemented in pytorch (Paszke et al., 2017), and the code is available at https://github.com/oval-group/dfw. While PyTorch is mentioned, a specific version number is not provided. |
| Experiment Setup | Yes | For all methods, we set a budget of 200 epochs for WRN and 300 epochs for DN. Furthermore, the batch-size is respectively set to 128 and 64 for WRN and DN as in (Zagoruyko & Komodakis, 2016) and (Huang et al., 2017). For DN, the l2 regularization is set to 10⁻⁴ as in (Huang et al., 2017). For WRN, the l2 is cross-validated between 5·10⁻⁴, as in (Zagoruyko & Komodakis, 2016), and 10⁻⁴, a more usual value that we have found to perform better for some of the methods (in particular DFW, since the corresponding loss function is an SVM instead of CE, for which the value of 5·10⁻⁴ was designed). The value of the Nesterov momentum is set to 0.9 for BPGrad, SGD and DFW. DFW has only one hyper-parameter to tune, namely η, which is analogous to an initial learning rate. For SGD, the initial learning rate is set to 0.1 on both WRN and DN. |
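The hyper-parameters quoted above can be collected into a small configuration helper. This is a minimal illustrative sketch, not code from the authors' repository at https://github.com/oval-group/dfw; the function name `make_config` and the `"wrn"`/`"dn"` keys are hypothetical.

```python
def make_config(arch: str, optimizer: str) -> dict:
    """Return the reported training hyper-parameters (hypothetical helper).

    arch: "wrn" (Wide Residual Network) or "dn" (DenseNet).
    optimizer: e.g. "sgd", "dfw", "bpgrad", "adam".
    """
    if arch == "wrn":
        cfg = {
            "epochs": 200,
            "batch_size": 128,
            # l2 is cross-validated between 5e-4 (Zagoruyko & Komodakis, 2016)
            # and 1e-4, so both candidates are listed here.
            "l2_candidates": [5e-4, 1e-4],
        }
    elif arch == "dn":
        cfg = {
            "epochs": 300,
            "batch_size": 64,
            "l2_candidates": [1e-4],  # fixed as in Huang et al. (2017)
        }
    else:
        raise ValueError(f"unknown architecture: {arch}")

    # Nesterov momentum of 0.9 is used for BPGrad, SGD and DFW.
    if optimizer in {"bpgrad", "sgd", "dfw"}:
        cfg["momentum"] = 0.9

    # SGD uses an initial learning rate of 0.1 on both architectures;
    # for DFW the single hyper-parameter eta plays an analogous role.
    if optimizer == "sgd":
        cfg["initial_lr"] = 0.1

    return cfg
```

In a real experiment these values would be passed to the data loader (batch size), the optimizer (learning rate, momentum, weight decay) and the training loop (epoch budget).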