Finito: A faster, permutable incremental gradient method for big data problems
Authors: Aaron Defazio, Justin Domke, Tiberio Caetano
ICML 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We give empirical results showing state of the art performance. In this section we compare Finito, SAG, SDCA and LBFGS. |
| Researcher Affiliation | Academia | NICTA and Australian National University |
| Pseudocode | Yes | 2.2. The Finito algorithm. The step for iteration k is as follows: 1. Update w using the step: $w^{(k)} = \bar{\phi}^{(k)} - \frac{1}{\alpha s n} \sum_i f_i'(\phi_i^{(k)})$. 2. Pick an index j uniformly at random, or using without-replacement sampling as discussed in Section 3. 3. Set $\phi_j^{(k+1)} = w^{(k)}$ in the table and leave the other variables the same ($\phi_i^{(k+1)} = \phi_i^{(k)}$ for $i \neq j$). 4. Calculate and store $f_j'(\phi_j^{(k+1)})$ in the table. (A hedged code sketch of these steps appears below the table.) |
| Open Source Code | No | The paper does not provide any specific links or statements about the availability of source code for the described methodology. |
| Open Datasets | Yes | For classification, we tested on the ijcnn1 and covtype datasets [1], as well as MNIST [2], classifying 0-4 against 5-9. For regression, we choose the two datasets from the UCI repository: the million song year regression dataset, and the slice-localization dataset. [1] http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html [2] http://yann.lecun.com/exdb/mnist/ |
| Dataset Splits | No | The paper mentions 'The training portion of the datasets are of size...' for each dataset, but does not provide specific details on how the data was split into training, validation, and test sets, or the percentages/counts for each split. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies or version numbers used in the experiments. |
| Experiment Setup | Yes | For Finito, we find that using α = 2 is the fastest rate when the big data condition holds for any β > 1. This is the step suggested by our theory when β = 2. Interestingly, reducing α to 1 does not improve the convergence rate; we see no further improvement in our experiments. For Finito, during the first pass, since we do not have derivatives for each φi yet, we simply sum over the k terms seen so far, $w^{(k)} = \frac{1}{k}\sum_i \phi_i^{(k)} - \frac{1}{\alpha s k} \sum_i f_i'(\phi_i^{(k)})$, where we process data points in index order for the first pass only. (A hedged sketch of this first-pass initialisation also appears below the table.) |
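
The four steps quoted in the Pseudocode row map directly onto a short loop. Below is a minimal NumPy sketch of that update, not the authors' implementation: the names `finito_sketch` and `grad_fi` are illustrative assumptions, `s` is simply the constant appearing in the quoted step size 1/(αsn), and the table is initialised here by evaluating every gradient at the starting point rather than via the paper's first-pass scheme (sketched separately below).

```python
import numpy as np

def finito_sketch(grad_fi, w0, n, s, alpha=2.0, iters=10_000, seed=0):
    """Minimal sketch of the quoted Finito steps 1-4 (not the authors' code).

    grad_fi(i, w): assumed callback returning f_i'(w) for the i-th term.
    s: the constant in the quoted step size 1/(alpha * s * n).
    """
    rng = np.random.default_rng(seed)
    w0 = np.asarray(w0, dtype=float)
    phi = np.tile(w0, (n, 1))                              # table of phi_i
    grads = np.stack([grad_fi(i, w0) for i in range(n)])   # stored f_i'(phi_i)

    for _ in range(iters):
        # Step 1: w = mean(phi_i) - (1 / (alpha * s * n)) * sum_i f_i'(phi_i)
        w = phi.mean(axis=0) - grads.sum(axis=0) / (alpha * s * n)
        # Step 2: pick an index j uniformly at random
        j = int(rng.integers(n))
        # Step 3: set phi_j = w, leaving all other phi_i unchanged
        phi[j] = w
        # Step 4: recompute and store f_j'(phi_j)
        grads[j] = grad_fi(j, w)
    return w
```

A practical implementation would maintain running sums of the φ_i and of the stored gradients so that step 1 costs O(d) per iteration instead of the O(nd) recomputation used above for clarity.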
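
The Experiment Setup row describes how the tables are bootstrapped: on the first pass the data points are visited in index order and the step sums only over the k terms whose derivatives have been stored so far. The snippet below is one hedged reading of that description; the exact ordering of the store and step operations within the pass, and the function name `finito_first_pass`, are assumptions.

```python
import numpy as np

def finito_first_pass(grad_fi, w0, n, s, alpha=2.0):
    """One reading of the quoted first-pass initialisation: visit points in
    index order, summing only over the k terms whose derivatives are stored."""
    w = np.asarray(w0, dtype=float)
    phi = np.zeros((n, w.shape[0]))
    grads = np.zeros_like(phi)
    for k in range(n):
        phi[k] = w                   # the newly visited point takes the current iterate
        grads[k] = grad_fi(k, w)     # store its derivative f_k'(phi_k)
        # partial-sum step over the k+1 terms seen so far
        w = phi[:k + 1].mean(axis=0) - grads[:k + 1].sum(axis=0) / (alpha * s * (k + 1))
    return w, phi, grads             # tables ready for the regular Finito updates
```

After this pass every φ_i and stored gradient is populated, and the regular steps 1-4 (with α = 2, the value the authors report as fastest when the big data condition holds) can take over.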