The Lingering of Gradients: How to Reuse Gradients Over Time
Authors: Zeyuan Allen-Zhu, David Simchi-Levi, Xinshang Wang
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On the empirical side, we solve a hypothetical revenue management problem on the Yahoo! Front Page Today Module application with 4.6m users to 10^-6 error (or 10^-12 dual error) using 6 passes of the dataset. |
| Researcher Affiliation | Collaboration | Zeyuan Allen-Zhu Microsoft Research AI Redmond, WA 98052 zeyuan@csail.mit.edu David Simchi-Levi MIT Cambridge, MA 02139 dslevi@mit.edu Xinshang Wang MIT Cambridge, MA 02139 xinshang@mit.edu |
| Pseudocode | Yes | Algorithm 1 GD^lin(f, x^(0), S, C, D). Input: f(x) = (1/n) Σ_{i=1}^n f_i(x) convex and L-smooth, starting vector x^(0) ∈ ℝ^d, number of epochs S ≥ 1, parameters C, D > 0. Output: vector x ∈ ℝ^d. 1: for s ← 1 to S do; 2: x_0 ← x^(s−1), m ← 1 + (C²/16D²)·s, and ξ ← C; 3: g ← 0 and g_i ← 0 for each i ∈ [n]; 4: for k ← 0 to m−1 do; 5: calculate Λ_k ⊆ [n] from x_0, …, x_k according to Definition 3.1; 6: for i ∈ Λ_k do; 7: g ← g + (∇f_i(x_k) − g_i)/n and g_i ← ∇f_i(x_k); 8: x_{k+1} ← x_k − min{ξ/‖g‖, 1/L}·g (it satisfies g = ∇f(x_k)); 9: x^(s) ← x_m; 10: return x = x^(S). (A hedged Python sketch of this loop appears after the table.) |
| Open Source Code | No | The paper links to its full version on arXiv (https://arxiv.org/abs/1901.02871), but it does not provide an explicit statement about, or link to, open-source code for the methodology described in the paper. |
| Open Datasets | Yes | We construct a revenue maximization LP (2.1) using the publicly accessible dataset of Yahoo! Front Page Today Module [6, 22]. |
| Dataset Splits | No | The paper states that it uses the 'publicly accessible dataset of Yahoo! Front Page Today Module' and mentions that 'details of the experimental setup' are in the full version. However, the provided text does not specify exact train/validation/test splits, percentages, or sample counts. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware used for running the experiments (e.g., GPU/CPU models, memory, or cloud instance types). |
| Software Dependencies | No | The paper mentions using and comparing against methods like SVRG and SAGA, but it does not list any specific software dependencies or libraries with version numbers (e.g., Python, PyTorch, TensorFlow, or specific solvers). |
| Experiment Setup | Yes | In Figure 3(a), the legend shows specific learning rates used for the SVRG and SAGA methods (e.g., 'SVRG:0.0001 SVRG:0.0003 SVRG:0.0005'), indicating concrete hyperparameter values. Additionally, it states, 'We choose θ = 5' for the lingering radius calculation, and 'm = 2n in practice' for SVRG. |
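
Below is a minimal Python sketch of the GD^lin epoch loop as rendered from the extracted pseudocode above. It is an illustration, not the authors' implementation: the names `gd_lin`, `grad_fns`, and `lingering_indices` are hypothetical; the epoch-length and step-cap schedule (`m`, `xi`) follows the partially garbled extraction and may differ from the paper's exact formula; and the construction of the index set Λ_k (Definition 3.1) is not reproduced in the excerpt, so it is abstracted behind a callback.

```python
import numpy as np

def gd_lin(grad_fns, x0, S, C, D, L, lingering_indices):
    """Sketch of the GD^lin epoch loop, assuming f(x) = (1/n) * sum_i f_i(x).

    grad_fns          : list of n per-component gradient functions (each maps x -> grad f_i(x)).
    x0                : starting vector x^(0).
    S                 : number of epochs.
    C, D, L           : step-length cap parameter, distance bound, smoothness constant.
    lingering_indices : callable (xs, k) -> indices Lambda_k whose gradients must be
                        recomputed at step k; stands in for Definition 3.1.
    """
    n = len(grad_fns)
    x = np.asarray(x0, dtype=float)
    for s in range(1, S + 1):
        xs = [x.copy()]                                   # x_0 <- x^(s-1)
        m = 1 + int(np.ceil((C**2 / (16 * D**2)) * s))    # epoch length (schedule as extracted)
        xi = C                                            # cap on the step length
        g = np.zeros_like(x)                              # running estimate of the full gradient
        gi = [np.zeros_like(x) for _ in range(n)]         # last stored per-component gradients
        for k in range(m):
            xk = xs[k]
            for i in lingering_indices(xs, k):            # refresh only the "expired" gradients
                new_gi = grad_fns[i](xk)
                g += (new_gi - gi[i]) / n                 # SAGA-style correction of the average
                gi[i] = new_gi
            norm_g = np.linalg.norm(g)
            if norm_g > 0:                                # truncated GD step of length at most xi
                step = min(xi / norm_g, 1.0 / L)
                xs.append(xk - step * g)
            else:
                xs.append(xk.copy())
        x = xs[m]                                         # x^(s) <- x_m
    return x
```

The key point the sketch tries to capture is the gradient-reuse idea: at step k only the components in Λ_k are recomputed, while all other stored gradients "linger" and continue to contribute to the averaged gradient estimate g.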