The Lingering of Gradients: How to Reuse Gradients Over Time

Authors: Zeyuan Allen-Zhu, David Simchi-Levi, Xinshang Wang

NeurIPS 2018

Reproducibility assessment. Each variable below is listed with its result and the LLM response supporting it.

Research Type: Experimental
"On the empirical side, we solve a hypothetical revenue management problem on the Yahoo! Front Page Today Module application with 4.6m users to 10^(-6) error (or 10^(-12) dual error) using 6 passes of the dataset."

Researcher Affiliation: Collaboration
Zeyuan Allen-Zhu (Microsoft Research AI, Redmond, WA 98052; zeyuan@csail.mit.edu), David Simchi-Levi (MIT, Cambridge, MA 02139; dslevi@mit.edu), Xinshang Wang (MIT, Cambridge, MA 02139; xinshang@mit.edu)

Pseudocode: Yes
Algorithm 1 GDlin(f, x^(0), S, C, D)
Input: f(x) = (1/n) Σ_{i=1}^n f_i(x), convex and L-smooth; starting vector x^(0) ∈ ℝ^d; number of epochs S ≥ 1; parameters C, D > 0.
Output: vector x ∈ ℝ^d.
 1: for s ← 1 to S do
 2:     x_0 ← x^(s-1);  m ← 1 + ⌈C² · 2^s / (16 D²)⌉;  and ξ ← C · 2^(-s)
 3:     g ← 0 and g_i ← 0 for each i ∈ [n]
 4:     for k ← 0 to m - 1 do
 5:         calculate Λ_k ⊆ [n] from x_0, ..., x_k according to Definition 3.1
 6:         for i ∈ Λ_k do
 7:             g ← g + (∇f_i(x_k) - g_i)/n  and  g_i ← ∇f_i(x_k)
 8:         x_{k+1} ← x_k - min{ξ/‖g‖, 1/L} · g    ▷ it satisfies g = ∇f(x_k)
 9:     x^(s) ← x_m
10: return x = x^(S)

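To make the reconstructed pseudocode concrete, here is a minimal Python sketch of the epoch structure above. It is illustrative only: Definition 3.1 is not reproduced in this excerpt, so the set Λ_k is modeled by a hypothetical rule that recomputes ∇f_i once the iterate has moved farther than a per-index lingering radius; the helper name lingering_radius is an assumption, as is the epoch-length formula carried over from line 2.

import numpy as np

def gd_lin(grad_fs, x0, S, C, D, L, lingering_radius):
    """Illustrative sketch of GDlin (Algorithm 1), not the authors' code.

    grad_fs[i](x) returns the gradient of f_i at x (as a NumPy array).
    lingering_radius(x, i) stands in for Definition 3.1, which is not
    reproduced here: it returns how far the iterate may move from x
    before the stored gradient of f_i must be recomputed.
    """
    n, d = len(grad_fs), len(x0)
    x = np.asarray(x0, dtype=float).copy()
    for s in range(1, S + 1):
        m = 1 + int(np.ceil(C**2 * 2**s / (16 * D**2)))
        xi = C * 2.0 ** (-s)
        g = np.zeros(d)                    # running average of stored gradients
        g_i = np.zeros((n, d))             # stored per-component gradients
        last_x = np.full((n, d), np.inf)   # where each gradient was last computed
        radius = np.zeros(n)               # lingering radius at that point
        for _ in range(m):
            # Hypothetical stand-in for Lambda_k: recompute gradient i once
            # the iterate has moved more than radius[i] since last_x[i].
            moved = np.linalg.norm(x - last_x, axis=1)
            for i in np.where(moved > radius)[0]:
                new_grad = grad_fs[i](x)
                g += (new_grad - g_i[i]) / n   # keep g = (1/n) * sum_i g_i
                g_i[i] = new_grad
                last_x[i] = x
                radius[i] = lingering_radius(x, i)
            norm_g = np.linalg.norm(g)
            if norm_g == 0.0:
                break
            # Step of length at most xi, never longer than a 1/L gradient step
            x = x - min(xi / norm_g, 1.0 / L) * g
    return x

The key invariant, as in line 7 of Algorithm 1, is that g always equals the average of the stored component gradients, so each step uses the full gradient direction at the cost of recomputing only the expired components.
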
Open Source Code: No
The paper links to its full version on arXiv (https://arxiv.org/abs/1901.02871), but it does not provide an explicit statement about, or link to, open-source code for the methodology it describes.

Open Datasets: Yes
"We construct a revenue maximization LP (2.1) using the publicly accessible dataset of Yahoo! Front Page Today Module [6, 22]."

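The LP (2.1) itself is not reproduced in the provided text. As a rough LaTeX illustration only, a revenue-maximization allocation LP over such click data typically takes a packing form like the following, where the allocation variables x_{ij}, revenue estimates r_{ij}, and budgets b_j are assumed notation, not the paper's:

\begin{aligned}
\max_{x \ge 0} \quad & \sum_{i=1}^{n} \sum_{j=1}^{d} r_{ij}\, x_{ij} \\
\text{s.t.} \quad & \sum_{j=1}^{d} x_{ij} \le 1 && \text{for each user } i \in [n], \\
& \sum_{i=1}^{n} x_{ij} \le b_j && \text{for each article } j \in [d].
\end{aligned}
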
Dataset Splits: No
The paper states that it uses the "publicly accessible dataset of Yahoo! Front Page Today Module" and mentions that "details of the experimental setup" are in the full version. However, the provided text does not specify exact train/validation/test splits, percentages, or sample counts.

Hardware Specification: No
The paper does not provide any specific details regarding the hardware used for running the experiments (e.g., GPU/CPU models, memory, or cloud instance types).

Software Dependencies: No
The paper mentions using and comparing against methods like SVRG and SAGA, but it does not list any specific software dependencies or libraries with version numbers (e.g., Python, PyTorch, TensorFlow, or specific solvers).

Experiment Setup: Yes
In Figure 3(a), the legend shows the specific learning rates used for the SVRG and SAGA methods (e.g., "SVRG:0.0001 SVRG:0.0003 SVRG:0.0005"), indicating concrete hyperparameter values. The paper also states "We choose θ = 5" for the lingering-radius calculation, and uses "m = 2n in practice" for SVRG.

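For context on the quoted hyperparameters, a generic SVRG loop with the stated epoch length m = 2n looks as follows. This is a standard textbook sketch, not the authors' implementation; grad_fs and eta are placeholder names, with eta playing the role of the learning rates shown in Figure 3(a)'s legend.

import numpy as np

def svrg(grad_fs, x0, eta, epochs, rng=None):
    """Generic SVRG with epoch length m = 2n.

    grad_fs[i](x) returns the gradient of f_i at x; eta is a learning
    rate such as the 0.0001-0.0005 values quoted above.
    """
    rng = rng or np.random.default_rng(0)
    n = len(grad_fs)
    m = 2 * n                        # epoch length m = 2n "in practice"
    x_tilde = np.asarray(x0, dtype=float).copy()
    for _ in range(epochs):
        # Full gradient at the snapshot point
        mu = sum(g(x_tilde) for g in grad_fs) / n
        x = x_tilde.copy()
        for _ in range(m):
            i = rng.integers(n)
            # Variance-reduced stochastic gradient
            v = grad_fs[i](x) - grad_fs[i](x_tilde) + mu
            x = x - eta * v
        x_tilde = x                  # snapshot update (last-iterate option)
    return x_tilde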