Improving Adaptivity via Over-Parameterization in Sequence Models
Authors: Yicheng Li, Qian Lin
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we provide some numerical experiments to validate the theoretical results. For more detailed numerical experiments, please refer to Section C. |
| Researcher Affiliation | Academia | Yicheng Li, Department of Statistics and Data Science, Tsinghua University, Beijing, China (liyc22@mails.tsinghua.edu.cn); Qian Lin, Department of Statistics and Data Science, Tsinghua University, Beijing, China (qianlin@tsinghua.edu.cn). The corresponding author, Qian Lin, is also affiliated with the Beijing Academy of Artificial Intelligence, Beijing, China. |
| Pseudocode | No | We approximate the gradient flow equations (22) and (30) by discrete-time gradient descent and truncate the sequence model to the first N terms for some very large N. |
| Open Source Code | Yes | The codes are provided in the supplementary material. |
| Open Datasets | No | We consider the settings as in Corollary 3.3, where θ is given by (4) for some p > 0 and q ≥ 1. We set ϵ² = n⁻¹, where n can be regarded as the sample size, and consider the asymptotic performance of the generalization error as n grows. [...] We consider the two real-world datasets: California Housing and Concrete Compressive Strength. |
| Dataset Splits | No | No explicit mention of training/validation/test dataset splits is found. The paper focuses on the generalization error as a function of the training process and the sample size. |
| Hardware Specification | Yes | The experiments can be done on a 64-core CPU laptop with 32 GB of memory in one day. |
| Software Dependencies | No | The paper mentions 'discrete-time gradient descent' and implies numerical computation, but does not name the software libraries or programming languages used, nor their versions. |
| Experiment Setup | Yes | We approximate the gradient flow equations (22) and (30) by discrete-time gradient descent with a sufficiently small step size. Moreover, we truncate the sequence model to the first N terms for some very large N. We consider the settings as in Corollary 3.3, where θ is given by (4) for some p > 0 and q ≥ 1. We set ϵ² = n⁻¹, where n can be regarded as the sample size, and consider the asymptotic performance of the generalization error as n grows. For the stopping time, we choose the oracle one that minimizes the generalization error for each method. (A hedged code sketch of this protocol follows the table.) |
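
The experiment-setup row above describes a simulation protocol rather than a full implementation. The sketch below is not the authors' released code; it assumes a Gaussian sequence model with polynomial-decay coefficients θ_k = k^(-p), noise level ϵ² = 1/n, truncation at N terms, and plain discrete-time gradient descent on the squared loss started from zero (standing in for the gradient flows of equations (22) and (30), whose exact parameterizations are not reproduced here), with the oracle stopping time chosen to minimize the generalization error. All variable names and constant values are illustrative assumptions.

```python
# Minimal sketch (not the authors' released code): discrete-time gradient
# descent approximating a gradient flow on a truncated sequence model.
# Assumptions (illustrative only): signal theta_k = k^{-p}, observations
# y_k = theta_k + eps * xi_k with eps^2 = 1/n, squared loss, and an oracle
# stopping time chosen to minimize the generalization error.
import numpy as np

rng = np.random.default_rng(0)

N = 2000          # truncation level of the sequence model
p = 1.0           # polynomial decay exponent of the signal (assumed)
n = 1000          # "sample size"; noise level eps^2 = 1/n
eps = 1.0 / np.sqrt(n)
step = 1e-3       # small step size approximating the gradient flow
n_steps = 20000

k = np.arange(1, N + 1)
theta = k ** (-p)                          # true coefficients (illustrative)
y = theta + eps * rng.standard_normal(N)   # noisy sequence observations

# Gradient descent on the truncated least-squares objective
# L(beta) = 0.5 * ||y - beta||^2, started from zero; early stopping acts
# as the regularizer, and we record the oracle (error-minimizing) time.
beta = np.zeros(N)
best_err, best_step = np.inf, 0
for t in range(1, n_steps + 1):
    grad = beta - y
    beta -= step * grad
    gen_err = np.sum((beta - theta) ** 2)  # generalization error vs. truth
    if gen_err < best_err:
        best_err, best_step = gen_err, t

print(f"oracle stopping step: {best_step}, generalization error: {best_err:.4e}")
```

To compare methods as the paper does, the update inside the loop would be replaced by the corresponding over-parameterized gradient flow of equation (30), with the same truncation, oracle-stopping protocol, and tracking of the generalization error as n grows.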