Transformers Implement Functional Gradient Descent to Learn Non-Linear Functions In Context

Authors: Xiang Cheng, Yuxin Chen, Suvrit Sra

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To experimentally verify Proposition 3.4, we compare the performance of different choices of h against different choices of generating kernel K. We present our findings in Figures 1 and 2.
Researcher Affiliation | Academia | 1) Massachusetts Institute of Technology; 2) University of California, Davis; 3) Technical University of Munich.
Pseudocode | No | The paper describes algorithms and derivations but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any statement about releasing open-source code or a link to a code repository.
Open Datasets | No | The covariates x^(i) are drawn i.i.d. from the unit sphere, and the labels y^(i) are drawn from one of the three K-Gaussian processes. We consider three choices of kernels: K_linear(u, v) = ⟨u, v⟩, K_relu(u, v) = relu(⟨u, v⟩), and K_exp(u, v) = exp(⟨u, v⟩) (as defined in (11)). (A sampling sketch follows this table.)
Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits or a cross-validation setup.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory, or cloud instance types) used for running the experiments.
Software Dependencies | No | The paper mentions using ADAM for training but does not provide specific version numbers for software dependencies or libraries.
Experiment Setup | Yes | Training Algorithm: We train the Transformer using ADAM with gradient clipping. Each gradient step is computed from a minibatch of size 30000, and we resample the minibatch every 10 steps. All plots are averaged over 3 runs with different U (i.e., Σ) sampled each time, and different seeds for sampling training data. (A training-loop sketch follows this table.)
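The data-generation description in the "Open Datasets" row is concrete enough to sketch. The following Python/NumPy sketch samples covariates uniformly from the unit sphere and draws labels from a zero-mean Gaussian process whose covariance comes from one of the three generating kernels. It is an illustrative reconstruction, not the authors' code (which is not released): the function names (sample_unit_sphere, sample_task, etc.) and the eigenvalue-clipping safeguard are assumptions.

```python
# Illustrative sketch (not the authors' code) of the data generation in the
# "Open Datasets" row: covariates drawn i.i.d. from the unit sphere, labels
# drawn from a zero-mean Gaussian process with one of the generating kernels.
import numpy as np

def sample_unit_sphere(n_points, dim, rng):
    """Draw n_points covariates i.i.d. uniformly from the unit sphere in R^dim."""
    x = rng.standard_normal((n_points, dim))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# The three generating kernels, applied elementwise to the Gram matrix of
# inner products <x_i, x_j>.
KERNELS = {
    "linear": lambda g: g,
    "relu": lambda g: np.maximum(g, 0.0),
    "exp": lambda g: np.exp(g),
}

def sample_task(n_points, dim, kernel, rng, jitter=1e-8):
    """Sample one in-context task: covariates x and GP-distributed labels y ~ N(0, K)."""
    x = sample_unit_sphere(n_points, dim, rng)
    cov = KERNELS[kernel](x @ x.T)                    # K(x_i, x_j)
    # Eigenvalue clipping and jitter are numerical safeguards added in this
    # sketch; they are not steps described in the paper.
    eigvals, eigvecs = np.linalg.eigh(cov)
    sqrt_cov = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None) + jitter)
    y = sqrt_cov @ rng.standard_normal(n_points)
    return x, y

rng = np.random.default_rng(0)
x, y = sample_task(n_points=20, dim=5, kernel="relu", rng=rng)
```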
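Similarly, the "Experiment Setup" row pins down a few training details: Adam with gradient clipping, minibatches of 30000 prompts, and a fresh minibatch resampled every 10 steps. The PyTorch sketch below follows those settings, but the architecture, prompt encoding, learning rate, clipping threshold, and step count are placeholder assumptions; only the optimizer, batch size, clipping, and resampling schedule come from the paper.

```python
# Illustrative training-loop sketch for the "Experiment Setup" row. Only the
# optimizer (Adam), gradient clipping, minibatch size (30000), and the
# resample-every-10-steps schedule are taken from the paper; everything else
# is a placeholder assumption.
import numpy as np
import torch
import torch.nn as nn

def make_prompt_batch(batch_size, n_points, dim, kernel, rng):
    """Build a batch of in-context prompts using sample_task from the sketch above."""
    tokens, targets = [], []
    for _ in range(batch_size):
        x, y = sample_task(n_points, dim, kernel, rng)
        tok = np.column_stack([x, y])        # each token is an (x_i, y_i) pair
        tok[-1, -1] = 0.0                    # hide the query label from the model
        tokens.append(tok)
        targets.append(y[-1])                # target: label of the query point
    return (torch.tensor(np.stack(tokens), dtype=torch.float32),
            torch.tensor(np.array(targets), dtype=torch.float32))

dim, n_points = 5, 20
encoder = nn.TransformerEncoder(             # stand-in for the paper's architecture
    nn.TransformerEncoderLayer(d_model=dim + 1, nhead=1, batch_first=True),
    num_layers=3,
)
readout = nn.Linear(dim + 1, 1)
params = list(encoder.parameters()) + list(readout.parameters())
optimizer = torch.optim.Adam(params, lr=1e-4)

rng = np.random.default_rng(0)
batch_size = 30000                           # minibatch size reported in the paper; shrink for a quick local run
for step in range(1000):
    if step % 10 == 0:                       # resample the minibatch every 10 steps
        tokens, targets = make_prompt_batch(batch_size, n_points, dim, "relu", rng)
    pred = readout(encoder(tokens)[:, -1, :]).squeeze(-1)
    loss = ((pred - targets) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(params, max_norm=1.0)  # gradient clipping
    optimizer.step()
```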