On the Power of Foundation Models

Authors: Yang Yuan

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | Unlike most machine learning theory papers, our paper does not have any assumptions on the data distribution or network structure. Instead, we take the bird's-eye view that is model oblivious, and only focuses on the structure defined by the pretext task. It is indeed possible that by designing a special network, one may get a more powerful model with better performance. However, we stick with our setting because: Empirically, people do not customize network structures for different tasks. Instead, they tend to use similar structures like ResNet (He et al., 2016) or Transformer (Vaswani et al., 2017).
Researcher Affiliation | Collaboration | IIIS, Tsinghua University; Shanghai Artificial Intelligence Laboratory; Shanghai Qi Zhi Institute. Correspondence to: Yang Yuan <yuanyang@tsinghua.edu.cn>.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper is theoretical and does not describe a new methodology for which source code would typically be provided or released. There are no statements or links regarding open-source code for the described theoretical framework.
Open Datasets | No | The paper is theoretical and does not perform experiments that involve training on specific public datasets. It refers to training in other research as context but does not conduct its own empirical training.
Dataset Splits | No | The paper is theoretical and does not perform experiments, so it does not provide dataset split information for training, validation, or testing.
Hardware Specification | No | The paper is theoretical and does not describe any empirical experiments, so it does not specify any hardware used for running experiments.
Software Dependencies | No | The paper is theoretical and does not conduct experiments that would require specific software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and does not describe any empirical experiments, so it does not include details on experimental setup such as hyperparameters or training configurations.