Position: The Platonic Representation Hypothesis

Authors: Minyoung Huh, Brian Cheung, Tongzhou Wang, Phillip Isola

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In our experiments, we use a mutual nearest-neighbor metric that measures the mean intersection of the k-nearest neighbor sets induced by two kernels... We expand upon this observation by evaluating the transfer performance of 78 vision models. These models were trained with varying architectures, training objectives, and datasets (detailed in Appendix C.1)." (A minimal sketch of this metric follows the table.)
Researcher Affiliation | Academia | Minyoung Huh*1, Brian Cheung*1, Tongzhou Wang*1, Phillip Isola*1; 1MIT. Correspondence to: Minyoung Huh <minhuh@mit.edu>.
Pseudocode | No | The paper contains mathematical equations and descriptions of concepts but does not include any pseudocode or algorithm blocks.
Open Source Code | Yes | Code: github.com/minyoungg/platonic-rep
Open Datasets | Yes | "We measure alignment among 78 models using mutual nearest-neighbors on Places-365 (Zhou et al., 2017), and evaluate their performance on downstream tasks from the Visual Task Adaptation Benchmark (VTAB; Zhai et al. (2019))... For vision and text, we use the Wikipedia captions dataset {(x_i, y_i)}_i (Srinivasan et al., 2021)..."
Dataset Splits | Yes | "To reduce compute requirements, we subsample training and validation datasets to have at most 10,000 samples. We consider a representation solves a task if its performance is 80% of the best performance on that task across all 78 models." (See the threshold sketch after the table.)
Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU models, CPU specifications, or memory.
Software Dependencies | No | The paper mentions software tools such as PyTorch Image Models (TIMM; Wightman (2021)) and Huggingface (Wolf et al., 2019) but does not provide version numbers for these or for core dependencies such as PyTorch, Python, or CUDA.
Experiment Setup | No | The paper describes the models and datasets used for evaluation (e.g., 78 vision models, k = 10 nearest neighbors) but does not report the concrete setup details, such as learning rates, batch sizes, number of epochs, or optimizer settings, that would be needed to reproduce any training underlying its analysis.
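
The mutual nearest-neighbor metric quoted under Research Type can be summarized in a few lines. The sketch below is an illustrative reimplementation, not the authors' released code (see github.com/minyoungg/platonic-rep for that); the cosine-similarity kernel and the function name are assumptions, while k = 10 matches the value quoted above.

```python
import torch
import torch.nn.functional as F

def mutual_knn_alignment(feats_a: torch.Tensor,
                         feats_b: torch.Tensor,
                         k: int = 10) -> float:
    """Mean intersection of the k-nearest-neighbor sets induced by two
    representations of the same n inputs. Illustrative sketch: the
    cosine-similarity kernel is an assumption, not confirmed by the paper.
    feats_a: (n, d_a) features from model A; feats_b: (n, d_b) from model B.
    """
    def knn_indices(feats: torch.Tensor) -> torch.Tensor:
        feats = F.normalize(feats, dim=-1)      # unit-norm rows
        sim = feats @ feats.T                   # (n, n) cosine similarities
        sim.fill_diagonal_(-float("inf"))       # a point is not its own neighbor
        return sim.topk(k, dim=-1).indices      # (n, k) neighbor indices

    nn_a, nn_b = knn_indices(feats_a), knn_indices(feats_b)
    # For each sample, count how many of model A's k neighbors also appear
    # among model B's k neighbors, then average the overlap fraction.
    shared = (nn_a.unsqueeze(-1) == nn_b.unsqueeze(-2)).any(-1).float().sum(-1)
    return (shared / k).mean().item()
```

Two models would then be compared by extracting features for the same inputs (e.g., a subsample of Places-365) and calling mutual_knn_alignment(feats_a, feats_b).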
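
The "solves a task" rule quoted under Dataset Splits is likewise mechanical. The snippet below makes the 80%-of-best threshold concrete; the function name and the (num_models, num_tasks) score layout are illustrative assumptions.

```python
import numpy as np

def tasks_solved(scores: np.ndarray, threshold: float = 0.8) -> np.ndarray:
    """Per-model count of tasks 'solved' under the paper's rule: a model
    solves a task if its score reaches 80% of the best score achieved by
    any model on that task. `scores` has shape (num_models, num_tasks).
    """
    best_per_task = scores.max(axis=0)             # (num_tasks,) best score per task
    solved = scores >= threshold * best_per_task   # broadcast comparison
    return solved.sum(axis=1)                      # (num_models,) tasks solved
```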