Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Position: The Platonic Representation Hypothesis
Authors: Minyoung Huh, Brian Cheung, Tongzhou Wang, Phillip Isola
ICML 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we use a mutual nearest-neighbor metric that measures the mean intersection of the k-nearest neighbor sets induced by two kernels... We expand upon this observation by evaluating the transfer performance of 78 vision models. These models were trained with varying architectures, training objectives, and datasets (detailed in Appendix C.1). |
| Researcher Affiliation | Academia | Minyoung Huh*1, Brian Cheung*1, Tongzhou Wang*1, Phillip Isola*1; 1MIT. Correspondence to: Minyoung Huh <EMAIL>. |
| Pseudocode | No | The paper contains mathematical equations and descriptions of concepts but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: github.com/minyoungg/platonic-rep |
| Open Datasets | Yes | We measure alignment among 78 models using mutual nearest-neighbors on Places-365 (Zhou et al., 2017), and evaluate their performance on downstream tasks from the Visual Task Adaptation Benchmark (VTAB; Zhai et al. (2019))... For vision and text, we use the Wikipedia captions dataset {(x_i, y_i)}_i (Srinivasan et al., 2021)... |
| Dataset Splits | Yes | To reduce compute requirements, we subsample training and validation datasets to have at most 10,000 samples. We consider a representation solves a task if its performance is 80% of the best performance on that task across all 78 models. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU models, CPU specifications, or memory. |
| Software Dependencies | No | The paper mentions software tools like 'PyTorch Image Models (TIMM; Wightman (2021))' and 'Huggingface (Wolf et al., 2019)' but does not provide specific version numbers for these or other core software dependencies like PyTorch, Python, or CUDA. |
| Experiment Setup | No | The paper describes the models and datasets used for evaluation (e.g., '78 vision models', 'k = 10 nearest neighbors'), but it does not specify concrete experimental setup details such as hyperparameter values (learning rates, batch sizes, number of epochs, optimizer settings) that would be needed to reproduce any training processes related to its analysis. |
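
The mutual nearest-neighbor metric quoted in the Research Type row can be illustrated with a short sketch. The snippet below is a reconstruction based only on the quoted description (mean intersection of the k-nearest-neighbor sets induced by two models' representations of the same inputs); it is not the authors' implementation, which is available at github.com/minyoungg/platonic-rep, and the cosine-similarity kernel and `k = 10` default are assumptions of this sketch.

```python
import numpy as np

def mutual_knn_alignment(feats_a: np.ndarray, feats_b: np.ndarray, k: int = 10) -> float:
    """Mean overlap of the k-nearest-neighbor sets induced by two feature matrices.

    feats_a, feats_b: arrays of shape (n_samples, dim_a) and (n_samples, dim_b),
    i.e. two models' representations of the same n_samples inputs.
    """
    def knn_indices(feats: np.ndarray) -> np.ndarray:
        # Cosine-similarity kernel (an assumption for this sketch).
        normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
        sim = normed @ normed.T
        np.fill_diagonal(sim, -np.inf)  # exclude each sample from its own neighbor set
        return np.argsort(-sim, axis=1)[:, :k]  # indices of the k most similar samples

    nn_a = knn_indices(feats_a)
    nn_b = knn_indices(feats_b)
    # Mean size of the intersection of the two k-NN sets, normalized by k.
    overlaps = [len(set(row_a) & set(row_b)) / k for row_a, row_b in zip(nn_a, nn_b)]
    return float(np.mean(overlaps))

# Usage: two unrelated random feature sets score near the chance level k/(n-1), here ~0.01.
rng = np.random.default_rng(0)
feats_vision = rng.normal(size=(1000, 64))
feats_text = rng.normal(size=(1000, 128))
print(mutual_knn_alignment(feats_vision, feats_text, k=10))
```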
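Similarly, the 80%-of-best "solved" criterion quoted in the Dataset Splits row amounts to a simple threshold. A minimal sketch, assuming a hypothetical dictionary of per-model accuracies on one downstream task (the model names and numbers are illustrative, not from the paper):

```python
# Hypothetical per-model accuracies on a single downstream task (illustrative values only).
scores = {"model_a": 0.91, "model_b": 0.74, "model_c": 0.69}

best = max(scores.values())
# A representation "solves" the task if it reaches 80% of the best score across all models.
solved = {name: acc >= 0.8 * best for name, acc in scores.items()}
print(solved)  # {'model_a': True, 'model_b': True, 'model_c': False}
```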