Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Contextures: Representations from Contexts

Authors: Runtian Zhai, Kai Yang, Burak Varıcı, Che-Ping Tsai, J Zico Kolter, Pradeep Kumar Ravikumar

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We substantiate this extrapolation with an experiment on the abalone dataset from OpenML. We compare two d-dimensional representations... We now conduct an experiment that examines τd on two datasets. First, we use the abalone dataset... Second, we use the MNIST dataset... In Table 1 we report the correlation between τ and errd over all 140 contexts from the 4 types on 28 classification (Cls) and regression (Reg) datasets from OpenML.
Researcher Affiliation	Academia	1Carnegie Mellon University, Pittsburgh, PA, USA 2Peking University, Beijing, China.
Pseudocode	No	The paper describes a procedure in Appendix D, stating 'This can be efficiently done with the following procedure: (i) Train an encoder Φ... (ii) Estimate the covariance matrix... (iii) Estimate BΦ... (iv) Solve the generalized eigenvalue problem...', but it is not formatted as a distinct pseudocode or algorithm block with a specific label.
Open Source Code	Yes	The code for this paper can be found at https://colab.research.google.com/drive/1GdJ0Yn-PKiKfkZIwUuon3WpTpbNWEtAO?usp=sharing.
Open Datasets	Yes	We substantiate this extrapolation with an experiment on the abalone dataset from OpenML. We use the MNIST dataset. In Table 1 we report the correlation between τ and errd over all 140 contexts from the 4 types on 28 classification (Cls) and regression (Reg) datasets from OpenML (Vanschoren et al., 2013).
Dataset Splits	Yes	We use the abalone dataset from OpenML, and split the dataset into a pretrain set, a downstream train set and a downstream test set by 70%-15%-15%.
Hardware Specification	No	The paper does not explicitly describe any specific hardware used to run its experiments, such as GPU or CPU models.
Software Dependencies	No	The paper mentions 'AdamW optimizer (Kingma & Ba, 2015; Loshchilov & Hutter, 2017)' and 'VICReg (Bardes et al., 2022)' but does not provide specific version numbers for these or any other software components like programming languages or libraries.
Experiment Setup	Yes	The embedding dimension is set to d = 128. For the second encoder, we train a fully-connected neural network with Tanh activation and skip connections for a sufficient number of steps with full-batch AdamW... For each width and depth, we run the experiments 15 times with different random initializations. We set the output dimension of the neural network to be d1 = 512. The downstream linear predictor is fit via ridge regression. Hyperparameter grid search is conducted at both encoder learning and downstream stages. We choose β = 1 and d0 = 512 in our experiments.
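
The Dataset Splits and Experiment Setup rows quote a 70%-15%-15% split (pretrain, downstream train, downstream test) and a downstream linear predictor fit via ridge regression. The following is a minimal sketch of those two steps only, not the authors' implementation (their code is in the Colab notebook linked above). The function names, the random seed, the shuffling before the split, and the regularization strength `alpha` are assumptions made for illustration; the encoder training and the hyperparameter grid search mentioned in the table are not shown.

```python
import numpy as np


def split_70_15_15(n_samples, seed=0):
    """Return shuffled index arrays for a 70/15/15 split.

    Hypothetical helper: the seed and the use of shuffling are assumptions.
    Integer arithmetic avoids floating-point rounding in the split sizes;
    any remainder goes to the last (test) part.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    n_pretrain = (70 * n_samples) // 100
    n_train = (15 * n_samples) // 100
    pretrain_idx = perm[:n_pretrain]
    train_idx = perm[n_pretrain:n_pretrain + n_train]
    test_idx = perm[n_pretrain + n_train:]
    return pretrain_idx, train_idx, test_idx


def fit_ridge(features, targets, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept.

    Solves (Xc^T Xc + alpha * I) w = Xc^T yc on centered data, then
    recovers the intercept from the feature and target means.
    `alpha` is an assumed value, not one reported in the paper.
    """
    x_mean = features.mean(axis=0)
    y_mean = targets.mean()
    xc = features - x_mean
    yc = targets - y_mean
    dim = xc.shape[1]
    weights = np.linalg.solve(xc.T @ xc + alpha * np.eye(dim), xc.T @ yc)
    bias = y_mean - x_mean @ weights
    return weights, bias


def predict(features, weights, bias):
    """Apply the fitted linear predictor to a feature matrix."""
    return features @ weights + bias
```

In this sketch, `features` stands in for the frozen encoder's d-dimensional representations of the downstream samples; the predictor would be fit on the downstream train indices and evaluated on the downstream test indices.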