Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Geometry-Aware Adaptation for Pretrained Models
Authors: Nicholas Roberts, Xintong Li, Dyah Adila, Sonia Cromp, Tzu-Heng Huang, Jitian Zhao, Frederic Sala
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, using easily-available external metrics, our proposed approach, LOKI, gains up to 29.7% relative improvement over SimCLR on ImageNet and scales to hundreds of thousands of classes. When no such metric is available, LOKI can use self-derived metrics from class embeddings and obtains a 10.5% improvement on pretrained zero-shot models such as CLIP. |
| Researcher Affiliation | Academia | Nicholas Roberts, Xintong Li, Dyah Adila, Sonia Cromp, Tzu-Heng Huang, Jitian Zhao, Frederic Sala; University of Wisconsin-Madison |
| Pseudocode | Yes | Algorithm 1: Locus cover for phylogenetic trees; Algorithm 2: Computing a pairwise decomposable locus; Algorithm 3: Computing a generic locus |
| Open Source Code | Yes | Code implementing all of our experiments is available here: https://github.com/SprocketLab/loki. |
| Open Datasets | Yes | We evaluate the capability of LOKI to improve upon zero-shot models where all classes are observed. Setup: Our experiment compares the zero-shot prediction performance of CLIP [32] on CIFAR-100 [20] to CLIP logits used with LOKI... For ImageNet, we use the WordNet phylogenetic tree as the metric space [2]... For PubMed, we derive our metric from Euclidean distances between SimCSE class embeddings [11]. Finally, for LSHTC, we summarize the default graph by randomly selecting nodes and merging them with their neighbors until we obtain a graph with 10,000 supernodes representing sets of classes. |
| Dataset Splits | Yes | To construct our datasets, we randomly sample 50 images for each class from ImageNet as our training dataset then use the validation dataset in ImageNet to evaluate LOKI's performance. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory specifications) used for running the experiments. |
| Software Dependencies | No | The paper mentions software like 'CLIP frozen weights' and 'SimCLRv1' but does not provide specific version numbers for these or other key software dependencies (e.g., Python, PyTorch, TensorFlow, CUDA). |
| Experiment Setup | No | The paper mentions some general settings like 'no training and hyperparameters involved in experiments involving CLIP, except for the Softmax temperature in the calibration analysis' and using a '5-NN model' for LSHTC, but it lacks specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed training configurations typically found in a reproducible experimental setup section. |
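
The Dataset Splits row describes drawing 50 random images per class to form the training set. A minimal sketch of that kind of per-class sampling is shown below. It is not taken from the authors' repository: the function name `sample_per_class`, the `seed` parameter, and the choice to cap the sample at the class size are all assumptions made for illustration, and the paper does not report a random seed.

```python
import random
from collections import defaultdict


def sample_per_class(labels, per_class=50, seed=0):
    """Return sorted dataset indices with up to `per_class` random picks per class.

    `labels[i]` is the class label of example i. The name, signature, and
    `seed` default are hypothetical; they are not from the paper.
    """
    # Group example indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # A dedicated generator keeps the draw repeatable for a fixed seed.
    rng = random.Random(seed)

    chosen = []
    for label in sorted(by_class):  # fixed class order for determinism
        indices = by_class[label]
        # Assumption: classes with fewer than `per_class` examples keep all of them.
        k = min(per_class, len(indices))
        chosen.extend(rng.sample(indices, k))  # sampling without replacement
    return sorted(chosen)
```

Fixing and reporting the seed is what would make such a split reproducible by others; with the seed unreported, a rerun can only match the sampling procedure, not the exact training subset.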