Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Dataset Dynamics via Gradient Flows in Probability Space
Authors: David Alvarez-Melis, Nicolò Fusi
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Through various experiments, we show that this framework can be used to impose constraints on classification datasets, adapt them for transfer learning, or to re-purpose fixed or blackbox models to classify with high accuracy previously unseen datasets." Also, from Section 7 (Experiments): "We first evaluate our approach for imposing constraints on low-dimensional synthetic datasets (Section 7.1) and then on two settings (Sections 7.2 & 7.3) involving transfer learning with benchmark image classification datasets." |
| Researcher Affiliation | Industry | David Alvarez-Melis¹, Nicolò Fusi¹. ¹Microsoft Research. Correspondence to: David Alvarez-Melis <EMAIL>. |
| Pseudocode | No | The paper describes the numerical solution of gradient flows using mathematical equations (16) and (17) and explanatory text, but does not present a formal pseudocode or algorithm block. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code for the described methodology or a link to a code repository. |
| Open Datasets | Yes | We consider four classification datasets: MNIST (LeCun et al., 2010), USPS, FASHIONMNIST (Xiao et al., 2017) and KMNIST (Clanuwat et al., 2018)... In addition to the *NIST datasets, we use CIFAR10, STL10 and the CAMELYON histopathology dataset (Litjens et al., 2018). |
| Dataset Splits | Yes | We use the standard MNIST, FMNIST, KMNIST, USPS splits. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., CPU/GPU models, memory specifications). |
| Software Dependencies | No | The paper mentions using PyTorch and the POT library but does not specify their version numbers, which are needed for reproducibility. |
| Experiment Setup | Yes | Training is performed using ADAM with learning rate 1e-3, batch size 64, for 10 epochs. (from Appendix C). |
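
The hyperparameters quoted in the Experiment Setup row can be written down as a small configuration record. The following is a minimal sketch, not the authors' code: the names `TrainConfig` and `total_optimizer_steps` are hypothetical, the `drop_last` option is an assumption (the row does not say how a final partial batch is handled), and the figure of 60,000 examples is the size of the standard MNIST training split, used here only as an illustration.

```python
import math
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters quoted in the Experiment Setup row (Appendix C)."""
    optimizer: str = "adam"
    learning_rate: float = 1e-3
    batch_size: int = 64
    epochs: int = 10


def total_optimizer_steps(cfg: TrainConfig, n_train: int, drop_last: bool = False) -> int:
    """Count optimizer updates over the whole run for n_train examples.

    drop_last is an assumption: the source does not state whether a final
    partial batch is kept or discarded.
    """
    if drop_last:
        per_epoch = n_train // cfg.batch_size
    else:
        per_epoch = math.ceil(n_train / cfg.batch_size)
    return per_epoch * cfg.epochs


if __name__ == "__main__":
    cfg = TrainConfig()
    print(asdict(cfg))
    # 60,000 is the size of the standard MNIST training split.
    print(total_optimizer_steps(cfg, n_train=60_000))
```

Under these assumptions, a run on the MNIST training split takes 938 updates per epoch when the partial batch is kept, or 937 when it is dropped. The row does not name the model architecture, so the sketch leaves it out.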