Be Like Water: Adaptive Floating Point for Machine Learning

Authors: Thomas Yeh, Max Sterner, Zerlina Lai, Brandon Chuang, Alexander Ihler

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate AFP on a spectrum of representative models in computer vision and NLP, and show that our technique enables ultra-low-precision inference of deep learning models while providing accuracy comparable to full-precision inference. We build a simulation infrastructure in TensorFlow to accurately model the numerical effects of applying AFP to the weights and layer outputs of ML models. We perform comprehensive simulations of AFP on a wide range of robust CNN and Transformer models.
Researcher Affiliation | Academia | (1) Computer Science Department, Pomona College, Claremont, CA, USA; (2) Computer Science Department, Occidental College, Los Angeles, CA, USA; (3) Computer Science Department, University of California, Santa Cruz, CA, USA; (4) Department of Computer Science, University of California, Irvine, CA, USA.
Pseudocode | No | The paper describes the design of AFP and its hardware implementation but does not include formal pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the methodology is openly available.
Open Datasets | Yes | For image classification, the test dataset of ImageNet V2 (Recht et al., 2019) was used to measure model accuracy.
Dataset Splits | No | The paper mentions using a "test dataset" but does not provide specific details on training, validation, and test splits (e.g., percentages, sample counts, or explicit references to predefined splits for all sets).
Hardware Specification | No | The paper refers to types of ML accelerators (e.g., "Google's TPU", "Nvidia Tensor Cores") in a general context but does not specify the exact hardware (e.g., GPU/CPU models, memory) used for the experiments.
Software Dependencies | No | To simulate AFP in hardware, several DNN inference models were run in TensorFlow using the Keras (Chollet et al., 2015) and Hugging Face (Wolf et al., 2019) libraries. However, specific version numbers for these software components are not provided.
Experiment Setup | Yes | Using a custom round function, the weights and outputs of all layer types were rounded with AFP, including Conv2D, Batch Normalization, and Dense layers. To properly simulate inference using AFP in hardware, all weights were rounded when the model was instantiated, and all layer outputs were rounded between layers before being fed into the next layer. Examples from the datasets were input individually into each model, and a block size of 16 was used for most experiments (see the sketch below).
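
The setup described in the Experiment Setup row is straightforward to sketch. Below is a minimal, hypothetical reconstruction in TensorFlow: `round_afp` is a simplified block-wise quantizer standing in for the paper's full AFP rounding, and the toy Dense model, the `mantissa_bits` parameter, and the random input are illustrative assumptions; only the block size of 16 comes from the text.

```python
# Hypothetical sketch of the simulated AFP inference flow. round_afp is a
# simplified block-wise quantizer, NOT the paper's exact AFP scheme: each block
# of `block_size` values shares a scale derived from its largest magnitude.
import numpy as np
import tensorflow as tf

def round_afp(values, block_size=16, mantissa_bits=4):
    """Round a tensor block-wise with a shared per-block scale."""
    flat = np.asarray(values, dtype=np.float64).ravel()
    pad = (-len(flat)) % block_size
    blocks = np.pad(flat, (0, pad)).reshape(-1, block_size)
    max_mag = np.abs(blocks).max(axis=1, keepdims=True)
    max_mag[max_mag == 0] = 1.0  # avoid log2(0) for all-zero blocks
    # Per-block scale: keep `mantissa_bits` bits below the block's max exponent.
    scale = 2.0 ** (np.floor(np.log2(max_mag)) - (mantissa_bits - 1))
    rounded = np.round(blocks / scale) * scale
    return rounded.ravel()[: len(flat)].reshape(np.shape(values)).astype(np.float32)

# Round all weights once, when the model is instantiated.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(10),
])
for layer in model.layers:
    layer.set_weights([round_afp(w) for w in layer.get_weights()])

# Round every layer's output before it is fed to the next layer.
x = round_afp(np.random.rand(1, 784).astype(np.float32))
for layer in model.layers:
    x = round_afp(layer(x).numpy())
print(x)
```

In a full evaluation the same rounding would wrap the Conv2D and Batch Normalization layers of the actual CNN and Transformer models; since the paper releases no code, the above is only one plausible reading of the described setup.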