Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Circuit-Based Intrinsic Methods to Detect Overfitting
Authors: Satrajit Chatterjee, Alan Mishchenko
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally, CFS can separate models with different levels of overfit using only their logic circuit representations without any access to the high level structure. We take a first step towards answering this question by studying a naturally-motivated family of intrinsic methods, called Counterfactual Simulation (CFS), and evaluating their efficacy experimentally on a benchmark problem. |
| Researcher Affiliation | Collaboration | Satrajit Chatterjee¹; ¹Google, Mountain View, California, USA; ²Department of EECS, University of California, Berkeley, California, USA. |
| Pseudocode | No | The paper describes the Counterfactual Simulation (CFS) method in prose within Section 2, but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing the source code for the methodology described, nor does it include a link to a code repository. |
| Open Datasets | Yes | To make this concrete, consider the MNIST image classification problem (Le Cun & Cortes, 2010)... S is a sample from D of 60,000 images x_i and their corresponding labels y_i (thus, 0 ≤ i < 60000), i.e., the MNIST training set. We also performed some experiments with Fashion MNIST (Xiao et al., 2017) and the results are similar. |
| Dataset Splits | No | The paper mentions "validation set accuracy" (e.g., "nn-real-2 is the least overfit and gets to a validation set accuracy of 97%"), but it does not provide specific details on the dataset split (e.g., exact percentages or sample counts for the validation set) or how the validation set was created. |
| Hardware Specification | Yes | A typical run of l-CFS in our experiments takes less than 10 minutes on a 3.7GHz Xeon CPU and less than 2GB of RAM. |
| Software Dependencies | Yes | Two random forests were trained using version 0.19.1 of Scikit-learn (Pedregosa et al., 2011). |
| Experiment Setup | Yes | In all cases, we used the ADAM optimizer with default parameters and batch size of 64. Weights and activations are represented by signed 8-bit and 16-bit fixed point numbers respectively with 6 bits reserved for the fractional part. (Weights from training are clamped to [-2.0, 2.0) before conversion to fixed point.) Each multiply-accumulate unit multiplies an 8-bit constant (the weight) with a 16-bit input (the activation) and accumulates in 24 bits with saturation. |
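
The fixed-point scheme quoted in the Experiment Setup row can be illustrated with a minimal Python sketch. It assumes round-to-nearest conversion (Python's `round`, ties to even) and saturation after every addition in the accumulator; the quoted setup specifies neither. The helper names `saturate`, `to_fixed`, `from_fixed`, `quantize_weight`, and `mac` are hypothetical and not taken from the paper.

```python
"""Sketch of signed fixed-point weights/activations and a saturating MAC.

Assumptions (not stated in the quoted setup): round-to-nearest conversion
(Python's round, ties to even) and saturation after every accumulation step.
"""

W_BITS, A_BITS, ACC_BITS = 8, 16, 24  # signed widths: weight, activation, accumulator
FRAC_BITS = 6                         # fractional bits for both weights and activations


def saturate(value: int, bits: int) -> int:
    """Clamp an integer to the range of a signed two's-complement number of `bits` bits."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def to_fixed(x: float, bits: int, frac_bits: int = FRAC_BITS) -> int:
    """Encode a real number as a signed fixed-point integer code, saturating on overflow."""
    return saturate(round(x * (1 << frac_bits)), bits)


def from_fixed(code: int, frac_bits: int = FRAC_BITS) -> float:
    """Decode a fixed-point integer code back to a real number."""
    return code / (1 << frac_bits)


def quantize_weight(w: float) -> int:
    """Clamp a trained weight to [-2.0, 2.0], then encode it as an 8-bit code.

    A weight that rounds up to 2.0 saturates to the largest code, 127 (= 1.984375),
    so the representable range matches [-2.0, 2.0).
    """
    return to_fixed(max(-2.0, min(w, 2.0)), W_BITS)


def mac(weight_codes, activation_codes) -> int:
    """Accumulate 8-bit x 16-bit products in a 24-bit saturating accumulator.

    Each product carries 2 * FRAC_BITS = 12 fractional bits.
    """
    acc = 0
    for w, a in zip(weight_codes, activation_codes):
        acc = saturate(acc + w * a, ACC_BITS)
    return acc


if __name__ == "__main__":
    weights = [quantize_weight(x) for x in (0.5, -1.25, 3.0)]  # [32, -80, 127]
    acts = [to_fixed(x, A_BITS) for x in (1.0, 2.0, 0.25)]      # [64, 128, 16]
    acc = mac(weights, acts)
    print(acc, from_fixed(acc, 2 * FRAC_BITS))                 # -6160 -1.50390625
```

A single product of an 8-bit and a 16-bit signed code is at most 2^22 in magnitude (-128 × -32768), so it always fits the 24-bit accumulator (range -2^23 to 2^23 - 1). Saturation only comes into play when several large products are summed: for example, three products of 127 × 32767 exceed 2^23 - 1 and the accumulator pins at that value.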