Circuit-Based Intrinsic Methods to Detect Overfitting

Authors: Satrajit Chatterjee, Alan Mishchenko

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimentally, CFS can separate models with different levels of overfit using only their logic circuit representations without any access to the high level structure. We take a first step towards answering this question by studying a naturally-motivated family of intrinsic methods, called Counterfactual Simulation (CFS), and evaluating their efficacy experimentally on a benchmark problem.
Researcher Affiliation | Collaboration | Satrajit Chatterjee (1), Alan Mishchenko (2); (1) Google, Mountain View, California, USA; (2) Department of EECS, University of California, Berkeley, California, USA.
Pseudocode | No | The paper describes the Counterfactual Simulation (CFS) method in prose within Section 2, but does not provide structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement about releasing the source code for the methodology described, nor does it include a link to a code repository.
Open Datasets | Yes | To make this concrete, consider the MNIST image classification problem (LeCun & Cortes, 2010)... S is a sample from D of 60,000 images x_i and their corresponding labels y_i (thus, 0 ≤ i < 60000), i.e., the MNIST training set. We also performed some experiments with Fashion MNIST (Xiao et al., 2017) and the results are similar. (A hedged data-loading sketch follows the table.)
Dataset Splits | No | The paper mentions "validation set accuracy" (e.g., "nn-real-2 is the least overfit and gets to a validation set accuracy of 97%"), but it does not provide specific details on the dataset split (e.g., exact percentages or sample counts for the validation set) or how the validation set was created.
Hardware Specification | Yes | A typical run of l-CFS in our experiments takes less than 10 minutes on a 3.7GHz Xeon CPU and less than 2GB of RAM.
Software Dependencies | Yes | Two random forests were trained using version 0.19.1 of Scikit-learn (Pedregosa et al., 2011). (A random-forest training sketch follows the table.)
Experiment Setup | Yes | In all cases, we used the ADAM optimizer with default parameters and batch size of 64. Weights and activations are represented by signed 8-bit and 16-bit fixed point numbers respectively with 6 bits reserved for the fractional part. (Weights from training are clamped to [−2.0, 2.0) before conversion to fixed point.) Each multiply-accumulate unit multiplies an 8-bit constant (the weight) with a 16-bit input (the activation) and accumulates in 24 bits with saturation. (A hedged fixed-point MAC sketch follows the table.)
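
The Open Datasets row quotes the paper's use of the standard MNIST training set of 60,000 images. The paper does not describe its loading pipeline; the snippet below is a minimal sketch of obtaining that split with scikit-learn's fetch_openml, which only exists in releases newer than the 0.19.1 version cited above, so both the call and the train/test slicing convention are assumptions for illustration.

```python
# Minimal sketch: obtaining the MNIST training set referenced in the
# "Open Datasets" row. fetch_openml was added in scikit-learn 0.20 (after the
# 0.19.1 release the paper cites), and the paper does not say how it loaded
# the data, so this is an assumption for illustration only.
from sklearn.datasets import fetch_openml

X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)

# By the usual OpenML ordering, the first 60,000 rows are the MNIST training
# split (0 <= i < 60000) and the last 10,000 the test split.
X_train, y_train = X[:60000], y[:60000]
X_test, y_test = X[60000:], y[60000:]
print(X_train.shape, y_train.shape)  # (60000, 784) (60000,)
```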
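The Software Dependencies row states that two random forests were trained with scikit-learn 0.19.1, but this section does not give their hyperparameters. The sketch below shows one plausible way to train such a forest on the data loaded above; n_estimators, max_depth, and random_state are placeholder assumptions, not the paper's settings.

```python
# Illustrative sketch of training a random forest on MNIST with scikit-learn,
# matching the "Software Dependencies" row. The paper used version 0.19.1 and
# trained two forests; the hyperparameters below are placeholders, not the
# paper's settings.
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=100, max_depth=None, random_state=0)
rf.fit(X_train, y_train)  # X_train, y_train as in the loading sketch above
print("held-out accuracy:", rf.score(X_test, y_test))
```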
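The Experiment Setup row specifies signed 8-bit weights and 16-bit activations with 6 fractional bits, weights clamped to [−2.0, 2.0), and multiply-accumulate units that saturate a 24-bit accumulator. The following is a minimal Python sketch of that arithmetic as described in the quote; the helper names (to_fixed, saturate24, mac) and the handling of the post-multiply fractional bits are assumptions, not the paper's circuit construction.

```python
# Minimal sketch of the fixed-point arithmetic in the "Experiment Setup" row:
# signed 8-bit weights and 16-bit activations, 6 fractional bits each, weights
# clamped to [-2.0, 2.0), and a multiply-accumulate that saturates a signed
# 24-bit accumulator. Names and rescaling details are illustrative assumptions.
FRAC_BITS = 6

def to_fixed(x, bits, lo, hi):
    """Clamp a float to [lo, hi) and convert it to a signed fixed-point integer."""
    x = min(max(x, lo), hi - 2 ** (-FRAC_BITS))
    q = int(round(x * (1 << FRAC_BITS)))
    return max(-(1 << (bits - 1)), min(q, (1 << (bits - 1)) - 1))

def saturate24(acc):
    """Saturate an accumulator value to the signed 24-bit range."""
    return max(-(1 << 23), min(acc, (1 << 23) - 1))

def mac(weights, activations):
    """Saturating multiply-accumulate over fixed-point weights and activations."""
    acc = 0
    for w, a in zip(weights, activations):
        acc = saturate24(acc + w * a)
    return acc

# Example: quantize one weight (8-bit) and one activation (16-bit), then MAC.
w = to_fixed(0.73, bits=8, lo=-2.0, hi=2.0)       # weight clamped to [-2.0, 2.0)
a = to_fixed(1.25, bits=16, lo=-512.0, hi=512.0)  # 16-bit range with 6 fractional bits
print(mac([w], [a]))  # product carries 2*FRAC_BITS fractional bits before any rescaling
```

Saturating after each accumulation step mirrors the quoted description of 24-bit accumulation with saturation; whether products are rescaled back to 6 fractional bits before accumulation is not specified in the quote, so the sketch leaves them unscaled.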