Circuit-Based Intrinsic Methods to Detect Overfitting
Authors: Satrajit Chatterjee, Alan Mishchenko
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We take a first step towards answering this question by studying a naturally-motivated family of intrinsic methods, called Counterfactual Simulation (CFS), and evaluating their efficacy experimentally on a benchmark problem. Experimentally, CFS can separate models with different levels of overfit using only their logic circuit representations without any access to the high-level structure. |
| Researcher Affiliation | Collaboration | Satrajit Chatterjee 1, Alan Mishchenko 2; 1 Google, Mountain View, California, USA; 2 Department of EECS, University of California, Berkeley, California, USA. |
| Pseudocode | No | The paper describes the Counterfactual Simulation (CFS) method in prose within Section 2, but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing the source code for the methodology described, nor does it include a link to a code repository. |
| Open Datasets | Yes | To make this concrete, consider the MNIST image classification problem (Le Cun & Cortes, 2010)... S is a sample from D of 60,000 images x_i and their corresponding labels y_i (thus, 0 ≤ i < 60000), i.e., the MNIST training set. We also performed some experiments with Fashion MNIST (Xiao et al., 2017) and the results are similar. |
| Dataset Splits | No | The paper mentions "validation set accuracy" (e.g., "nn-real-2 is the least overfit and gets to a validation set accuracy of 97%"), but it does not provide specific details on the dataset split (e.g., exact percentages or sample counts for the validation set) or how the validation set was created. |
| Hardware Specification | Yes | A typical run of l-CFS in our experiments takes less than 10 minutes on a 3.7GHz Xeon CPU and less than 2GB of RAM. |
| Software Dependencies | Yes | Two random forests were trained using version 0.19.1 of Scikit-learn (Pedregosa et al., 2011). |
| Experiment Setup | Yes | In all cases, we used the ADAM optimizer with default parameters and batch size of 64. Weights and activations are represented by signed 8-bit and 16-bit fixed point numbers respectively with 6 bits reserved for the fractional part. (Weights from training are clamped to [-2.0, 2.0) before conversion to fixed point.) Each multiply-accumulate unit multiplies an 8-bit constant (the weight) with a 16-bit input (the activation) and accumulates in 24 bits with saturation. |
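
The fixed-point scheme quoted in the Experiment Setup row is concrete enough to illustrate. Below is a minimal NumPy sketch of that arithmetic (6 fractional bits, weights clamped to [-2.0, 2.0) and stored as signed 8-bit values, 16-bit activations, 24-bit saturating accumulation); the function names and the choice to saturate after every accumulation step are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

# Hypothetical sketch of the quoted fixed-point scheme (not the paper's code).
# Weights: signed 8-bit, activations: signed 16-bit, both with 6 fractional bits.
# Each multiply-accumulate: 8-bit weight x 16-bit activation, accumulated in
# 24 bits with saturation.

FRAC_BITS = 6
SCALE = 1 << FRAC_BITS            # 2^6 = 64

def quantize_weight(w):
    """Clamp float weights to [-2.0, 2.0) and convert to signed 8-bit fixed point."""
    w = np.clip(w, -2.0, 2.0 - 1.0 / SCALE)
    return np.round(w * SCALE).astype(np.int8)

def quantize_activation(a):
    """Convert float activations to signed 16-bit fixed point (6 fractional bits)."""
    lo, hi = -(1 << 15), (1 << 15) - 1
    return np.clip(np.round(a * SCALE), lo, hi).astype(np.int16)

def mac(weights_q, acts_q):
    """Multiply-accumulate in 24 bits, saturating after each step (an assumption)."""
    lo, hi = -(1 << 23), (1 << 23) - 1
    acc = 0
    for w, a in zip(weights_q.astype(np.int64), acts_q.astype(np.int64)):
        acc = int(np.clip(acc + w * a, lo, hi))
    return acc

# Example: one dot product between quantized weights and activations.
w_q = quantize_weight(np.array([0.5, -1.25, 1.9]))
a_q = quantize_activation(np.array([0.75, 2.0, -0.5]))
print(mac(w_q, a_q))  # raw accumulator; divide by SCALE**2 (12 fractional bits) to read as a float
```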