Counterfactual Density Estimation using Kernel Stein Discrepancies

Authors: Diego Martinez-Taboada, Edward Kennedy

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "First, we present a novel estimator for modelling counterfactual distributions given a parametric class of distributions, along with its theoretical analysis. Second, we illustrate the empirical performance of the estimator in a variety of scenarios." and, from Section 5 (Experiments), "We provide a number of experiments with (semi)synthetic data."
Researcher Affiliation | Academia | Diego Martinez-Taboada, Department of Statistics & Data Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA (diegomar@andrew.cmu.edu); Edward H. Kennedy, Department of Statistics & Data Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA (edward@stat.cmu.edu)
Pseudocode | Yes | "Algorithm 1 DR-MKSD" (see the generic KSD sketch after the table)
Open Source Code | Yes | "Reproducible code for all experiments is provided in the supplementary materials."
Open Datasets | Yes | "We start by training a dense neural network with layers of size [784, 100, 20, 10] on the MNIST train dataset." (see the training/split sketch after the table)
Dataset Splits | Yes | "For this, we minimize the log entropy loss on 80% of such train data and we store the parameters that minimize the log entropy loss for the remaining validation data (remaining 20% of the MNIST train dataset)."
Hardware Specification | No | The paper does not describe the hardware used to run the experiments, such as GPU/CPU models, memory, or cloud instance types.
Software Dependencies | No | The paper mentions using the scikit-learn package for various classifiers (Logistic Regression, AdaBoost, Random Forest) but does not provide version numbers for these packages or for Python.
Experiment Setup | Yes | "Parameter θ was estimated by gradient descent for a number of 1000 steps." (Appendix B.1, B.2); "We estimate the minimizer of gn(θ) by gradient descent." (Section 5); "Figure 3 exhibits the values of gn over a grid {(θ1, θ2) : θ1, θ2 ∈ {−5, −4.9, −4.8, . . . , 5}}." (Appendix B.3); "π with Default Logistic Regression from the scikit-learn package with C = 1e5 and max_iter = 1000" (Appendix B.1). (see the setup sketch after the table)
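
For orientation on the Pseudocode row: Algorithm 1 (DR-MKSD) builds on the kernel Stein discrepancy, but the extracted text does not reproduce the algorithm itself. Below is a minimal sketch of a standard V-statistic estimator of the squared KSD with an RBF kernel, for orientation only; the function name ksd_vstat, the kernel choice, and the bandwidth sigma are illustrative assumptions, not the paper's doubly robust method.

```python
import numpy as np

def ksd_vstat(X, score, sigma=1.0):
    # V-statistic estimate of the squared kernel Stein discrepancy between
    # the empirical distribution of X (n x d) and a model with score
    # function score(x) = grad_x log p(x), using an RBF kernel.
    # This is the generic KSD, not the paper's DR-MKSD (Algorithm 1).
    n, d = X.shape
    S = np.apply_along_axis(score, 1, X)           # n x d matrix of scores
    diff = X[:, None, :] - X[None, :, :]           # x_i - x_j, shape (n, n, d)
    sqdist = np.sum(diff ** 2, axis=-1)
    K = np.exp(-sqdist / (2 * sigma ** 2))         # RBF kernel matrix
    grad_x = -diff / sigma ** 2 * K[..., None]     # grad of k wrt first argument
    grad_y = diff / sigma ** 2 * K[..., None]      # grad of k wrt second argument
    term1 = (S @ S.T) * K                          # s(x)^T s(y) k(x, y)
    term2 = np.einsum("id,ijd->ij", S, grad_y)     # s(x)^T grad_y k(x, y)
    term3 = np.einsum("jd,ijd->ij", S, grad_x)     # s(y)^T grad_x k(x, y)
    term4 = (d / sigma ** 2 - sqdist / sigma ** 4) * K  # tr(grad_x grad_y k)
    return np.mean(term1 + term2 + term3 + term4)

# Example: standard normal sample against a standard Gaussian model
# (score(x) = -x); the estimate should be close to zero for a good fit.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
print(ksd_vstat(X, lambda x: -x))
```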
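The Open Datasets and Dataset Splits rows quote a dense [784, 100, 20, 10] network trained on the MNIST train set, with 80% used for fitting and the parameters kept that do best on the remaining 20% validation data. A minimal sketch of that workflow, assuming PyTorch; the paper does not state its framework, optimizer, batch size, or epoch count, so those choices below are assumptions.

```python
# Sketch of the quoted MNIST setup: dense net [784, 100, 20, 10], 80/20
# train/validation split, keep the parameters with the best validation loss.
# PyTorch, Adam, batch size, and epoch count are assumptions.
import copy
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

mnist = datasets.MNIST("data", train=True, download=True,
                       transform=transforms.ToTensor())
n_train = int(0.8 * len(mnist))                  # 80% train / 20% validation
train_set, val_set = random_split(mnist, [n_train, len(mnist) - n_train])
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
val_loader = DataLoader(val_set, batch_size=512)

model = nn.Sequential(nn.Flatten(),              # 28x28 images -> 784 features
                      nn.Linear(784, 100), nn.ReLU(),
                      nn.Linear(100, 20), nn.ReLU(),
                      nn.Linear(20, 10))         # layers [784, 100, 20, 10]
opt = torch.optim.Adam(model.parameters())
loss_fn = nn.CrossEntropyLoss()

best_val, best_state = float("inf"), None
for epoch in range(5):                           # epoch count is an assumption
    model.train()
    for x, y in train_loader:
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    model.eval()
    with torch.no_grad():
        val = sum(loss_fn(model(x), y).item() * len(y) for x, y in val_loader)
    val /= len(val_set)
    if val < best_val:                           # store the best parameters
        best_val, best_state = val, copy.deepcopy(model.state_dict())
model.load_state_dict(best_state)
```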
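The Experiment Setup row quotes three concrete pieces: gradient descent on gn(θ) for 1000 steps, a grid evaluation over θ1, θ2 ∈ {−5, −4.9, ..., 5}, and a default scikit-learn logistic regression propensity model with C = 1e5 and max_iter = 1000. A sketch of those pieces follows; the quadratic stand-in for gn and the step size are assumptions, since the paper's actual gn is its DR-MKSD criterion, which is not reproduced in the extracted text.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Propensity model as quoted: default logistic regression, C=1e5, max_iter=1000.
pi = LogisticRegression(C=1e5, max_iter=1000)

def gn(theta):
    # Placeholder quadratic stand-in for the paper's DR-MKSD objective g_n;
    # the real criterion is defined by Algorithm 1 and is not reproduced here.
    return float(np.sum((theta - np.array([1.0, -2.0])) ** 2))

def grad_gn(theta):
    # Gradient of the placeholder objective above.
    return 2.0 * (theta - np.array([1.0, -2.0]))

# Gradient descent for 1000 steps, as quoted (step size 0.01 is an assumption).
theta = np.zeros(2)
for _ in range(1000):
    theta -= 0.01 * grad_gn(theta)

# Grid evaluation as in Figure 3: theta1, theta2 in {-5, -4.9, ..., 5}.
grid = np.round(np.arange(-5.0, 5.0 + 1e-9, 0.1), 1)
values = np.array([[gn(np.array([t1, t2])) for t2 in grid] for t1 in grid])
```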