Estimation Beyond Data Reweighting: Kernel Method of Moments

Authors: Heiner Kremer, Yassine Nemmour, Bernhard Schölkopf, Jia-Jie Zhu

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We benchmark our estimator on two different tasks against state-of-the-art estimators for conditional moment restrictions... Table 1 shows the mean squared error (MSE) of the parameter estimate for different methods and sample sizes. Our method provides a significantly lower MSE for small sample sizes and approaches the results of Deep GMM and FGEL for larger samples. ... Table 2 shows the MSE of the predicted models trained with different CMR estimation methods. We observe that our estimator consistently shows competitive performance and slightly outperforms the baselines on three out of four tasks.
Researcher Affiliation | Academia | Heiner Kremer¹, Yassine Nemmour¹, Bernhard Schölkopf¹ ², Jia-Jie Zhu³. ¹Max Planck Institute for Intelligent Systems, Tübingen, Germany; ²Eidgenössische Technische Hochschule Zürich, Switzerland; ³Weierstrass Institute for Applied Analysis and Stochastics, Berlin, Germany. Correspondence to: Heiner Kremer <hkremer@tue.mpg.de>.
Pseudocode | Yes | Algorithm 1 Gradient Descent Ascent for KMM. Input: empirical distribution P̂_n, reference distribution ω, hyperparameters ϵ, λ, batch sizes n_1, n_2. While not converged: sample {(x_i, z_i)}_{i=1}^{n_1} ~ P̂_n and {(x_j^ω, z_j^ω)}_{j=1}^{n_2} ~ ω; set G ← (1 / (n_1 n_2)) Σ_{i=1}^{n_1} Σ_{j=1}^{n_2} G_{ϵ,λ}(θ, β; (x_i, z_i), (x_j^ω, z_j^ω)); β ← AscentStep(β, ∇_β G); θ ← DescentStep(θ, ∇_θ G). End while. Output: parameter estimate θ. (A runnable sketch follows the table.)
Open Source Code | Yes | We release an implementation of our method as part of a software package for (conditional) moment restriction estimation. ... An implementation of our estimator and code to reproduce our results is available at https://github.com/HeinerKremer/conditional-moment-restrictions.
Open Datasets | No | The paper describes the data-generating processes for the experiments (Heteroskedastic Instrumental Variable Regression and Neural Network Instrumental Variable Regression) but provides no explicit access information (link, DOI, or citation) for the datasets used, although the data are generated synthetically from these processes. (An illustrative generator appears after the table.)
Dataset Splits | Yes | We use training and validation sets of size n = 1000 and evaluate the prediction error on a test set of size 20000. ... Additionally, for our KMM estimator we represent the reference distribution ω by a kernel density estimator (KDE) trained on the empirical sample (see Section D.3), from which we sample mini-batches of size n_2 = 200. (See the KDE sampler sketch below.)
Hardware Specification | No | The paper does not specify any particular hardware (GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using the Adam optimizer and RBF kernels but does not provide specific version numbers for software libraries or dependencies (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | For the variational methods we use an optimistic Adam (Daskalakis et al., 2018) implementation with a mini-batch size of n = 200 and learning rates τ_θ = 5 × 10⁻⁴ for optimization over θ and τ_h = 2.5 × 10⁻³ for optimization over h and β = (η, f, h), respectively. The regularization parameter λ for the instrument function h ∈ H is picked from λ ∈ [0, 10⁻⁴, 10⁻², 1]. Specific to KMM, we use n_RF = 2000 random Fourier features, and for every batch of size n_batch = 200 sampled from P̂_n we attach n_reference = 200 samples from a reference distribution Q, which we represent by a kernel density estimator with Gaussian kernel and bandwidth σ = 0.1 trained on P̂_n. ... The entropy regularization parameter ϵ is picked from ϵ ∈ [0.1, 1, 10]. (The configuration sketch below collects these settings.)
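
As a companion to the Algorithm 1 row above, here is a minimal PyTorch sketch of the descent-ascent loop. It assumes a callable G_eps_lambda implementing the paper's entropy-regularized objective G_{ϵ,λ} (not reproduced here), and plain Adam with maximize=True stands in for the optimistic Adam variant the authors use; all names are illustrative.

```python
# Sketch of Algorithm 1 (gradient descent-ascent for KMM); illustrative only.
import torch

def kmm_gda(G_eps_lambda, theta, beta, data, omega_sampler,
            n1=200, n2=200, lr_theta=5e-4, lr_beta=2.5e-3, steps=10_000):
    # Plain Adam stands in for the paper's optimistic Adam (assumption).
    opt_theta = torch.optim.Adam([theta], lr=lr_theta)
    opt_beta = torch.optim.Adam([beta], lr=lr_beta, maximize=True)
    for _ in range(steps):
        idx = torch.randint(len(data), (n1,))
        batch = data[idx]                          # {(x_i, z_i)} ~ P_hat_n
        ref = omega_sampler(n2)                    # {(x_j^w, z_j^w)} ~ omega
        G = G_eps_lambda(theta, beta, batch, ref)  # averaged objective value
        opt_theta.zero_grad()
        opt_beta.zero_grad()
        G.backward()
        opt_beta.step()                            # ascent step in beta
        opt_theta.step()                           # descent step in theta
    return theta
```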
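
Regarding the Open Datasets row: since the experiments use synthetic data, the data-generating process effectively is the dataset. The generator below is a hypothetical illustration of a heteroskedastic instrumental variable setup; every functional form in it is an assumption, not the paper's specification.

```python
# Hypothetical heteroskedastic IV data generator; functional forms are
# assumptions for illustration, not the paper's data-generating process.
import numpy as np

def generate_heteroskedastic_iv(n, theta_true=1.7, seed=0):
    rng = np.random.default_rng(seed)
    z = rng.uniform(-3, 3, size=n)                        # instrument
    u = rng.normal(size=n)                                # unobserved confounder
    noise = (0.5 + 0.5 * np.abs(z)) * rng.normal(size=n)  # z-dependent noise
    x = z + u + noise                                     # endogenous regressor
    y = theta_true * x + 2.0 * u                          # confounded outcome
    return x, y, z
```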
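
For the KDE-based reference distribution in the Dataset Splits row, a sketch using scikit-learn's KernelDensity; the helper name make_omega_sampler is illustrative.

```python
# Reference distribution omega as a KDE fitted on the empirical sample,
# sampled in minibatches of size n_2 = 200 (bandwidth as quoted above).
import numpy as np
from sklearn.neighbors import KernelDensity

def make_omega_sampler(train_data, bandwidth=0.1):
    kde = KernelDensity(kernel="gaussian", bandwidth=bandwidth).fit(train_data)
    def sample(n2=200):
        return kde.sample(n_samples=n2)  # draw a fresh reference minibatch
    return sample

# e.g. omega_sampler = make_omega_sampler(np.column_stack([x, z]))
```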
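
Finally, a hypothetical configuration collecting the quoted hyperparameters from the Experiment Setup row, with the model-selection grids swept over their Cartesian product; the key names are illustrative, not the repository's actual interface.

```python
# Hypothetical experiment configuration; key names are illustrative.
import itertools

config = {
    "batch_size": 200,          # n, minibatch size drawn from P_hat_n
    "lr_theta": 5e-4,           # tau_theta (optimistic Adam)
    "lr_dual": 2.5e-3,          # tau_h, for h and beta = (eta, f, h)
    "n_random_features": 2000,  # n_RF random Fourier features (KMM)
    "n_reference": 200,         # reference samples attached per batch
    "kde_bandwidth": 0.1,       # Gaussian KDE bandwidth sigma for Q
}
lambda_grid = [0, 1e-4, 1e-2, 1]   # regularization for instrument function h
epsilon_grid = [0.1, 1, 10]        # entropy regularization (KMM only)

for lam, eps in itertools.product(lambda_grid, epsilon_grid):
    run_config = {**config, "lambda": lam, "epsilon": eps}
    # train on the training set, select (lam, eps) on the validation set
```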