Nonparametric Instrumental Variable Regression through Stochastic Approximate Gradients

Authors: Yuri Fonseca, Caio Peixoto, Yuri Saporito

NeurIPS 2024

Reproducibility
Research Type: Experimental
LLM Response: Numerical experiments are reported in Section 5, where our algorithm is compared to state-of-the-art machine learning methods for NPIV. For each method and response function, we evaluated predictions over 20 realizations of the data. Log mean squared error (MSE) box plots and plots of each method's estimator for a randomly chosen realization of the data are displayed in Figure 1.
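
As a rough illustration of the evaluation protocol described above, the sketch below computes the log MSE of a fitted estimator on a grid of X values; the grid, the toy estimator, and the absolute-value response function are hypothetical stand-ins, not taken from the paper or the released code. One such score per method and realization would then feed the box plots of Figure 1.

import numpy as np

def log_mse(h_hat, h_star, x_grid):
    """Log mean squared error of an estimator h_hat against the true
    structural function h_star, evaluated on a fixed grid of X values."""
    return np.log(np.mean((h_hat(x_grid) - h_star(x_grid)) ** 2))

# Toy usage with a hypothetical response function h_star(x) = |x| and a
# slightly biased estimator of it.
x_grid = np.linspace(-3.0, 3.0, 200)
print(log_mse(lambda x: np.abs(x) + 0.1, np.abs, x_grid))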
Researcher Affiliation: Academia
LLM Response: Yuri R. Fonseca, Decision, Risk and Operations, Columbia University, New York, NY, yfonseca23@gsb.columbia.edu; Caio F. L. Peixoto, School of Applied Mathematics, Getulio Vargas Foundation, Rio de Janeiro, RJ, Brazil, caio.peixoto@fgv.br; Yuri F. Saporito, School of Applied Mathematics, Getulio Vargas Foundation, Rio de Janeiro, RJ, yuri.saporito@fgv.br
Pseudocode: Yes
LLM Response: Algorithm 1: SAGD-IV
Open Source Code: Yes
LLM Response: Code for the experiments is available at https://github.com/Caioflp/sagd-iv
Open Datasets: Yes
LLM Response: To study the performance of our estimator in a continuous response setting, we used the data generating process from [6], which we recall below: Y = h⋆(X) + ε + δ, X = Z₁ + ε + γ (11), where Z = (Z₁, Z₂) ∼ Uniform([−3, 3]²), ε ∼ N(0, 1), γ, δ ∼ N(0, 0.1).
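
The quoted data generating process is simple enough to sketch directly. The snippet below draws samples from it, assuming the N(0, 0.1) noise terms are parameterized by their variance and using an absolute-value structural function as a hypothetical stand-in for h⋆; both choices are assumptions of this sketch rather than details confirmed by the paper.

import numpy as np

def generate_continuous_response_data(n, h_star, seed=0):
    """Draw n samples from the DGP in Equation (11):
    Z ~ Uniform([-3, 3]^2), eps ~ N(0, 1), gamma, delta ~ N(0, 0.1),
    X = Z1 + eps + gamma, Y = h_star(X) + eps + delta."""
    rng = np.random.default_rng(seed)
    Z = rng.uniform(-3.0, 3.0, size=(n, 2))
    eps = rng.normal(0.0, 1.0, size=n)
    # N(0, 0.1) is read here as variance 0.1 (an assumption of this sketch).
    gamma = rng.normal(0.0, np.sqrt(0.1), size=n)
    delta = rng.normal(0.0, np.sqrt(0.1), size=n)
    X = Z[:, 0] + eps + gamma
    Y = h_star(X) + eps + delta
    return Z, X, Y

# Hypothetical structural function; the paper evaluates several response functions.
Z, X, Y = generate_continuous_response_data(1000, np.abs)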
Dataset Splits: Yes
LLM Response: Instead of using 50% of the samples for training and 50% for validation, we adopted a train/validation split of 80/20.
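
A minimal sketch of such an 80/20 split over sample indices, assuming a simple uniform random shuffle (the released code may split differently):

import numpy as np

def train_val_split(n_samples, val_frac=0.2, seed=0):
    """Shuffle sample indices and return (train, validation) index arrays."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    n_val = int(round(val_frac * n_samples))
    return idx[n_val:], idx[:n_val]

train_idx, val_idx = train_val_split(1000)  # 800 training / 200 validation indices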
Hardware Specification: Yes
LLM Response: All experiments were conducted on an Apple Silicon M1 Pro CPU. No GPUs were used.
Software Dependencies: No
LLM Response: The paper mentions specific software components like the Adam optimizer but does not provide version numbers for these or other software dependencies.
Experiment Setup: Yes
LLM Response: The Algorithm 1 learning rate was set to α_m = 1/√M for 1 ≤ m ≤ M; the warm-up time K was set to 100; the set H was chosen as in Equation (6), with A set to 10. For both networks, in the roles of Φ̂ and r̂, we used two hidden dense layers of sizes 64 and 32. In the continuous response setting the activation function used was a ReLU, while for the binary response one we used the sigmoid activation. For Φ̂, we used dropout with rate 0.01. Both networks were trained with a batch size of 512, using the Adam [21] optimizer with a learning rate of 0.01, across 1.5 × 10⁵/N epochs. We adopted L2 regularization in the two estimators, with regularization parameters of 3 × 10⁻³ for r̂ and 5 × 10⁻³ for Φ̂. We also implemented early stopping in the two training procedures.
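
The quoted architecture and optimizer settings can be sketched in PyTorch as below. This is a hedged reconstruction rather than the authors' implementation: the input dimensions, the output layer, the dropout placement, and the use of Adam's weight_decay to realize the L2 penalties are all assumptions of this sketch.

import torch
import torch.nn as nn

def make_network(in_dim, binary_response=False, dropout_rate=0.0):
    """Two hidden dense layers of sizes 64 and 32, with ReLU activations in the
    continuous-response setting and sigmoid in the binary one. Output width and
    dropout placement are assumptions of this sketch."""
    act = nn.Sigmoid if binary_response else nn.ReLU
    layers = [nn.Linear(in_dim, 64), act(), nn.Linear(64, 32), act(), nn.Linear(32, 1)]
    if dropout_rate > 0.0:
        layers.insert(2, nn.Dropout(dropout_rate))  # dropout only in the Phi-hat network
    return nn.Sequential(*layers)

# Hypothetical input dimensions for the two estimators.
phi_hat = make_network(in_dim=2, dropout_rate=0.01)
r_hat = make_network(in_dim=2)

# Adam with learning rate 0.01; weight_decay stands in for the quoted L2
# regularization parameters (5e-3 for Phi-hat, 3e-3 for r-hat).
opt_phi = torch.optim.Adam(phi_hat.parameters(), lr=0.01, weight_decay=5e-3)
opt_r = torch.optim.Adam(r_hat.parameters(), lr=0.01, weight_decay=3e-3)

Mini-batches of size 512 and early stopping on the validation split would then drive the training loops for both networks.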