Nonparametric Instrumental Variable Regression through Stochastic Approximate Gradients

Authors: Yuri Fonseca, Caio Peixoto, Yuri Saporito

NeurIPS 2024

Reproducibility
Research Type: Experimental
LLM Response: Numerical experiments are reported in Section 5, where our algorithm is compared to state-of-the-art machine learning methods for NPIV. For each method and response function, we evaluated predictions over 20 realizations of the data. Log mean squared error (MSE) box plots and plots of each method's estimator for a randomly chosen realization of the data are displayed in Figure 1.
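
As a rough illustration of the evaluation protocol described above, the sketch below computes the log MSE of a fitted estimator on a grid of X values; the grid, the toy estimator, and the absolute-value response function are hypothetical stand-ins, not taken from the paper or the released code. One such score per method and realization would then feed the box plots of Figure 1.

import numpy as np

def log_mse(h_hat, h_star, x_grid):
    """Log mean squared error of an estimator h_hat against the true
    structural function h_star, evaluated on a fixed grid of X values."""
    return np.log(np.mean((h_hat(x_grid) - h_star(x_grid)) ** 2))

# Toy usage with a hypothetical response function h_star(x) = |x| and a
# slightly biased estimator of it.
x_grid = np.linspace(-3.0, 3.0, 200)
print(log_mse(lambda x: np.abs(x) + 0.1, np.abs, x_grid))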
Researcher Affiliation: Academia
LLM Response: Yuri R. Fonseca, Decision, Risk and Operations, Columbia University, New York, NY, yfonseca23@gsb.columbia.edu; Caio F. L. Peixoto, School of Applied Mathematics, Getulio Vargas Foundation, Rio de Janeiro, RJ, Brazil, caio.peixoto@fgv.br; Yuri F. Saporito, School of Applied Mathematics, Getulio Vargas Foundation, Rio de Janeiro, RJ, yuri.saporito@fgv.br
Pseudocode: Yes
LLM Response: Algorithm 1: SAGD-IV
Open Source Code: Yes
LLM Response: Code for the experiments is available at https://github.com/Caioflp/sagd-iv
Open Datasets: Yes
LLM Response: To study the performance of our estimator in a continuous response setting, we used the data generating process from [6], which we recall below: Y = h⋆(X) + ε + δ, X = Z₁ + ε + γ (11), where Z = (Z₁, Z₂) ∼ Uniform([−3, 3]²), ε ∼ N(0, 1), γ, δ ∼ N(0, 0.1).
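
The quoted data generating process is simple enough to sketch directly. The snippet below draws samples from it, assuming the N(0, 0.1) noise terms are parameterized by their variance and using an absolute-value structural function as a hypothetical stand-in for h⋆; both choices are assumptions of this sketch rather than details confirmed by the paper.

import numpy as np

def generate_continuous_response_data(n, h_star, seed=0):
    """Draw n samples from the DGP in Equation (11):
    Z ~ Uniform([-3, 3]^2), eps ~ N(0, 1), gamma, delta ~ N(0, 0.1),
    X = Z1 + eps + gamma, Y = h_star(X) + eps + delta."""
    rng = np.random.default_rng(seed)
    Z = rng.uniform(-3.0, 3.0, size=(n, 2))
    eps = rng.normal(0.0, 1.0, size=n)
    # N(0, 0.1) is read here as variance 0.1 (an assumption of this sketch).
    gamma = rng.normal(0.0, np.sqrt(0.1), size=n)
    delta = rng.normal(0.0, np.sqrt(0.1), size=n)
    X = Z[:, 0] + eps + gamma
    Y = h_star(X) + eps + delta
    return Z, X, Y

# Hypothetical structural function; the paper evaluates several response functions.
Z, X, Y = generate_continuous_response_data(1000, np.abs)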
Dataset Splits: Yes
LLM Response: Instead of using 50% of the samples for training and 50% for validation, we adopted a train/validation split of 80/20.
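
A minimal sketch of such an 80/20 split over sample indices, assuming a simple uniform random shuffle (the released code may split differently):

import numpy as np

def train_val_split(n_samples, val_frac=0.2, seed=0):
    """Shuffle sample indices and return (train, validation) index arrays."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    n_val = int(round(val_frac * n_samples))
    return idx[n_val:], idx[:n_val]

train_idx, val_idx = train_val_split(1000)  # 800 training / 200 validation indices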
Hardware Specification: Yes
LLM Response: All experiments were conducted on an Apple Silicon M1 Pro CPU. No GPUs were used.
Software Dependencies: No
LLM Response: The paper mentions specific software components like the Adam optimizer but does not provide version numbers for these or other software dependencies.
Experiment Setup: Yes
LLM Response: The Algorithm 1 learning rate was set to α_m = 1/√M for 1 ≤ m ≤ M; the warm-up time K was set to 100; the set H was chosen as in Equation (6), with A set to 10. For both networks, in the roles of Φ̂ and r̂, we used two hidden dense layers of sizes 64 and 32. In the continuous response setting the activation function used was a ReLU, while for the binary response one we used the sigmoid activation. For Φ̂, we used dropout with rate 0.01. Both networks were trained with a batch size of 512, using the Adam [21] optimizer with a learning rate of 0.01, across 1.5 × 10⁵/N epochs. We adopted L2 regularization in the two estimators, with regularization parameters of 3 × 10⁻³ for r̂ and 5 × 10⁻³ for Φ̂. We also implemented early stopping in the two training procedures.
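
The quoted architecture and optimizer settings can be sketched in PyTorch as below. This is a hedged reconstruction rather than the authors' implementation: the input dimensions, the output layer, the dropout placement, and the use of Adam's weight_decay to realize the L2 penalties are all assumptions of this sketch.

import torch
import torch.nn as nn

def make_network(in_dim, binary_response=False, dropout_rate=0.0):
    """Two hidden dense layers of sizes 64 and 32, with ReLU activations in the
    continuous-response setting and sigmoid in the binary one. Output width and
    dropout placement are assumptions of this sketch."""
    act = nn.Sigmoid if binary_response else nn.ReLU
    layers = [nn.Linear(in_dim, 64), act(), nn.Linear(64, 32), act(), nn.Linear(32, 1)]
    if dropout_rate > 0.0:
        layers.insert(2, nn.Dropout(dropout_rate))  # dropout only in the Phi-hat network
    return nn.Sequential(*layers)

# Hypothetical input dimensions for the two estimators.
phi_hat = make_network(in_dim=2, dropout_rate=0.01)
r_hat = make_network(in_dim=2)

# Adam with learning rate 0.01; weight_decay stands in for the quoted L2
# regularization parameters (5e-3 for Phi-hat, 3e-3 for r-hat).
opt_phi = torch.optim.Adam(phi_hat.parameters(), lr=0.01, weight_decay=5e-3)
opt_r = torch.optim.Adam(r_hat.parameters(), lr=0.01, weight_decay=3e-3)

Mini-batches of size 512 and early stopping on the validation split would then drive the training loops for both networks.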