Coordinated Double Machine Learning

Authors: Nitai Fingerhut, Matteo Sesia, Yaniv Romano

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The improved empirical performance of the proposed method is demonstrated through numerical experiments on both simulated and real data.
Researcher Affiliation | Academia | (1) Departments of Electrical and Computer Engineering and of Computer Science, Technion, Israel; (2) Department of Data Sciences and Operations, University of Southern California, CA, USA.
Pseudocode | Yes | Algorithm 1 (DML), Algorithm 2 (C-DML), Algorithm 3 (C-DML with fixed I1, I2, α, β, γ); a sketch of the basic DML step appears below the table.
Open Source Code | Yes | A Python implementation of the methods described in this paper is available from https://github.com/nitaifingerhut/C-DML.git, along with tutorials and code to reproduce the experiments.
Open Datasets | Yes | Semi-synthetic numerical experiments are based on financial data borrowed from Chernozhukov & Hansen (2004), a data set also used by Chernozhukov et al. (2018); see Section 4 for details. The paper additionally uses the Beijing air quality data set of Zhang et al. (2017), the Facebook blog feedback data set of Buza (2014), and the CCDDHNR2018 data set of Bach (2021), which is integrated into the DoubleML Python package of Bach et al. (2022).
Dataset Splits | Yes | The observations are randomly divided into two disjoint subsets, I1 and I2. To overcome the bias induced by overfitting, Chernozhukov et al. (2018) suggested cross-fitting, achieved by further splitting I2 into two disjoint subsets, I2,1 and I2,2. In each experiment, the data are divided into three disjoint subsets: I1 (50% of the observations), I2,1 (25%), and I2,2 (25%); a split sketch is given below the table.
Hardware Specification | No | The paper does not specify the hardware used for running the experiments (e.g., GPU/CPU models, memory).
Software Dependencies | No | Random forest regression models are implemented using the Python package sklearn. The paper mentions deep neural networks but does not specify the framework (e.g., TensorFlow, PyTorch) or the versions used.
Experiment Setup | Yes | The learning rate is fixed to 0.01, clipping gradients with norms larger than 3. Early stopping is used to avoid overfitting; the number of epochs (capped at 2000) is tuned by evaluating the loss function on a hold-out data set. Random forest regression models are implemented using the Python package sklearn with default hyper-parameters, except the number of trees in the forest and the maximal depth, both of which are set to 20; a hedged configuration sketch follows the table.
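To make the Pseudocode row concrete, here is a minimal sketch of the basic DML step on one fold: the standard residual-on-residual construction in the spirit of Chernozhukov et al. (2018), not necessarily the paper's exact Algorithm 1. The function name `dml_theta` is hypothetical; the random forest settings (20 trees, depth 20) follow the Experiment Setup row.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def dml_theta(X, D, Y, I1, I2):
    """Residual-on-residual DML estimate of the treatment effect theta.

    Illustrative sketch only: nuisance models are fit on I1 (sample
    splitting) and residuals are formed on the held-out fold I2.
    """
    # Fit nuisance regressions E[Y|X] and E[D|X] on I1 only.
    m_hat = RandomForestRegressor(n_estimators=20, max_depth=20).fit(X[I1], Y[I1])
    e_hat = RandomForestRegressor(n_estimators=20, max_depth=20).fit(X[I1], D[I1])
    # Residualize on I2 and regress the outcome residual on the treatment residual.
    u = Y[I2] - m_hat.predict(X[I2])  # outcome residual
    v = D[I2] - e_hat.predict(X[I2])  # treatment residual
    return np.dot(v, u) / np.dot(v, v)
```

Cross-fitting then averages such estimates over swapped roles of the folds, which is what the further split of I2 in the Dataset Splits row supports.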
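The three-way split reported under Dataset Splits can be reproduced along these lines; the helper name `three_way_split` and the seeded permutation are our own assumptions, not the authors' code.

```python
import numpy as np

def three_way_split(n, seed=0):
    """Split n observation indices into I1 (50%), I2,1 (25%), I2,2 (25%),
    matching the proportions reported in the paper."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n1 = n // 2                    # 50% for I1
    n21 = n1 + (n - n1) // 2       # next 25% for I2,1; the rest is I2,2
    return idx[:n1], idx[n1:n21], idx[n21:]
```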
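Finally, a hedged sketch of the reported neural-network training setup: learning rate 0.01, gradient-norm clipping at 3, and early stopping with at most 2000 epochs. The paper does not name the deep learning framework, so PyTorch, the Adam optimizer, and the patience value below are assumptions.

```python
import torch

def train_with_early_stopping(model, loss_fn, train_loader, X_val, y_val,
                              max_epochs=2000, lr=0.01, clip_norm=3.0,
                              patience=50):
    """Training loop mirroring the reported setup; optimizer and patience
    are assumed, not stated in the paper."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    best_val, best_state, stale = float("inf"), None, 0
    for epoch in range(max_epochs):
        model.train()
        for xb, yb in train_loader:
            opt.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()
            # Clip gradients whose norm exceeds 3, as reported.
            torch.nn.utils.clip_grad_norm_(model.parameters(), clip_norm)
            opt.step()
        # Early stopping: track the loss on a hold-out set.
        model.eval()
        with torch.no_grad():
            val = loss_fn(model(X_val), y_val).item()
        if val < best_val:
            best_val, stale = val, 0
            best_state = {k: v.clone() for k, v in model.state_dict().items()}
        else:
            stale += 1
            if stale >= patience:
                break
    if best_state is not None:
        model.load_state_dict(best_state)
    return model
```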