Coordinated Double Machine Learning
Authors: Nitai Fingerhut, Matteo Sesia, Yaniv Romano
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The improved empirical performance of the proposed method is demonstrated through numerical experiments on both simulated and real data. |
| Researcher Affiliation | Academia | Departments of Electrical and Computer Engineering and of Computer Science, Technion, Israel; Department of Data Sciences and Operations, University of Southern California, CA, USA. |
| Pseudocode | Yes | Algorithm 1 (DML), Algorithm 2 (C-DML), and Algorithm 3 (C-DML with fixed I1, I2, α, β, γ). |
| Open Source Code | Yes | A Python implementation of the methods described in this paper is available from https://github.com/nitaifingerhut/C-DML.git, along with tutorials and code to reproduce the experiments. |
| Open Datasets | Yes | Semi-synthetic numerical experiments are based on financial data borrowed from Chernozhukov & Hansen (2004), a data set also used by Chernozhukov et al. (2018); see Section 4 for more details. Additional data sets include the Beijing air quality data set of Zhang et al. (2017), the Facebook blog feedback data set of Buza (2014), and the CCDDHNR2018 data set of Bach (2021), which is integrated into the DoubleML Python package of Bach et al. (2022). |
| Dataset Splits | Yes | The observations are randomly divided into two disjoint subsets, I1 and I2. To overcome this issue, Chernozhukov et al. (2018) suggested cross-fitting, which is achieved by further splitting I2 into two disjoint subsets, I2,1 and I2,2. In each experiment, the data are divided into three disjoint subsets: I1 (50% of the observations), I2,1 (25%), and I2,2 (25%); see the split sketch after this table. |
| Hardware Specification | No | The paper does not specify the hardware used for running the experiments (e.g., GPU/CPU models, memory). |
| Software Dependencies | No | Random forest regression models are implemented using the Python package sklearn. The paper mentions deep neural networks but does not specify the framework (e.g., TensorFlow, PyTorch) or their versions. |
| Experiment Setup | Yes | The learning rate is fixed at 0.01, and gradients with norms larger than 3 are clipped. Early stopping is used to avoid overfitting; the number of epochs (capped at 2000) is tuned by evaluating the loss function on a hold-out data set. Random forest regression models are implemented using the Python package sklearn, with the default hyper-parameters except the number of trees in the forest and the maximal depth, both of which are set to 20; see the hyper-parameter sketch after this table. |
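
The 50%/25%/25% cross-fitting split reported above can be reproduced with a few lines of NumPy. This is a minimal sketch: the subset names mirror the paper's notation (I1, I2,1, I2,2), while the sample size and random seed are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Illustrative sketch of the 50%/25%/25% split into I1, I2_1, I2_2.
# The sample size n and the seed are placeholders, not values from the paper.
rng = np.random.default_rng(0)
n = 1000
idx = rng.permutation(n)

# I1 holds 50% of the observations; I2 is split further into I2_1 and I2_2 (25% each).
I1 = idx[: n // 2]
I2_1 = idx[n // 2 : 3 * n // 4]
I2_2 = idx[3 * n // 4 :]
```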
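
The hyper-parameter sketch below collects the stated settings. The random-forest values (20 trees, maximal depth 20, sklearn defaults otherwise) come directly from the paper; the neural-network snippet is an assumption-laden illustration, since the paper does not name a deep-learning framework, so PyTorch, the architecture, and the input dimension are placeholders.

```python
import torch
from sklearn.ensemble import RandomForestRegressor

# Random forest regression models: sklearn defaults except 20 trees and maximal depth 20.
forest = RandomForestRegressor(n_estimators=20, max_depth=20)

# Neural-network training settings as described: learning rate 0.01, gradient norms
# clipped at 3, and at most 2000 epochs with early stopping on a hold-out loss.
# The architecture and optimizer below are placeholders; the paper's framework is unspecified.
model = torch.nn.Sequential(torch.nn.Linear(10, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

def train_step(x, y):
    # One gradient step with norm clipping at 3, as stated in the experiment setup.
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=3.0)
    optimizer.step()
    return loss.item()
```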