Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
FLAME: A Fast Large-scale Almost Matching Exactly Approach to Causal Inference
Authors: Tianyu Wang, Marco Morucci, M. Usaid Awan, Yameng Liu, Sudeepa Roy, Cynthia Rudin, Alexander Volfovsky
JMLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we study the quality and scalability of FLAME on synthetic and real data. The real datasets we use are the US Census 1990 dataset from the UCI Machine Learning Repository (Lichman, 2013) and the US 2010 Natality data (National Center for Health Statistics, NCHS, 2010). The bit-vector and SQL implementations are referred to as FLAME-bit and FLAME-db respectively. FLAME-bit was implemented using Python 2.7.13 and FLAME-db using Python, SQL, and Microsoft SQL Server 2016. We compared FLAME with several other (matching and non-matching) methods including: (1) one-to-one Propensity Score Nearest Neighbor Matching (1-PSNNM) (Ross et al., 2015), (2) 1-PSNNM with oracle variable selection, (3) Genetic Matching (Gen Match) (Diamond and Sekhon, 2013), (4) Causal Forest (Wager and Athey, 2018), (5) Mahalanobis Matching, (6) double linear regression, (7) BART (Chipman et al., 2010) and (8) CTMLE (Van Der Laan and Rubin, 2006). ... The computation time results are in Table 2a. The experiments were conducted on a Windows 10 machine with Intel(R) Core(TM) i7-6700 CPU processor (4 cores, 3.40GHz, 8M) and 32GB RAM. |
| Researcher Affiliation | Academia | Tianyu Wang, Marco Morucci, M. Usaid Awan, Yameng Liu, Sudeepa Roy, Cynthia Rudin, Alexander Volfovsky; Duke University |
| Pseudocode | Yes | Algorithm 1: FLAME Algorithm. Inputs: input data $S_{ma} = (X, Y, T)$ for matching; training set $S_{tr} = (X^{tr}, Y^{tr}, T^{tr})$; model classes $F_1, F_2, \ldots, F_d$; stopping threshold $\epsilon$; tradeoff parameter $C$. Outputs: a sequence of selection indicators $\theta_0, \ldots, \theta_d$, and a set of matched groups $\{MG(\theta_l, S_l)\}_{l \ge 1}$. ▷ $S_l$ is defined in the algorithm. 1: Initialize $S_0 = S_{ma} = (X, Y, T)$, $\theta_0 = \mathbf{1}_{d \times 1}$, $l = 1$, run = True. ▷ $l$ is the index for iterations. 2: Compute exact matched groups $MG(\theta_0, S_0)$ as defined in (1). ▷ The detailed implementation is in Section 4. 3: while run = True do 4: Compute $\theta_l$ using (6) on training set $S_{tr}$, using $F_{d-l}$ and tradeoff parameter $C$. ▷ Determine which covariates to match on for this iteration. 5: Compute matched groups $MG(\theta_{l-1}, S_{l-1})$ as defined in (1). ▷ The detailed implementation is in Section 4. 6: $S_l = S_{l-1} \setminus MG(\theta_{l-1}, S_{l-1})$. ▷ These matched units are done. 7: if $\widehat{PE}_{F_{d-l}}(\theta_l, S_{tr}) > \widehat{PE}_{F_d}(\mathbf{1}_{d \times 1}, S_{tr}) + \epsilon$ OR $S_l = \emptyset$ then 8: run = False ▷ Prediction error is too high to continue matching. 10: Output $\{\theta_l, MG(\theta_l, S_l)\}_{l \ge 1}$. |
| Open Source Code | Yes | The code for FLAME is available at https://cran.r-project.org/web/packages/FLAME/index.html (in R), and https://www.github.com/almost-matching-exactly/ (in Python). An introduction to the project with links to the code is also found at https://almost-matching-exactly.github.io/. Gupta et al. (2021) provides a short overview of the FLAME-DAME software package. |
| Open Datasets | Yes | The real datasets we use are the US Census 1990 dataset from the UCI Machine Learning Repository (Lichman, 2013) and the US 2010 Natality data (National Center for Health Statistics, NCHS, 2010). |
| Dataset Splits | Yes | We consider 20,000 units (10,000 control and 10,000 treated) generated with (9)... After eliminating units whose outcomes and/or treatment indicators are missing, there were 2.1M units, among which 75K units are treated units. ... We randomly sampled 10% of these units (122,089 units) as the training set. |
| Hardware Specification | Yes | The experiments were conducted on a Windows 10 machine with Intel(R) Core(TM) i7-6700 CPU processor (4 cores, 3.40GHz, 8M) and 32GB RAM. |
| Software Dependencies | Yes | FLAME-bit was implemented using Python 2.7.13 and FLAME-db using Python, SQL, and Microsoft SQL Server 2016. ... The other implementation leverages database management systems (e.g., PostgreSQL, 2016). |
| Experiment Setup | Yes | Four of the simulated experiments use data generated from special cases of the following (treatment $T \in \{0, 1\}$): $y = \sum_{i} \alpha_i x_i + T \sum_{i} \beta_i x_i + T \cdot U \sum_{1 \le i < \gamma \le 5} x_i x_\gamma + \epsilon$ (9). Here, $\alpha_i \sim N(10s, 1)$ with $s \sim \text{Uniform}\{-1, 1\}$, $\beta_i \sim N(1.5, 0.15)$, $U$ is a constant and $\epsilon \sim N(0, 0.1)$. This contains linear baseline effects and treatment effects, and a quadratic treatment effect term. ... We consider 20,000 units (10,000 control and 10,000 treated) generated with (9) where $U = 1$. ... In this experiment, Figure 1a is generated by stopping when the PE drops (starting from values within [-2, -1]) to below -20, resulting in more than 15,000 matches out of 20,000 units. |
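The data-generating process in equation (9), quoted in the Experiment Setup row, can be simulated roughly as follows. This is a minimal sketch under stated assumptions, not the authors' generator. The excerpt does not give the number of covariates or their distribution, so the sketch makes these choices:

- It uses 10 binary Bernoulli(0.5) covariates, all of which enter both linear sums.
- It draws a separate $s$ for each $\alpha_i$.
- It reads the second argument of $N(\cdot, \cdot)$ as a standard deviation.

The function name `generate_flame_data` is hypothetical.

```python
import numpy as np


def generate_flame_data(n_control=10_000, n_treated=10_000, n_covariates=10,
                        U=1.0, seed=0):
    """Simulate (X, T, y) from a model shaped like Eq. (9).

    Assumptions (not given in the excerpt): binary Bernoulli(0.5)
    covariates, every covariate enters both linear sums, one draw of s
    per alpha_i, and N(mean, sd) parameterised by standard deviation.
    """
    if n_covariates < 5:
        raise ValueError("the quadratic term needs at least 5 covariates")
    rng = np.random.default_rng(seed)
    n = n_control + n_treated
    X = rng.binomial(1, 0.5, size=(n, n_covariates))
    T = np.concatenate([np.zeros(n_control, dtype=int),
                        np.ones(n_treated, dtype=int)])

    s = rng.choice([-1, 1], size=n_covariates)        # s ~ Uniform{-1, 1}
    alpha = rng.normal(10 * s, 1.0)                   # alpha_i ~ N(10 s, 1)
    beta = rng.normal(1.5, 0.15, size=n_covariates)   # beta_i ~ N(1.5, 0.15)
    eps = rng.normal(0.0, 0.1, size=n)                # eps ~ N(0, 0.1)

    # Quadratic treatment-effect term over pairs 1 <= i < gamma <= 5
    # (0-based columns 0..4 here).
    pair_sum = np.zeros(n)
    for i in range(5):
        for g in range(i + 1, 5):
            pair_sum += X[:, i] * X[:, g]

    y = X @ alpha + T * (X @ beta) + T * U * pair_sum + eps
    return X, T, y
```

With the defaults, this mirrors the quoted setup of 20,000 units (10,000 control, 10,000 treated) with $U = 1$. Setting `U=0.0` removes the quadratic treatment-effect term and leaves only linear effects.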
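FLAME-bit is only named in the Research Type and Software Dependencies rows. The sketch below shows one way to find exact matches with integer bit vectors, in the spirit of such an implementation. It is illustrative and does not reproduce the authors' code. It works in three steps:

1. Pack each unit's selected covariate values into one base-$k$ integer $b$.
2. Append the treatment indicator as an extra, highest digit to get $b^+$.
3. Call a unit matched when the number of units sharing $b$ differs from the number sharing $b^+$. That can only happen if some unit with identical covariates has the opposite treatment.

The function `bit_vector_match` and its parameters are hypothetical, and covariates are assumed to be integers in $[0, k)$.

```python
import numpy as np


def bit_vector_match(X, T, selected, k=2):
    """Flag units that have an exact match on the `selected` covariates.

    Assumes covariate values are integers in [0, k). Returns a boolean
    'matched' array and the packed covariate codes b.
    """
    Xs = np.asarray(X)[:, selected].astype(np.int64)
    p = Xs.shape[1]
    weights = k ** np.arange(p, dtype=np.int64)       # base-k place values
    b = Xs @ weights                                  # covariate code per unit
    b_plus = b + np.asarray(T, dtype=np.int64) * k ** p  # treatment as top digit

    # Count how many units share each code, then look the counts up per unit.
    _, inv, counts = np.unique(b, return_inverse=True, return_counts=True)
    _, inv_p, counts_p = np.unique(b_plus, return_inverse=True, return_counts=True)

    # Same covariates but fewer units with the same treatment means that
    # at least one unit in the cell has the other treatment value.
    matched = counts[inv] != counts_p[inv_p]
    return matched, b
```

Packing into `int64` overflows once $k^{p+1}$ exceeds $2^{63}$. With many covariates, a practical version would need wider integers or hashing.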
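The loop in Algorithm 1 (Pseudocode row) can be illustrated with the simplified sketch below. It is not the authors' implementation, and it departs from the quoted pseudocode in three ways:

- It chooses which covariate to drop by training-set prediction error alone, so the tradeoff parameter $C$ in criterion (6) plays no role.
- Its hypothetical model class predicts outcomes by within-cell means, fit separately for treated and control units.
- It checks the stopping rule before matching on the newly selected covariates.

The helpers `group_mean_pe`, `exact_match`, and `flame_sketch` are hypothetical names.

```python
from collections import defaultdict

import numpy as np


def group_mean_pe(X, T, y, cols):
    """Hypothetical PE: in-sample mean squared error when y is predicted by
    the mean outcome of units sharing the same values on `cols`, fit
    separately for control (T=0) and treated (T=1) units."""
    pe = 0.0
    for t in (0, 1):
        mask = T == t
        if not mask.any():
            continue
        cells = defaultdict(list)
        for row, yi in zip(X[mask][:, cols], y[mask]):
            cells[tuple(row)].append(yi)
        sq = sum(((np.asarray(v) - np.mean(v)) ** 2).sum() for v in cells.values())
        pe += sq / mask.sum()
    return pe


def exact_match(X, T, active, cols):
    """Indices of active units that share their `cols` values with at
    least one active unit of the opposite treatment."""
    groups = defaultdict(list)
    for i in np.flatnonzero(active):
        groups[tuple(X[i, cols])].append(int(i))
    matched = []
    for members in groups.values():
        if len({int(T[i]) for i in members}) == 2:
            matched.extend(members)
    return matched


def flame_sketch(X, T, X_tr, T_tr, y_tr, epsilon=0.05):
    """Return a list of (covariates matched on, matched unit indices)."""
    cols = list(range(X.shape[1]))
    active = np.ones(len(T), dtype=bool)           # units not yet matched
    pe_full = group_mean_pe(X_tr, T_tr, y_tr, cols)

    # Round 0: exact matching on all covariates (theta_0 = all ones).
    matched = exact_match(X, T, active, cols)
    active[matched] = False
    rounds = [(list(cols), matched)]

    while len(cols) > 1 and active.any():
        # Try dropping each remaining covariate; keep the drop with lowest PE.
        candidates = [[c for c in cols if c != drop] for drop in cols]
        pes = [group_mean_pe(X_tr, T_tr, y_tr, cand) for cand in candidates]
        best = int(np.argmin(pes))
        if pes[best] > pe_full + epsilon:
            break                                  # PE too high: stop matching
        cols = candidates[best]
        matched = exact_match(X, T, active, cols)
        active[matched] = False
        rounds.append((list(cols), matched))
    return rounds
```

Trying every candidate drop costs one PE evaluation per remaining covariate per round. The comparison against `pe_full + epsilon` plays the role of the stopping test on line 7 of Algorithm 1.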
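FLAME-db performs matching inside a database management system; the experiments used Microsoft SQL Server 2016. As a rough illustration only, the sketch below expresses one matching round with Python's built-in `sqlite3` module instead. A `GROUP BY` over the selected covariates finds cells whose still-unmatched units include both treatment values, and an `UPDATE` records the round in which those units were matched. The `units` table, its columns, and both function names are hypothetical.

```python
import sqlite3


def build_demo_db():
    """In-memory table of six hypothetical units with three binary covariates."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE units (id INTEGER PRIMARY KEY, x1 INTEGER, "
                 "x2 INTEGER, x3 INTEGER, treated INTEGER, matched INTEGER)")
    rows = [(0, 0, 0, 0, 0), (1, 0, 0, 0, 1), (2, 1, 0, 0, 0),
            (3, 1, 1, 0, 1), (4, 0, 1, 1, 1), (5, 0, 1, 0, 0)]
    conn.executemany("INSERT INTO units (id, x1, x2, x3, treated) "
                     "VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def sql_match_round(conn, cols, round_no):
    """Set matched = round_no for unmatched units whose values on `cols`
    are shared by an unmatched unit with the other treatment value.
    `cols` is interpolated into the SQL text, so it must come from a
    trusted list of column names."""
    col_list = ", ".join(cols)
    join_cond = " AND ".join(f"u.{c} = g.{c}" for c in cols)
    conn.execute(f"""
        UPDATE units SET matched = ?
        WHERE id IN (
            SELECT u.id
            FROM units AS u
            JOIN (SELECT {col_list}
                  FROM units
                  WHERE matched IS NULL
                  GROUP BY {col_list}
                  HAVING MIN(treated) = 0 AND MAX(treated) = 1) AS g
              ON {join_cond}
            WHERE u.matched IS NULL
        )""", (round_no,))
```

For example, calling `sql_match_round` with `["x1", "x2", "x3"]`, then `["x1", "x3"]`, then `["x1"]` on the demo table matches units 0 and 1 in round 0, units 2 and 3 in round 1, and units 4 and 5 in round 2. Keeping the matching inside one `UPDATE` lets the database do the grouping rather than pulling rows into Python.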