What to Expect of Classifiers? Reasoning about Logistic Regression with Missing Features

Authors: Pasha Khosravi, Yitao Liang, YooJung Choi, Guy Van den Broeck

IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evaluations show that our model achieves the same performance as the logistic regression with all features observed, and outperforms standard imputation techniques when features go missing during prediction time. In this section, we empirically evaluate the performance of naive conformant learning (NaCL) and provide a detailed discussion of our method's advantages over existing imputation approaches in practice.
Researcher Affiliation | Academia | Pasha Khosravi, Yitao Liang, YooJung Choi and Guy Van den Broeck, University of California, Los Angeles. {pashak, yliang, yjchoi, guyvdb}@cs.ucla.edu
Pseudocode | No | The paper describes its approach (e.g., Naive Conformant Learning) as an algorithm based on geometric programming but does not provide pseudocode or a formally labeled algorithm block.
Open Source Code | Yes | Our implementation of the algorithm and experiments are available at https://github.com/UCLA-StarAI/NaCL.
Open Datasets | Yes | To demonstrate the generality of our method, we construct a 5-dataset testbed suite that covers assorted configurations [Yann et al., 2009; Xiao et al., 2017; Blackard and Dean, 1999; Dua and Karra Taniskidou, 2017; Noordewier et al., 1991]; see Table 2. Table 2: Summary of our testbed: MNIST, FASHION, COVTYPE, ADULT, SPLICE.
Dataset Splits | No | For datasets with no predefined test set, we construct one by an 80:20 split. (Only a train/test split is explicitly mentioned, not a separate validation split.)
Hardware Specification | No | The paper does not provide any specific details regarding the hardware (e.g., GPU models, CPU types, memory, or specific cloud instances) used to run the experiments.
Software Dependencies | No | We used the GPkit library to solve our geometric programs. (No version number is provided for GPkit. A minimal GPkit sketch appears below the table.)
Experiment Setup | No | As our method assumes binary inputs, we transform categorical features through one-hot encodings and binarize continuous ones based on whether they are 0.05 standard deviation above their respective mean. Our algorithm takes as input a logistic regression model which we trained using fully observed training data. During prediction time, we make the features go missing uniformly at random based on a set missingness percentage, which corresponds to a missing completely at random (MCAR) mechanism [Little and Rubin, 2014]. We repeat all experiments for 10 (resp. 100) runs on MNIST, Fashion, and CovType (resp. Adult and Splice) and report the average. (While some setup details are given, concrete hyperparameters for the logistic regression training itself are not provided in the main text. A sketch of the binarization and MCAR masking follows below.)
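
As noted in the Software Dependencies row, the paper solves its learning problem with GPkit. The paper does not show its geometric-program formulation in code, so the following is only a minimal sketch of the GPkit workflow (positive variables, posynomial constraints, a solve call) with a toy objective and constraints of my own choosing, not the NaCL program itself:

    from gpkit import Variable, Model

    # Positive decision variables (required for a geometric program)
    x = Variable("x")
    y = Variable("y")

    # Toy GP: minimize x*y subject to posynomial constraints
    constraints = [x * y >= 12,   # rewrites to 12/(x*y) <= 1
                   x + y <= 8]    # rewrites to (x + y)/8 <= 1
    m = Model(x * y, constraints)

    sol = m.solve(verbosity=0)    # requires a GP solver backend to be installed
    print(sol["cost"])            # optimal objective value (12 for this toy problem)
    print(sol(x), sol(y))         # optimizing variable assignments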
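
The preprocessing and missingness mechanism quoted in the Experiment Setup row can be made concrete with a short sketch. Assuming NumPy, the helper names binarize_continuous and mask_mcar are hypothetical, introduced here for illustration; the 0.05-standard-deviation threshold and the MCAR masking follow the quoted setup:

    import numpy as np

    rng = np.random.default_rng(0)

    def binarize_continuous(X, threshold_stds=0.05):
        # 1 if a value is more than `threshold_stds` standard deviations
        # above its column mean (the paper's 0.05 rule); hypothetical helper.
        mu = X.mean(axis=0)
        sigma = X.std(axis=0)
        return (X > mu + threshold_stds * sigma).astype(int)

    def mask_mcar(X, missing_pct, rng):
        # Missing completely at random: each entry is dropped independently
        # with probability `missing_pct`; NaN marks a missing feature.
        X_missing = X.astype(float)
        X_missing[rng.random(X.shape) < missing_pct] = np.nan
        return X_missing

    X = rng.normal(size=(100, 5))        # toy continuous data
    X_bin = binarize_continuous(X)       # binary inputs, as the method assumes
    X_test = mask_mcar(X_bin, 0.3, rng)  # 30% of features missing at prediction time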