Leveraging Common Structure to Improve Prediction across Related Datasets
Authors: Matt Barnes, Nick Gisolfi, Madalina Fiterau, Artur Dubrawski
AAAI 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental Results: The artificial datasets in Fig. 1 illustrate how spurious samples degrade the placement of a linear SVM decision boundary in a binary classification task. We consider an oracle model trained only on samples from the common distribution (no spurious points), and a baseline model obtained by training a linear SVM on all of the data, spurious samples included. The spurious samples shift the baseline's decision boundary slightly, so the baseline divides the classes in a way that misclassifies some samples from the default distribution, decreasing accuracy relative to the oracle. To illustrate the effect of greedy spurious-sample removal, we trained a model after each removal iteration, then bootstrapped the entire process to obtain an average accuracy and a 95% confidence interval, shown in Fig. 1. As more samples were removed, the clipped model's performance approached the oracle's, with tighter confidence intervals, confirming that removing spurious samples is beneficial (a hedged sketch of this removal-and-bootstrap loop appears after the table). Next, consider a nuclear threat detection system built to determine whether a vehicle passing through customs emits signatures consistent with radioactive material. Figure 2 depicts the most informative 2D projection, where a non-trivial density mismatch manifests between datasets generated with different simulation parameters: threats are shown in red, normal samples in green, and the removed spurious samples are circled in blue. The baseline model (M0) is trained on all data. Our approach produces a clipped version of DS1, which we add to DS2 to obtain the alternative model M1. We test M0 and M1 on all other datasets. Additionally, we enhance our approach with a gating function: the model used for classification is whichever of M0 and M1 has the smallest Renyi divergence to the test set. We refer to this gated model as M2 (a sketch of the divergence-based gate also follows the table). The justification is that some test datasets may contain spurious samples close enough to those in the original datasets that keeping them is beneficial. As shown in Table 1, the gated version outperforms the other two because it benefits from sample removal when the incoming datasets do not have spurious samples. |
| Researcher Affiliation | Academia | Matt Barnes (mbarnes1@cs.cmu.edu), Nick Gisolfi (ngisolfi@cmu.edu), Madalina Fiterau (mfiterau@cs.cmu.edu), and Artur Dubrawski (awd@cs.cmu.edu), all at Carnegie Mellon University, Pittsburgh, PA 15213 |
| Pseudocode | No | The paper describes its procedure in text but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access to source code (e.g., a specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described. |
| Open Datasets | No | The paper mentions 'artificial data sets' and 'nuclear threat datasets DS1 and DS2' but does not provide concrete access information (e.g., specific link, DOI, repository name, or formal citation) for any publicly available or open dataset. |
| Dataset Splits | No | The paper mentions 'training sets' and 'test set' but does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning. |
| Hardware Specification | No | The paper does not provide any specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions using a 'linear SVM' but does not provide specific ancillary software details with version numbers (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | No | The paper does not contain specific experimental setup details such as concrete hyperparameter values, training configurations, or system-level settings in the main text. |
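The greedy removal-and-bootstrap loop referenced in the Research Type row is described only in prose; the paper releases no code. Below is a minimal Python sketch using scikit-learn. The selection rule (drop the training point whose removal most improves held-out accuracy) is a hypothetical stand-in, since the paper's exact spurious-sample criterion is not quoted here; `greedy_clip`, `bootstrap_accuracy`, and all parameter values are assumptions, not the authors' settings.

```python
import numpy as np
from sklearn.svm import LinearSVC

def greedy_clip(X, y, X_val, y_val, n_remove):
    """Greedily drop training samples, one per iteration.

    The rule used here (remove the sample whose exclusion most improves
    validation accuracy of a linear SVM) is a hypothetical stand-in for
    the paper's spurious-sample criterion.
    """
    keep = np.ones(len(X), dtype=bool)
    for _ in range(n_remove):
        best_acc, best_i = -np.inf, None
        for i in np.flatnonzero(keep):
            keep[i] = False  # tentatively remove sample i
            acc = LinearSVC().fit(X[keep], y[keep]).score(X_val, y_val)
            keep[i] = True
            if acc > best_acc:
                best_acc, best_i = acc, i
        keep[best_i] = False  # commit the single best removal
    return keep

def bootstrap_accuracy(X, y, X_test, y_test, n_boot=200, seed=0):
    """Bootstrap the train/evaluate cycle: mean accuracy and a 95% CI."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(X), size=len(X))
        while len(np.unique(y[idx])) < 2:  # re-draw if a class vanished
            idx = rng.integers(0, len(X), size=len(X))
        accs.append(LinearSVC().fit(X[idx], y[idx]).score(X_test, y_test))
    accs = np.sort(np.asarray(accs))
    lo, hi = accs[int(0.025 * n_boot)], accs[int(0.975 * n_boot)]
    return accs.mean(), (lo, hi)
```

Calling `bootstrap_accuracy` on the clipped training set after each `greedy_clip` iteration would reproduce the trend reported for Fig. 1: clipped-model accuracy approaching the oracle's, with tightening confidence intervals.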
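The gated model M2 routes each incoming test set to whichever of M0 and M1 has the smallest Renyi divergence to it. The excerpt does not say how the divergence is estimated, so the sketch below uses a kernel-density plug-in estimate of the Renyi-alpha divergence, one common choice; `alpha`, `bandwidth`, the divergence direction, and the function names are all assumptions.

```python
import numpy as np
from scipy.special import logsumexp
from sklearn.neighbors import KernelDensity

def renyi_divergence(X_p, X_q, alpha=0.5, bandwidth=0.5):
    """Plug-in estimate of the Renyi divergence D_alpha(p || q).

    Densities are approximated with Gaussian KDEs, and
    D_alpha = 1/(alpha-1) * log E_{x~p}[(p(x)/q(x))^(alpha-1)]
    is estimated by averaging over the samples of p. The alpha,
    bandwidth, and direction (test vs. train) are illustrative
    choices, not taken from the paper.
    """
    log_p = KernelDensity(bandwidth=bandwidth).fit(X_p).score_samples(X_p)
    log_q = KernelDensity(bandwidth=bandwidth).fit(X_q).score_samples(X_p)
    r = (alpha - 1.0) * (log_p - log_q)
    # logsumexp keeps the sample-average expectation numerically stable
    return (logsumexp(r) - np.log(len(r))) / (alpha - 1.0)

def gated_predict(models, train_sets, X_test, alpha=0.5):
    """Gate in the spirit of M2: route the test set to whichever
    candidate model's training data is closest in Renyi divergence."""
    divs = [renyi_divergence(X_test, X_tr, alpha=alpha) for X_tr in train_sets]
    best = models[int(np.argmin(divs))]
    return best.predict(X_test)
```

In the paper's setup, `models` would hold the fitted M0 and M1 and `train_sets` their respective training pools (all data vs. clipped DS1 plus DS2), so a test set containing spurious samples near those of the original data falls back to M0, matching the justification given for the gate.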