Active Learning for Informative Projection Retrieval

Authors: Madalina Fiterau, Artur Dubrawski

AAAI 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The set of synthetic data used in our experiments has 10 features and contains q = 3 batches of data points... The curves in Figure 1 are averaged over twenty executions of the algorithm. We have also used Active RIPR to filter out alerts coming from a cardio-respiratory monitoring system... We extracted a total of 50 features, 800 labeled samples and roughly 8000 unlabeled ones. We use Active RIPR to classify oxygen saturation alerts, treating the existing labeled data as the pool of samples available for active learning. We performed 10-fold cross validation, training the Active RIPR model on 90% of the samples and using the remainder to calculate the learning curve shown in Figure 1 (right). Info Gain once again outperforms the rest, with accuracy of 0.88 achievable by labeling less than 25% of the total samples.
Researcher Affiliation | Academia | Madalina Fiterau (mfiterau@cs.cmu.edu), Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA 15213; Artur Dubrawski (awd@cs.cmu.edu), Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA 15213
Pseudocode | No | The paper describes mathematical formulas and algorithmic concepts (e.g., scoring functions) but does not contain structured pseudocode or clearly labeled algorithm blocks. A hedged sketch of such a scoring function appears after this table.
Open Source Code | No | The paper provides no concrete access information (e.g., a specific repository link, an explicit code release statement, or code in supplementary materials) for the source code of the described methodology.
Open Datasets | No | The paper mentions using 'synthetic data' and data from a 'cardio-respiratory monitoring system', extracting '50 features, 800 labeled samples and roughly 8000 unlabeled ones', but it provides no link, DOI, repository name, formal citation with authors and year, or reference to an established benchmark dataset through which this data could be accessed publicly.
Dataset Splits | Yes | We performed 10-fold cross validation, training the Active RIPR model on 90% of the samples and using the remainder to calculate the learning curve shown in Figure 1 (right). The table presents the mean leave-one-out accuracy after 20, 50, and 75 labels. A hedged sketch of this evaluation protocol appears after this table.
Hardware Specification | No | The paper does not provide any specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiments.
Experiment Setup | No | The paper describes dataset characteristics (e.g., '10 features', '50 features, 800 labeled samples') and the evaluation methodology (e.g., 'averaged over twenty executions', '10-fold cross validation', 'leave-one-out accuracy'), but it does not provide concrete hyperparameter values, model initialization, or other system-level training configuration for the Active RIPR model.
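
Because the paper describes its Info Gain query selection only in prose, the following is a minimal sketch of what such a scoring function typically looks like, assuming the standard entropy-based uncertainty criterion. The names info_gain_scores and next_query are hypothetical; the paper's exact RIPR-specific scoring is not reproduced here.

```python
# Hedged sketch: a generic entropy-based query score for pool-based active
# learning. This stands in for the paper's "Info Gain" criterion, whose
# exact RIPR-specific form is not given in the text.
import numpy as np

def info_gain_scores(proba: np.ndarray) -> np.ndarray:
    """Entropy of each sample's predicted class distribution.

    proba: array of shape (n_samples, n_classes) with predicted class
    probabilities; higher entropy means a more informative query.
    """
    p = np.clip(proba, 1e-12, 1.0)   # guard against log(0)
    return -np.sum(p * np.log(p), axis=1)

def next_query(proba: np.ndarray) -> int:
    """Index of the unlabeled pool sample to request a label for next."""
    return int(np.argmax(info_gain_scores(proba)))
```

Greedily querying the highest-entropy sample is only one common reading of an information-gain criterion; variants that measure expected reduction in model entropy would require the model's posterior, which the paper does not specify.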
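The cross-validated learning-curve protocol quoted in the Dataset Splits row can likewise be sketched. The snippet below is an illustration under stated assumptions, not the authors' code: scikit-learn's LogisticRegression stands in for the unavailable Active RIPR model, the entropy criterion above is inlined, and the learning_curve name is hypothetical.

```python
# Hedged sketch of the quoted protocol: 10-fold cross validation in which
# the 90% training split serves as the active-learning pool and the held-out
# 10% traces a learning curve, averaged across folds as in Figure 1.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression

def learning_curve(X, y, n_queries=75, seed=0):
    curves = []
    for train_idx, test_idx in KFold(n_splits=10, shuffle=True,
                                     random_state=seed).split(X):
        pool = list(train_idx)
        # Seed the labeled set with one example per class so the
        # stand-in classifier can be fit from the start.
        labeled = [pool[np.flatnonzero(y[pool] == c)[0]] for c in np.unique(y)]
        pool = [i for i in pool if i not in labeled]
        curve = []
        for _ in range(n_queries):
            clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
            proba = np.clip(clf.predict_proba(X[pool]), 1e-12, 1.0)
            entropy = -np.sum(proba * np.log(proba), axis=1)
            labeled.append(pool.pop(int(np.argmax(entropy))))  # query most uncertain
            curve.append(clf.score(X[test_idx], y[test_idx]))  # held-out accuracy
        curves.append(curve)
    return np.mean(curves, axis=0)  # fold-averaged accuracy per label count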