Learning with Previously Unseen Features

Authors: Yuan Shi, Craig A. Knoblock

IJCAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present an efficient optimization algorithm for learning the model parameters and empirically evaluate the approach on several regression and classification tasks. Experimental results show that our approach can achieve on average 11.2% improvement over baselines.
Researcher Affiliation | Academia | Yuan Shi, Computer Science Department, University of Southern California, yuanshi@usc.edu; Craig A. Knoblock, Information Sciences Institute, University of Southern California, knoblock@isi.edu
Pseudocode | Yes | Algorithm 1: Optimization algorithm for LUF
Open Source Code | Yes | Our algorithms and datasets can be accessed from https://github.com/yuanshi/UnseenFeatures
Open Datasets | Yes | We experiment with four regression datasets: Abalone, for predicting the age of abalone, contains 4,177 samples with 8 features. ... Bank, which predicts the fraction of bank customers that are turned away due to queuing, contains 8,192 samples with 8 features. ... CPU, for CPU running time prediction, contains 8,192 samples with 12 features. ... House, for housing price prediction, contains 20,640 samples with 9 features. ... We experiment with three classification datasets: USPS, which recognizes handwriting digits from images, contains 9,298 samples from 10 classes [Hull, 1994]. Books, which performs sentiment analysis on book reviews from Amazon, contains 4,000 samples from 2 classes [Blitzer et al., 2006]. Webcam, which recognizes objects in low-resolution images taken by web cameras, contains 795 samples from 10 classes [Kulis et al., 2011]. ... We conduct experiments with data from Weather Underground (http://www.wunderground.com), which contains sensor data from a large number of personal weather stations worldwide.
Dataset Splits | Yes | We apply ten-fold cross validation on the target domain and report the average error. ... To tune the weight γ and regularization parameter λ, we apply a leave-one-out cross validation strategy on the source domain to simulate our problem setting. ... In each trial, we randomly split the dataset into the source/target domain, each with half the number of samples. (A minimal sketch of this split-and-evaluate protocol is shown after the table.)
Hardware Specification | No | The paper does not provide specific details on the hardware used for running the experiments (e.g., GPU/CPU models, memory specifications).
Software Dependencies | No | The paper mentions model types such as "kernel regression" and "logistic regression" but does not specify the software libraries or version numbers used (e.g., Python, scikit-learn, or PyTorch versions).
Experiment Setup | Yes | The hyper-parameters of the above methods, including c and d in the polynomial kernel, k in k-NN regression, and the regularization parameter λ, are tuned on the source domain. ... For all datasets, we scale each feature to [0,1], and then use principal component analysis (PCA) to reduce the dimensionality to 100, which reduces computational cost and feature noise. ... To prevent overfitting, we adopt an early stopping strategy: train a model on {(x_t, ŷ_t)} and apply it to source-domain data. If the prediction error on the source domain is larger than a certain threshold, we stop the learning process. We also terminate the optimization when the objective function decreases very slowly, which not only saves computational time but also reduces overfitting. (A minimal sketch of the preprocessing steps is shown after the table.)
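
The evaluation protocol quoted in the Dataset Splits row (a random half/half source-target split followed by ten-fold cross validation on the target domain) can be illustrated with a minimal sketch. Python and scikit-learn are assumptions here, since the paper does not name its software stack, and LinearRegression is only a stand-in for the paper's LUF model.

```python
# Minimal sketch of the quoted evaluation protocol: a random half/half
# source/target split followed by ten-fold cross validation on the target
# domain. scikit-learn and LinearRegression are assumptions, not the paper's
# actual LUF implementation.
import numpy as np
from sklearn.model_selection import KFold, train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

X, y = np.random.rand(200, 8), np.random.rand(200)  # stand-in for a real dataset

# In each trial, randomly split the dataset into source/target domains,
# each with half the number of samples.
X_src, X_tgt, y_src, y_tgt = train_test_split(X, y, test_size=0.5, random_state=0)

# Ten-fold cross validation on the target domain; report the average error.
errors = []
for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X_tgt):
    model = LinearRegression().fit(X_tgt[train_idx], y_tgt[train_idx])
    errors.append(mean_squared_error(y_tgt[test_idx], model.predict(X_tgt[test_idx])))
print("average target-domain error:", np.mean(errors))
```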
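
The preprocessing quoted in the Experiment Setup row (scaling each feature to [0,1], then PCA to 100 dimensions) corresponds to a standard pipeline; a sketch assuming scikit-learn follows, with an illustrative data shape only.

```python
# Minimal sketch of the quoted preprocessing: scale each feature to [0, 1],
# then reduce dimensionality to 100 with PCA. scikit-learn is an assumption;
# the paper does not specify its software dependencies.
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline

X = np.random.rand(500, 150)                    # stand-in for raw features
n_components = min(100, X.shape[1])             # target dimensionality of 100
preprocess = make_pipeline(MinMaxScaler(), PCA(n_components=n_components))
X_reduced = preprocess.fit_transform(X)
print(X_reduced.shape)                          # (500, 100)
```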