Gaussian Process Classification and Active Learning with Multiple Annotators
Authors: Filipe Rodrigues, Francisco Pereira, Bernardete Ribeiro
ICML 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically show that our model significantly outperforms other commonly used approaches, such as majority voting, without a significant increase in the computational cost of approximate Bayesian inference. Furthermore, an active learning methodology is proposed, which is able to reduce annotation cost even further. The proposed approaches are validated using both real and simulated annotators on real datasets from different application domains. |
| Researcher Affiliation | Collaboration | Filipe Rodrigues FMPR@DEI.UC.PT Centre for Informatics and Systems of the University of Coimbra (CISUC), 3030-290 Coimbra, PORTUGAL Francisco C. Pereira CAMARA@SMART.MIT.EDU Singapore-MIT Alliance for Research and Technology (SMART) 47 1 CREATE Way, SINGAPORE Bernardete Ribeiro BRIBEIRO@DEI.UC.PT Centre for Informatics and Systems of the University of Coimbra (CISUC), 3030-290 Coimbra, PORTUGAL |
| Pseudocode | No | The paper describes the four steps of the Expectation Propagation (EP) algorithm in detail but presents them as narrative text rather than a structured pseudocode block or algorithm listing. (A hedged sketch of the textbook EP scheme follows the table.) |
| Open Source Code | Yes | Source code and datasets are available at: http://amilab.dei.uc.pt/fmpr/software/ |
| Open Datasets | Yes | This annotator simulation process is applied to various datasets from the UCI repository, and the results of the proposed approach (henceforward referred to as GPC-MA) are compared with two baselines: one consisting of using the majority vote for each instance (referred to as GPC-MV), and another consisting of using all data points from all annotators as training data (GPC-CONC). ... The proposed approach was also evaluated on real multiple-annotator settings by applying it to the datasets used in (Rodrigues et al., 2013a) and made available online by the authors. (A sketch of the two baselines follows the table.) |
| Dataset Splits | Yes | For all experiments, a random 70/30% train/test split was performed and an isotropic squared exponential covariance function was used. (A minimal sketch of this setup follows the table.) |
| Hardware Specification | Yes | Table 2 shows the average execution times over 30 runs on an Intel Core i7 2600 (3.4 GHz) machine with 32 GB of DDR3 (1600 MHz) memory. |
| Software Dependencies | No | The paper mentions techniques and models like Gaussian process classification, Expectation Propagation, and logistic regression, but does not provide specific software names with version numbers for implementation libraries or frameworks (e.g., Python version, specific machine learning library versions). |
| Experiment Setup | Yes | For all experiments, a random 70/30% train/test split was performed and an isotropic squared exponential covariance function was used. ... these values were set to n = 3 and ϵ = 10⁻⁴. ... with the music genre dataset a squared exponential covariance function with Automatic Relevance Determination (ARD) was used, and the hyper-parameters were optimized by maximizing the marginal likelihood. ... For each genre, we randomly initialize the algorithm with 200 instances and then perform active learning for another 300 instances. In order to make active learning more efficient, in each iteration we rank the unlabeled instances according to eq. 11 and select the top 10 instances to label. (A sketch of this active learning loop follows the table.) |
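
Since the paper presents its EP updates only as narrative text, the following is a minimal sketch of the four EP steps for standard binary GP classification with a probit likelihood (Rasmussen & Williams 2006, Algorithm 3.5). This is the textbook scheme the paper builds on, not the paper's multiple-annotator variant; `ep_gpc` and all variable names are illustrative, not from the authors' code.

```python
# Hedged sketch of textbook EP for binary GP classification (probit
# likelihood), after Rasmussen & Williams 2006, Alg. 3.5. NOT the paper's
# multiple-annotator model; illustrative only.
import numpy as np
from scipy.stats import norm

def ep_gpc(K, y, n_sweeps=10):
    """K: (n, n) covariance matrix; y: labels in {-1, +1}."""
    n = len(y)
    nu_t, tau_t = np.zeros(n), np.zeros(n)   # site ("tilde") parameters
    Sigma, mu = K.copy(), np.zeros(n)        # approximate posterior
    for _ in range(n_sweeps):
        for i in range(n):
            # step 1: remove site i -> cavity distribution
            tau_c = 1.0 / Sigma[i, i] - tau_t[i]
            nu_c = mu[i] / Sigma[i, i] - nu_t[i]
            m_c, v_c = nu_c / tau_c, 1.0 / tau_c
            # step 2: moments of the tilted distribution (probit likelihood)
            z = y[i] * m_c / np.sqrt(1.0 + v_c)
            r = norm.pdf(z) / norm.cdf(z)
            m_hat = m_c + y[i] * v_c * r / np.sqrt(1.0 + v_c)
            v_hat = v_c - v_c**2 * r * (z + r) / (1.0 + v_c)
            # step 3: update site parameters to match those moments
            d_tau = 1.0 / v_hat - tau_c - tau_t[i]
            tau_t[i] += d_tau
            nu_t[i] = m_hat / v_hat - nu_c
            # step 4: rank-one update of the approximate posterior
            s = Sigma[:, i].copy()
            Sigma -= (d_tau / (1.0 + d_tau * s[i])) * np.outer(s, s)
            mu = Sigma @ nu_t
    return mu, Sigma
```

For numerical stability the textbook algorithm recomputes Σ and μ from a Cholesky factorization after each sweep; the rank-one-only update above is kept for brevity.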
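The two baselines named in the Open Datasets row admit a compact illustration. The sketch below, with hypothetical helpers (`majority_vote`, `concatenate`), shows how GPC-MV collapses each instance's noisy labels by majority vote while GPC-CONC duplicates instances once per available annotator label; the paper does not publish this exact code.

```python
# Hedged sketch of the two baselines: GPC-MV aggregates each instance's
# labels by majority vote; GPC-CONC concatenates every (instance, label)
# pair from every annotator. Helper names are illustrative.
import numpy as np

def majority_vote(labels):
    """labels: (n_instances, n_annotators) array of {0, 1}; -1 = missing."""
    votes = np.where(labels >= 0, labels, np.nan)
    return (np.nanmean(votes, axis=1) >= 0.5).astype(int)

def concatenate(X, labels):
    """Repeat each instance once per available annotator label (GPC-CONC)."""
    rows, cols = np.where(labels >= 0)
    return X[rows], labels[rows, cols]

# Example: 4 instances, 3 simulated annotators (-1 marks a missing answer).
X = np.arange(8).reshape(4, 2).astype(float)
L = np.array([[1, 1, 0],
              [0, 0, -1],
              [1, -1, 1],
              [0, 1, 1]])
print(majority_vote(L))          # -> [1 0 1 1]
Xc, yc = concatenate(X, L)
print(Xc.shape, yc)              # -> (10, 2) [1 1 0 0 0 1 1 0 1 1]
```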
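The reported setup (random 70/30% train/test split, isotropic squared exponential covariance) can be reproduced in spirit with off-the-shelf tools. A minimal sketch assuming scikit-learn follows; note that scikit-learn's `GaussianProcessClassifier` uses a Laplace approximation rather than the paper's EP, so it stands in only for the single-annotator GPC component.

```python
# Minimal sketch of the reported setup (70/30 split, isotropic squared
# exponential kernel), assuming scikit-learn. The paper's own implementation
# is not specified; this is illustrative only.
from sklearn.datasets import load_breast_cancer  # stand-in UCI-style dataset
from sklearn.model_selection import train_test_split
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

X, y = load_breast_cancer(return_X_y=True)

# Random 70/30% train/test split, as reported in the paper.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Isotropic squared exponential (RBF) kernel: one shared length-scale.
gpc = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0))
gpc.fit(X_tr, y_tr)
print("test accuracy: %.3f" % gpc.score(X_te, y_te))
```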
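The active learning protocol quoted in the Experiment Setup row (200 seed instances, then 30 rounds of querying the 10 highest-ranked unlabeled instances, for 300 extra labels) is sketched below on toy data. The paper ranks candidates by its eq. 11, which is not reproduced here; predictive entropy is used only as a generic uncertainty stand-in for that criterion.

```python
# Hedged sketch of the reported active learning loop. Eq. 11 of the paper is
# replaced by predictive entropy as a generic uncertainty proxy.
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

def entropy(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

rng = np.random.default_rng(0)
X = rng.normal(size=(800, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)           # toy labels

labeled = list(rng.choice(len(X), size=200, replace=False))
pool = [i for i in range(len(X)) if i not in set(labeled)]

for _ in range(300 // 10):                        # 30 rounds of 10 queries
    gpc = GaussianProcessClassifier(kernel=1.0 * RBF()).fit(X[labeled], y[labeled])
    scores = entropy(gpc.predict_proba(X[pool])[:, 1])
    top = np.argsort(scores)[::-1][:10]           # 10 most uncertain points
    labeled += [pool[i] for i in top]
    pool = [i for j, i in enumerate(pool) if j not in set(top)]
```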