Gaussian Process Classification and Active Learning with Multiple Annotators
Authors: Filipe Rodrigues, Francisco Pereira, Bernardete Ribeiro
ICML 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically show that our model significantly outperforms other commonly used approaches, such as majority voting, without a significant increase in the computational cost of approximate Bayesian inference. Furthermore, an active learning methodology is proposed, which is able to reduce annotation cost even further. The proposed approaches are validated using both real and simulated annotators on real datasets from different application domains. |
| Researcher Affiliation | Collaboration | Filipe Rodrigues FMPR@DEI.UC.PT Centre for Informatics and Systems of the University of Coimbra (CISUC), 3030-290 Coimbra, PORTUGAL Francisco C. Pereira CAMARA@SMART.MIT.EDU Singapore-MIT Alliance for Research and Technology (SMART) 47 1 CREATE Way, SINGAPORE Bernardete Ribeiro BRIBEIRO@DEI.UC.PT Centre for Informatics and Systems of the University of Coimbra (CISUC), 3030-290 Coimbra, PORTUGAL |
| Pseudocode | No | The paper describes the four steps of the Expectation Propagation (EP) algorithm in detail but presents them as narrative text rather than a structured pseudocode block or algorithm listing. (A hedged sketch of the textbook EP scheme follows the table.) |
| Open Source Code | Yes | Source code and datasets are available at: http://amilab.dei.uc.pt/fmpr/software/ |
| Open Datasets | Yes | This annotator simulation process is applied to various datasets from the UCI repository, and the results of the proposed approach (henceforward referred to as GPC-MA) are compared with two baselines: one consisting of using the majority vote for each instance (referred to as GPC-MV), and another consisting of using all data points from all annotators as training data (GPC-CONC). ... The proposed approach was also evaluated on real multiple-annotator settings by applying it to the datasets used in (Rodrigues et al., 2013a) and made available online by the authors. (A sketch of the two baselines follows the table.) |
| Dataset Splits | Yes | For all experiments, a random 70/30% train/test split was performed and an isotropic squared exponential covariance function was used. (A minimal sketch of this setup follows the table.) |
| Hardware Specification | Yes | Table 2 shows the average execution times over 30 runs on an Intel Core i7 2600 (3.4 GHz) machine with 32 GB of DDR3 (1600 MHz) memory. |
| Software Dependencies | No | The paper mentions techniques and models like Gaussian process classification, Expectation Propagation, and logistic regression, but does not provide specific software names with version numbers for implementation libraries or frameworks (e.g., Python version, specific machine learning library versions). |
| Experiment Setup | Yes | For all experiments, a random 70/30% train/test split was performed and an isotropic squared exponential covariance function was used. ... these values were set to n = 3 and ϵ = 10⁻⁴. ... with the music genre dataset a squared exponential covariance function with Automatic Relevance Determination (ARD) was used, and the hyper-parameters were optimized by maximizing the marginal likelihood. ... For each genre, we randomly initialize the algorithm with 200 instances and then perform active learning for another 300 instances. In order to make active learning more efficient, in each iteration we rank the unlabeled instances according to eq. 11 and select the top 10 instances to label. (A sketch of this active learning loop follows the table.) |
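
Since the paper presents its EP updates only as narrative text, the following is a minimal sketch of the four EP steps for standard binary GP classification with a probit likelihood (Rasmussen & Williams 2006, Algorithm 3.5). This is the textbook scheme the paper builds on, not the paper's multiple-annotator variant; `ep_gpc` and all variable names are illustrative, not from the authors' code.

```python
# Hedged sketch of textbook EP for binary GP classification (probit
# likelihood), after Rasmussen & Williams 2006, Alg. 3.5. NOT the paper's
# multiple-annotator model; illustrative only.
import numpy as np
from scipy.stats import norm

def ep_gpc(K, y, n_sweeps=10):
    """K: (n, n) covariance matrix; y: labels in {-1, +1}."""
    n = len(y)
    nu_t, tau_t = np.zeros(n), np.zeros(n)   # site ("tilde") parameters
    Sigma, mu = K.copy(), np.zeros(n)        # approximate posterior
    for _ in range(n_sweeps):
        for i in range(n):
            # step 1: remove site i -> cavity distribution
            tau_c = 1.0 / Sigma[i, i] - tau_t[i]
            nu_c = mu[i] / Sigma[i, i] - nu_t[i]
            m_c, v_c = nu_c / tau_c, 1.0 / tau_c
            # step 2: moments of the tilted distribution (probit likelihood)
            z = y[i] * m_c / np.sqrt(1.0 + v_c)
            r = norm.pdf(z) / norm.cdf(z)
            m_hat = m_c + y[i] * v_c * r / np.sqrt(1.0 + v_c)
            v_hat = v_c - v_c**2 * r * (z + r) / (1.0 + v_c)
            # step 3: update site parameters to match those moments
            d_tau = 1.0 / v_hat - tau_c - tau_t[i]
            tau_t[i] += d_tau
            nu_t[i] = m_hat / v_hat - nu_c
            # step 4: rank-one update of the approximate posterior
            s = Sigma[:, i].copy()
            Sigma -= (d_tau / (1.0 + d_tau * s[i])) * np.outer(s, s)
            mu = Sigma @ nu_t
    return mu, Sigma
```

For numerical stability the textbook algorithm recomputes Σ and μ from a Cholesky factorization after each sweep; the rank-one-only update above is kept for brevity.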
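The two baselines named in the Open Datasets row admit a compact illustration. The sketch below, with hypothetical helpers (`majority_vote`, `concatenate`), shows how GPC-MV collapses each instance's noisy labels by majority vote while GPC-CONC duplicates instances once per available annotator label; the paper does not publish this exact code.

```python
# Hedged sketch of the two baselines: GPC-MV aggregates each instance's
# labels by majority vote; GPC-CONC concatenates every (instance, label)
# pair from every annotator. Helper names are illustrative.
import numpy as np

def majority_vote(labels):
    """labels: (n_instances, n_annotators) array of {0, 1}; -1 = missing."""
    votes = np.where(labels >= 0, labels, np.nan)
    return (np.nanmean(votes, axis=1) >= 0.5).astype(int)

def concatenate(X, labels):
    """Repeat each instance once per available annotator label (GPC-CONC)."""
    rows, cols = np.where(labels >= 0)
    return X[rows], labels[rows, cols]

# Example: 4 instances, 3 simulated annotators (-1 marks a missing answer).
X = np.arange(8).reshape(4, 2).astype(float)
L = np.array([[1, 1, 0],
              [0, 0, -1],
              [1, -1, 1],
              [0, 1, 1]])
print(majority_vote(L))          # -> [1 0 1 1]
Xc, yc = concatenate(X, L)
print(Xc.shape, yc)              # -> (10, 2) [1 1 0 0 0 1 1 0 1 1]
```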
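The reported setup (random 70/30% train/test split, isotropic squared exponential covariance) can be reproduced in spirit with off-the-shelf tools. A minimal sketch assuming scikit-learn follows; note that scikit-learn's `GaussianProcessClassifier` uses a Laplace approximation rather than the paper's EP, so it stands in only for the single-annotator GPC component.

```python
# Minimal sketch of the reported setup (70/30 split, isotropic squared
# exponential kernel), assuming scikit-learn. The paper's own implementation
# is not specified; this is illustrative only.
from sklearn.datasets import load_breast_cancer  # stand-in UCI-style dataset
from sklearn.model_selection import train_test_split
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

X, y = load_breast_cancer(return_X_y=True)

# Random 70/30% train/test split, as reported in the paper.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Isotropic squared exponential (RBF) kernel: one shared length-scale.
gpc = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0))
gpc.fit(X_tr, y_tr)
print("test accuracy: %.3f" % gpc.score(X_te, y_te))
```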
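The active learning protocol quoted in the Experiment Setup row (200 seed instances, then 30 rounds of querying the 10 highest-ranked unlabeled instances, for 300 extra labels) is sketched below on toy data. The paper ranks candidates by its eq. 11, which is not reproduced here; predictive entropy is used only as a generic uncertainty stand-in for that criterion.

```python
# Hedged sketch of the reported active learning loop. Eq. 11 of the paper is
# replaced by predictive entropy as a generic uncertainty proxy.
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

def entropy(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

rng = np.random.default_rng(0)
X = rng.normal(size=(800, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)           # toy labels

labeled = list(rng.choice(len(X), size=200, replace=False))
pool = [i for i in range(len(X)) if i not in set(labeled)]

for _ in range(300 // 10):                        # 30 rounds of 10 queries
    gpc = GaussianProcessClassifier(kernel=1.0 * RBF()).fit(X[labeled], y[labeled])
    scores = entropy(gpc.predict_proba(X[pool])[:, 1])
    top = np.argsort(scores)[::-1][:10]           # 10 most uncertain points
    labeled += [pool[i] for i in top]
    pool = [i for j, i in enumerate(pool) if j not in set(top)]
```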