DyS: A Framework for Mixture Models in Quantification
Authors: André Maletzke, Denis dos Reis, Everton Cherman, Gustavo Batista (pp. 4552-4560)
AAAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we generalize MM with a base framework called DyS (Distribution y-Similarity). With this framework, we perform a thorough evaluation of the most critical design decisions of MM models. For instance, we assess 15 dissimilarity functions to compare histograms with varying numbers of bins from 2 to 110 and, for the first time, make a connection between quantification accuracy and test sample size, with experiments covering 24 public benchmark datasets. |
| Researcher Affiliation | Academia | André Maletzke, Denis dos Reis, Everton Cherman, Gustavo Batista. Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo. {andregustavo,denismr,evertoncherman}@usp.br, gbatista@icmc.usp.br |
| Pseudocode | Yes | Algorithm 1: Ordinal Distance Algorithm 2: SORD Dissimilarity Function |
| Open Source Code | No | The paper mentions a "supplemental material website" (https://sites.google.com/site/andregustavom/research/dys) which contains figures. However, it does not contain an unambiguous statement that the *source code for the methodology* described in the paper is openly available, nor is the link directly to a code repository. |
| Open Datasets | Yes | Table 2 (Datasets description) lists, e.g., Anuran Calls (UCI) and Bank Marketing (UCI). The datasets come from the UCI (Dheeru and Karra Taniskidou 2017), OpenML (Vanschoren et al. 2013), PROMISE (Sayyad Shirabad and Menzies 2005), and Reis (dos Reis et al. 2018a) repositories. Specific citations are provided for Bank Marketing (Moro, Cortez, and Rita 2014), Credit Card (Yeh and Lien 2009), HTRU2 (Lyon et al. 2016), Mozilla4 (Koru, Zhang, and Liu 2007), Mushroom (Lincoff 1989), Nomao (Candillier and Lemaire 2012), and Occupancy Detection (Candanedo and Feldheim 2016). |
| Dataset Splits | Yes | Each dataset was uniformly split into two halves: training and test. With the training half, we performed 10-fold cross-validation to obtain the training scores used by DyS. |
| Hardware Specification | No | The paper describes the experimental setup, including datasets, software components (Random Forests), and evaluation metrics. However, it does not specify any hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper states: "We produced all scores using Random Forests with 200 trees." While it mentions Random Forests, it does not provide specific version numbers for this software or any other ancillary libraries/packages used in the experiments. |
| Experiment Setup | Yes | To verify the impact of the number of bins, in all experiments, we vary the number of bins from 2 to 20 with increments of 2, and from 20 to 110 with increments of 10. The test sample size, on the other hand, varies from 10 to 100 with increments of 10 examples, and from 100 to 500 with increments of 100 examples. We performed preliminary experiments and concluded that Ternary Search (TSearch) suits all tested dissimilarity functions. For this reason, it is used for all of our experiments. We produced all scores using Random Forests with 200 trees. |
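The setup described above combines score histograms, a histogram dissimilarity function, and Ternary Search over the class prevalence. A minimal sketch of that pipeline is given below; the function names (`histogram`, `topsoe`, `dys_ternary_search`) and the choice of the Topsøe divergence as the dissimilarity are illustrative assumptions, not the authors' released code.

```python
import numpy as np

def histogram(scores, n_bins):
    # Bin classifier scores in [0, 1] into a normalized histogram.
    counts, _ = np.histogram(scores, bins=n_bins, range=(0.0, 1.0))
    return counts / counts.sum()

def topsoe(p, q, eps=1e-12):
    # Topsoe divergence between two histograms; one of many
    # dissimilarity functions a DyS-style method could plug in.
    p, q = p + eps, q + eps
    m = p + q
    return np.sum(p * np.log(2 * p / m) + q * np.log(2 * q / m))

def dys_ternary_search(pos_scores, neg_scores, test_scores,
                       n_bins=10, tol=1e-5):
    # Mixture matching: find the positive-class prevalence alpha whose
    # mixture of training histograms best matches the test histogram.
    # Ternary search assumes the dissimilarity is unimodal in alpha.
    hp = histogram(pos_scores, n_bins)
    hn = histogram(neg_scores, n_bins)
    ht = histogram(test_scores, n_bins)
    f = lambda a: topsoe(a * hp + (1 - a) * hn, ht)
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2
```

Here `pos_scores` and `neg_scores` would come from the 10-fold cross-validation on the training half, and `n_bins` would be swept over the grid reported in the Experiment Setup row.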