DyS: A Framework for Mixture Models in Quantification
Authors: André Maletzke, Denis dos Reis, Everton Cherman, Gustavo Batista (pp. 4552-4560)
AAAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we generalize MM with a base framework called DyS (Distribution y-Similarity). With this framework, we perform a thorough evaluation of the most critical design decisions of MM models. For instance, we assess 15 dissimilarity functions to compare histograms with varying numbers of bins from 2 to 110 and, for the first time, make a connection between quantification accuracy and test sample size, with experiments covering 24 public benchmark datasets. |
| Researcher Affiliation | Academia | André Maletzke, Denis dos Reis, Everton Cherman, Gustavo Batista. Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo. {andregustavo,denismr,evertoncherman}@usp.br, gbatista@icmc.usp.br |
| Pseudocode | Yes | Algorithm 1: Ordinal Distance Algorithm 2: SORD Dissimilarity Function |
| Open Source Code | No | The paper mentions a "supplemental material website" (https://sites.google.com/site/andregustavom/research/dys) which contains figures. However, it does not contain an unambiguous statement that the *source code for the methodology* described in the paper is openly available, nor is the link directly to a code repository. |
| Open Datasets | Yes | Table 2 (Datasets description) lists, e.g., Anuran Calls (UCI) and Bank Marketing (UCI). The datasets come from the UCI (Dheeru and Karra Taniskidou 2017), OpenML (Vanschoren et al. 2013), PROMISE (Sayyad Shirabad and Menzies 2005), and Reis (dos Reis et al. 2018a) repositories. Specific citations are provided for Bank Marketing (Moro, Cortez, and Rita 2014), Credit Card (Yeh and Lien 2009), HTRU2 (Lyon et al. 2016), Mozilla4 (Koru, Zhang, and Liu 2007), Mushroom (Lincoff 1989), Nomao (Candillier and Lemaire 2012), and Occupancy Detection (Candanedo and Feldheim 2016). |
| Dataset Splits | Yes | Each dataset was uniformly split into two halves: training and test. With the training half, we performed 10-fold cross-validation to obtain the training scores used by DyS. |
| Hardware Specification | No | The paper describes the experimental setup, including datasets, software components (Random Forests), and evaluation metrics. However, it does not specify any hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper states: "We produced all scores using Random Forests with 200 trees." While it mentions Random Forests, it does not provide specific version numbers for this software or any other ancillary libraries/packages used in the experiments. |
| Experiment Setup | Yes | To verify the impact of the number of bins, in all experiments, we vary the number of bins from 2 to 20 with increments of 2, and from 20 to 110 with increments of 10. The test sample size, on the other hand, varies from 10 to 100 with increments of 10 examples, and from 100 to 500 with increments of 100 examples. We performed preliminary experiments and concluded that Ternary Search (TSearch) suits all tested dissimilarity functions. For this reason, it is used for all of our experiments. We produced all scores using Random Forests with 200 trees. |
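The setup described above combines score histograms, a histogram dissimilarity function, and Ternary Search over the class prevalence. A minimal sketch of that pipeline is given below; the function names (`histogram`, `topsoe`, `dys_ternary_search`) and the choice of the Topsøe divergence as the dissimilarity are illustrative assumptions, not the authors' released code.

```python
import numpy as np

def histogram(scores, n_bins):
    # Bin classifier scores in [0, 1] into a normalized histogram.
    counts, _ = np.histogram(scores, bins=n_bins, range=(0.0, 1.0))
    return counts / counts.sum()

def topsoe(p, q, eps=1e-12):
    # Topsoe divergence between two histograms; one of many
    # dissimilarity functions a DyS-style method could plug in.
    p, q = p + eps, q + eps
    m = p + q
    return np.sum(p * np.log(2 * p / m) + q * np.log(2 * q / m))

def dys_ternary_search(pos_scores, neg_scores, test_scores,
                       n_bins=10, tol=1e-5):
    # Mixture matching: find the positive-class prevalence alpha whose
    # mixture of training histograms best matches the test histogram.
    # Ternary search assumes the dissimilarity is unimodal in alpha.
    hp = histogram(pos_scores, n_bins)
    hn = histogram(neg_scores, n_bins)
    ht = histogram(test_scores, n_bins)
    f = lambda a: topsoe(a * hp + (1 - a) * hn, ht)
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2
```

Here `pos_scores` and `neg_scores` would come from the 10-fold cross-validation on the training half, and `n_bins` would be swept over the grid reported in the Experiment Setup row.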