Random Mixed Field Model for Mixed-Attribute Data Restoration

Authors: Qiang Li, Wei Bian, Richard Xu, Jane You, Dacheng Tao

AAAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our method by classification experiments on both synthetic data and real benchmark datasets. Experiments demonstrate that our approach can effectively improve the classification accuracy of noisy and incomplete data compared with other data restoration methods.
Researcher Affiliation | Academia | Centre for Quantum Computation and Intelligent Systems, FEIT, University of Technology Sydney; School of Computing and Communications, FEIT, University of Technology Sydney; Department of Computing, The Hong Kong Polytechnic University
Pseudocode | No | The paper describes algorithms and derivations mathematically but does not include any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statements about open-source code availability or links to a code repository for the described methodology.
Open Datasets | Yes | We design a simulation study to show that mixed-attribute correlations can effectively help reduce noise effects and improve classification performance. Consider a mixed-net graph consisting of 15 continuous and 10 discrete nodes... we generate 750 random examples for each class. Then we split all the examples into training and testing sets with a ratio of 2:1. The training set is utilized to train the RMF model and the KNN and SVM classifiers, and the testing set is used to generate noisy testing sets... In this section, we present experimental results on four real-world mixed-attribute datasets from the UCI machine learning repository (Lichman 2013): Adult, Credit, Statlog-Australian and Statlog-German. The Adult dataset has already been split into train/test in approximately 2/3 and 1/3 proportions. For the Credit, Statlog-Australian and Statlog-German datasets, we simply select the first 2/3 of all the instances as the training set and the remaining as the testing set. (A sketch of this split protocol appears after the table.)
Dataset Splits | No | The paper discusses training and testing splits, but does not explicitly mention a distinct validation set split for hyperparameter tuning or early stopping. The term "validation" is used in the context of the variational approach, not a data split.
Hardware Specification | No | The paper does not provide any specific hardware details such as CPU/GPU models, memory, or cloud computing instances used for running the experiments.
Software Dependencies | No | The paper mentions "Matlab’s Bioinformatics toolbox" and "knnimpute.m" but does not provide version numbers for any software dependencies. It also cites various algorithms and models without specifying their software implementations or versions.
Experiment Setup | Yes | For continuous attributes, the noise standard deviations are σ_s = τ·σ̌_s, s = 1, 2, ..., m, with σ̌_s being the signal standard deviations. For discrete attributes, the flipping probabilities are formulated as p(y_j = a | v_j = a) = τ, j = 1, 2, ..., n... Five different levels of noise strength (τ = 0.1, 0.2, 0.3, 0.4, 0.5) are applied to all four UCI datasets... A missing-completely-at-random (MCAR) strategy is employed to randomly annihilate a percentage (ρ = 0.1, 0.2, 0.3, 0.4, 0.5) of the continuous and discrete attributes of each instance in the testing data... For the KNNI settings, we choose K = 3 and use a weighted Euclidean distance measure... The REMI source code is available at the author’s homepage http://www.clidyn.ethz.ch/imputation/, and the default setting is used. (A sketch of this corruption protocol appears after the table.)