Random Mixed Field Model for Mixed-Attribute Data Restoration

Authors: Qiang Li, Wei Bian, Richard Xu, Jane You, Dacheng Tao

AAAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our method by classification experiments on both synthetic data and real benchmark datasets. Experiments demonstrate that our approach can effectively improve the classification accuracy of noisy and incomplete data compared with other data restoration methods.
Researcher Affiliation | Academia | Centre for Quantum Computation and Intelligent Systems, FEIT, University of Technology Sydney; School of Computing and Communications, FEIT, University of Technology Sydney; Department of Computing, The Hong Kong Polytechnic University
Pseudocode | No | The paper describes algorithms and derivations mathematically but does not include any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statements about open-source code availability or links to a code repository for the described methodology.
Open Datasets | Yes | We design a simulation study to show that mixed-attribute correlations can effectively help reduce noise effects and improve classification performance. Consider a mixed-net graph consisting of 15 continuous and 10 discrete nodes... we generate 750 random examples for each class. Then we split all the examples into training and testing sets with a ratio of 2:1. The training set is utilized to train the RMF model and the KNN and SVM classifiers, and the testing set is used to generate noisy testing sets... In this section, we present experimental results on four real-world mixed-attribute datasets from the UCI machine learning repository (Lichman 2013): Adult, Credit, Statlog-Australian and Statlog-German. The Adult dataset has already been split into train/test in approximately 2/3 and 1/3 proportions. For the Credit, Statlog-Australian and Statlog-German datasets, we simply select the first 2/3 of all the instances as the training set and the remaining as the testing set. (A sketch of this split protocol appears after the table.)
Dataset Splits | No | The paper discusses training and testing splits, but does not explicitly mention a distinct validation set split for hyperparameter tuning or early stopping. The term "validation" is used in the context of the variational approach, not a data split.
Hardware Specification | No | The paper does not provide any specific hardware details such as CPU/GPU models, memory, or cloud computing instances used for running the experiments.
Software Dependencies | No | The paper mentions "Matlab’s Bioinformatics toolbox" and "knnimpute.m" but does not provide version numbers for any software dependencies. It also cites various algorithms and models without specifying their software implementations or versions.
Experiment Setup | Yes | For continuous attributes, the noise standard deviations are σ_s = τ·σ̌_s, s = 1, 2, ..., m, with σ̌_s being the signal standard deviations. For discrete attributes, the flipping probabilities are formulated as p(y_j = a | v_j = a) = τ, j = 1, 2, ..., n... Five different levels of noise strength (τ = 0.1, 0.2, 0.3, 0.4, 0.5) are applied to all four UCI datasets... A missing-completely-at-random (MCAR) strategy is employed to randomly annihilate a percentage (ρ = 0.1, 0.2, 0.3, 0.4, 0.5) of the continuous and discrete attributes of each instance in the testing data... For the KNNI settings, we choose K = 3 and use a weighted Euclidean distance measure... The REMI source code is available at the author’s homepage http://www.clidyn.ethz.ch/imputation/, and the default setting is used. (A sketch of this corruption protocol appears after the table.)