A Refined Margin Distribution Analysis for Forest Representation Learning

Authors: Shen-Huan Lyu, Liang Yang, Zhi-Hua Zhou

NeurIPS 2019

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Extensive experiments validate that mdDF can effectively improve the performance on classification tasks, especially for categorical and mixed modeling tasks." Evaluated by test accuracy on benchmark datasets; Table 1 shows that mdDF achieves better accuracy than the other methods on several datasets. |
| Researcher Affiliation | Academia | Shen-Huan Lyu, Liang Yang, Zhi-Hua Zhou; National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China; {lvsh,yangl,zhouzh}@lamda.nju.edu.cn |
| Pseudocode | Yes | Algorithm 1 (random forests block A_rfb) and Algorithm 2 (mdDF, the margin distribution Deep Forest). |
| Open Source Code | No | The paper makes no statement about code availability and provides no link to a code repository for the described methodology. |
| Open Datasets | Yes | The PROTEIN, SENSIT, and SATIMAGE datasets are obtained from the LIBSVM datasets [4]. Except for the MNIST dataset [18], the others come from the UCI Machine Learning Repository [11]. |
| Dataset Splits | Yes | "From the literature, these datasets come pre-divided into training and testing sets. Therefore in our experiments, we use them in their original format. To reduce the risk of overfitting, the representation learned by each forest is generated by k-fold cross-validation (k = 5 in our experiments). For the multilayer perceptron (MLP) configurations... we examine a variety of architectures on the validation set, and pick the one with the best performance, then train the whole network again on the training set and report the test accuracy." |
| Hardware Specification | No | The paper does not report the hardware used for its experiments (e.g., GPU/CPU models, processor types, memory amounts). |
| Software Dependencies | No | The paper mentions software components such as ReLU, cross-entropy, and Adadelta, but does not name any library with a version number (e.g., "TensorFlow 2.x" or "PyTorch 1.x"). |
| Experiment Setup | Yes | Each mdDF layer uses two random forests and two completely-random forests; each forest contains 100 trees, and the maximum tree depth grows with the layer index t, i.e., d_max^(t) ∈ {2t+2, 4t+4, 8t+8, 16t+16}. To reduce the risk of overfitting, the representation learned by each forest is generated by k-fold cross-validation (k = 5 in the experiments). The MLP configurations use ReLU for the activation function, cross-entropy for the loss function, Adadelta for optimization, and no dropout for hidden layers. The examined architectures are: (1) input-1024-512-output; (2) input-16-8-8-output; (3) input-70-50-output; (4) input-50-30-output; (5) input-30-20-output. (Sketches illustrating the layer construction and the MLP configuration follow the table.) |
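To make the reported cascade setup concrete, here is a minimal sketch of one layer's representation step (the role of Algorithm 1's random forests block) under stated assumptions: the helper names `depth_schedule` and `layer_representation` are hypothetical, the assignment of one depth base per forest is a guess at how the schedule {2t+2, 4t+4, 8t+8, 16t+16} maps onto the four forests, and `ExtraTreesClassifier` with `max_features=1` is only an approximation of completely-random forests. This is an illustrative reconstruction, not the authors' code.

```python
# Sketch of one mdDF-style cascade layer with scikit-learn.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.model_selection import cross_val_predict


def depth_schedule(t, base):
    # Maximum tree depth grows with the layer index t: base * t + base,
    # giving {2t+2, 4t+4, 8t+8, 16t+16} for base in {2, 4, 8, 16}.
    return base * t + base


def layer_representation(X, y, t, n_trees=100, k=5, seed=0):
    """Out-of-fold class-probability features from two random forests and
    two (approximately) completely-random forests for cascade layer t."""
    forests = [
        RandomForestClassifier(n_estimators=n_trees,
                               max_depth=depth_schedule(t, 2),
                               random_state=seed),
        RandomForestClassifier(n_estimators=n_trees,
                               max_depth=depth_schedule(t, 4),
                               random_state=seed + 1),
        # max_features=1 picks one random feature per split, approximating
        # a completely-random forest (an assumption, not the paper's recipe).
        ExtraTreesClassifier(n_estimators=n_trees, max_features=1,
                             max_depth=depth_schedule(t, 8),
                             random_state=seed + 2),
        ExtraTreesClassifier(n_estimators=n_trees, max_features=1,
                             max_depth=depth_schedule(t, 16),
                             random_state=seed + 3),
    ]
    # k-fold cross-validated predictions (k = 5 in the paper) reduce the
    # risk of overfitting when the representation is passed to layer t + 1.
    probas = [cross_val_predict(f, X, y, cv=k, method="predict_proba")
              for f in forests]
    # Concatenate original features with the learned representation.
    return np.hstack([X] + probas)
```

Stacking out-of-fold probabilities rather than in-sample predictions is what lets the next layer consume the representation without the forests having memorized their own training labels.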
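The MLP baseline configuration in the table (ReLU activations, cross-entropy loss, Adadelta, no hidden-layer dropout) can likewise be sketched briefly. The paper does not name a framework, so PyTorch and the helper `make_mlp` are assumptions here, as is the 784-dimensional example input.

```python
# Sketch of the stated MLP baseline configuration (framework choice assumed).
import torch
import torch.nn as nn


def make_mlp(n_inputs, hidden, n_outputs):
    """Build e.g. architecture (1), input-1024-512-output, via
    make_mlp(n_inputs, [1024, 512], n_outputs)."""
    layers, prev = [], n_inputs
    for width in hidden:
        layers += [nn.Linear(prev, width), nn.ReLU()]  # ReLU activations
        prev = width
    layers.append(nn.Linear(prev, n_outputs))  # logits; no dropout anywhere
    return nn.Sequential(*layers)


model = make_mlp(784, [1024, 512], 10)  # e.g. MNIST-sized inputs (assumed)
criterion = nn.CrossEntropyLoss()       # cross-entropy loss
optimizer = torch.optim.Adadelta(model.parameters())  # Adadelta optimizer
```

The other four architectures from the table follow by swapping the hidden-width list, e.g. `make_mlp(n_inputs, [16, 8, 8], n_outputs)` for architecture (2).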