The Diversified Ensemble Neural Network

Authors: Shaofeng Zhang, Meng Liu, Junchi Yan

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments are conducted on public tabular datasets, images, and texts. By adopting the weight-sharing approach, the results show our method can notably improve the accuracy and stability of the original neural networks with negligible extra time and space overhead.
Researcher Affiliation | Academia | Shaofeng Zhang1, Meng Liu1, Junchi Yan2; 1: University of Electronic Science and Technology of China; 2: Department of CSE, and MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University; {sfzhang,2017221004027}@std.uestc.edu.cn, yanjunchi@sjtu.edu.cn
Pseudocode | Yes | Algorithm 1: Diversified Ensemble Layer for Regression Network Training and Prediction
Open Source Code | No | The paper does not include an unambiguous statement of releasing source code or a direct link to a code repository for the methodology described.
Open Datasets | Yes | The first dataset is the Higgs boson [27] dataset from high-energy physics. ... The image datasets are CIFAR-10 and CIFAR-100... For the text datasets, the 20NG [30] corpus contains 18,846 documents... the Ohsumed corpus [31] contains 7,400 documents... R52 and R8 [32] are two subsets of the Reuters 21578 dataset... the MR [33] corpus has 5,331 positive and 5,331 negative reviews. SST [34] contains five categories...
Dataset Splits | Yes | R8 has eight categories and is split into 5,485 training and 2,189 test documents. R52 has 52 categories and is split into 6,532 training and 2,568 test documents. ... To compare XGBoost and LightGBM, we randomly select 10M instances as the training set and use the rest as the test set.
Hardware Specification | Yes | All the experiments are implemented using PyTorch on a single NVIDIA 1080Ti GPU.
Software Dependencies | No | The paper mentions 'PyTorch' but does not specify its version number or any other software dependencies with version information.
Experiment Setup | Yes | For the hyper-parameter initialization, γi are initialized to 1/N and the hyper-parameter α is initialized to 1. The training procedure of the ensemble layer consists of two stages. First, we optimize W with each single loss Ls in Eq. 2 and Ld in Eq. 3. Then, we fix W and optimize with Lt in Eq. 5. The number of individual models is set to N = 4 universally.
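The setup quoted in the last row (N = 4 weight-sharing individual models, ensemble weights γi initialized to 1/N, hyper-parameter α initialized to 1, and a two-stage procedure that first optimizes the shared weights W and then fixes W to optimize with Lt) can be illustrated with a minimal, hypothetical PyTorch sketch. The class and function names, the softmax normalization of γ, and the MSE placeholders standing in for Ls and Lt are assumptions introduced for illustration; the diversity term Ld (Eq. 3) and the exact role of α are not reproduced here, so this is not the authors' Algorithm 1.

```python
# Hypothetical sketch of the quoted two-stage setup; not the authors' released code.
import torch
import torch.nn as nn


class DiversifiedEnsemble(nn.Module):
    """N regression heads sharing one backbone, combined by learnable weights gamma."""

    def __init__(self, backbone: nn.Module, feat_dim: int, out_dim: int = 1, n_models: int = 4):
        super().__init__()
        self.backbone = backbone                                    # shared weights W
        self.heads = nn.ModuleList([nn.Linear(feat_dim, out_dim) for _ in range(n_models)])
        self.gamma = nn.Parameter(torch.full((n_models,), 1.0 / n_models))  # gamma_i = 1/N
        self.alpha = 1.0                                            # placeholder for the quoted alpha = 1

    def forward(self, x):
        feats = self.backbone(x)
        preds = torch.stack([head(feats) for head in self.heads])   # (N, batch, out_dim)
        w = torch.softmax(self.gamma, dim=0)                        # assumption: normalized ensemble weights
        return preds, (w.view(-1, 1, 1) * preds).sum(dim=0)         # per-model and ensemble predictions


def train_two_stage(model: DiversifiedEnsemble, loader, epochs: int = 1, lr: float = 1e-3):
    mse = nn.MSELoss()  # placeholder for Ls (Eq. 2) and Lt (Eq. 5); Ld (Eq. 3) is omitted

    # Stage 1: optimize the shared weights W (backbone and heads) with the per-model losses.
    opt_w = torch.optim.Adam(
        list(model.backbone.parameters()) + list(model.heads.parameters()), lr=lr
    )
    for _ in range(epochs):
        for x, y in loader:
            preds, _ = model(x)
            loss_s = sum(mse(p, y) for p in preds) / len(preds)
            opt_w.zero_grad()
            loss_s.backward()
            opt_w.step()

    # Stage 2: fix W and optimize only the ensemble weights gamma with the total loss Lt.
    for p in list(model.backbone.parameters()) + list(model.heads.parameters()):
        p.requires_grad_(False)
    opt_g = torch.optim.Adam([model.gamma], lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            _, ensemble = model(x)
            loss_t = mse(ensemble, y)
            opt_g.zero_grad()
            loss_t.backward()
            opt_g.step()
```

For the tabular experiments a small MLP backbone would be plugged in; the image and text backbones used in the paper differ and are not shown here.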