The Diversified Ensemble Neural Network
Authors: Shaofeng Zhang, Meng Liu, Junchi Yan
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments are conducted on public tabular, image, and text datasets. By adopting a weight-sharing approach, the results show that our method can notably improve the accuracy and stability of the original neural networks with negligible extra time and space overhead. |
| Researcher Affiliation | Academia | Shaofeng Zhang1, Meng Liu1, Junchi Yan2. 1 University of Electronic Science and Technology of China; 2 Department of CSE, and MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University. {sfzhang,2017221004027}@std.uestc.edu.cn, yanjunchi@sjtu.edu.cn |
| Pseudocode | Yes | Algorithm 1 Diversified Ensemble Layer for Regression Network Training and Prediction |
| Open Source Code | No | The paper does not include an unambiguous statement of releasing source code or a direct link to a code repository for the methodology described. |
| Open Datasets | Yes | The first dataset is the Higgs boson [27] dataset from high-energy physics. ... The image datasets are CIFAR-10 and CIFAR-100... For the text datasets, 20NG [30] contains 18,846 documents... the Ohsumed corpus [31] contains 7,400 documents... R52 and R8 [32] are two subsets of the Reuters-21578 dataset... the MR [33] corpus has 5,331 positive and 5,331 negative reviews. SST [34] contains five categories... |
| Dataset Splits | Yes | R8 has eight categories and is split into 5,485 training and 2,189 test documents. R52 has 52 categories and is split into 6,532 training and 2,568 test documents. ... To compare XGBoost and LightGBM, we randomly select 10M instances as the training set and use the rest as the test set (see the first sketch after this table). |
| Hardware Specification | Yes | All the experiments are implemented using PyTorch on a single NVIDIA 1080Ti GPU. |
| Software Dependencies | No | The paper mentions PyTorch but does not specify its version number or any other software dependencies with version information. |
| Experiment Setup | Yes | For the hyper-parameter initialization, the γi are initialized to 1/N and the hyper-parameter α is initialized to 1. The training procedure of the ensemble layer consists of two stages: first, we optimize W with each single loss Ls in Eq. 2 and Ld in Eq. 3; then, we fix W and optimize with Lt in Eq. 5 (see the second sketch after this table). The number of individual models is set to N = 4 universally. |
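
The Dataset Splits row quotes a random 10M/rest split for the XGBoost/LightGBM comparison. Below is a minimal sketch of such a split; it is illustrative only, assuming the full Higgs dataset (about 11M instances) is already loaded, and the index names are placeholders rather than anything from the paper.

```python
# Illustrative random 10M/rest split, per the Dataset Splits row above.
# Assumes the Higgs dataset (~11M rows) is loaded elsewhere as tensors.
import torch

n_total, n_train = 11_000_000, 10_000_000  # Higgs has ~11M instances
perm = torch.randperm(n_total)             # random permutation of row indices
train_idx, test_idx = perm[:n_train], perm[n_train:]
# These index tensors would then select rows from the feature/label tensors,
# e.g. x_train = features[train_idx]  (placeholder names, not from the paper)
```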
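
For the Experiment Setup row, the following is a minimal PyTorch sketch of the two-stage procedure it describes, not the authors' implementation. The module names (`backbone`, `heads`, `EnsembleLayer`), the regression setting, and the concrete loss forms are assumptions; the paper's exact Ls, Ld, and Lt are defined in its Eqs. 2, 3, and 5.

```python
# A minimal sketch (not the authors' code) of the two-stage training of the
# diversified ensemble layer. The loss forms here are stand-ins: the paper's
# exact Ls, Ld, and Lt are given in its Eqs. 2, 3, and 5.
import torch
import torch.nn as nn

N = 4  # number of individual models, as stated in the setup

class EnsembleLayer(nn.Module):
    """Weighted combination of N model outputs with learnable weights gamma."""
    def __init__(self, n_models: int):
        super().__init__()
        # gamma_i initialized to 1/N, per the quoted setup
        self.gamma = nn.Parameter(torch.full((n_models,), 1.0 / n_models))

    def forward(self, outputs: torch.Tensor) -> torch.Tensor:
        # outputs: (n_models, batch, 1) -> weighted sum over models
        return (self.gamma.view(-1, 1, 1) * outputs).sum(dim=0)

backbone = nn.Linear(16, 32)  # stands in for the shared weights W
heads = nn.ModuleList([nn.Linear(32, 1) for _ in range(N)])
ensemble = EnsembleLayer(N)
mse = nn.MSELoss()
alpha = 1.0  # diversity trade-off, initialized to 1 per the setup

x, y = torch.randn(8, 16), torch.randn(8, 1)  # toy regression batch

def model_outputs():
    # One forward pass per head over the shared backbone -> (N, batch, 1)
    return torch.stack([h(torch.relu(backbone(x))) for h in heads])

# Stage 1: optimize W (shared backbone and heads) with each single loss Ls
# plus a diversity term Ld (sketched as negative output variance across models).
opt_w = torch.optim.Adam([*backbone.parameters(), *heads.parameters()], lr=1e-3)
outs = model_outputs()
loss_s = sum(mse(o, y) for o in outs) / N   # stand-in for Ls (Eq. 2)
loss_d = -alpha * outs.var(dim=0).mean()    # stand-in for Ld (Eq. 3)
(loss_s + loss_d).backward()
opt_w.step()

# Stage 2: fix W and optimize only the ensemble weights gamma with Lt.
opt_g = torch.optim.Adam(ensemble.parameters(), lr=1e-3)
outs = model_outputs().detach()             # W is frozen in this stage
loss_t = mse(ensemble(outs), y)             # stand-in for Lt (Eq. 5)
loss_t.backward()
opt_g.step()
```

In a full training run the two stages would be alternated over epochs; the snippet shows a single step of each to make the data flow and the separation of the W and γ optimizers explicit.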