Training Decision Trees as Replacement for Convolution Layers

Authors: Wolfgang Fuhl, Gjergji Kasneci, Wolfgang Rosenstiel, Enkelejda Kasneci (pp. 3882-3889)

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our results on multiple publicly available data sets show that our approach performs similar to conventional neuronal networks.
Researcher Affiliation | Academia | Wolfgang Fuhl, Gjergji Kasneci, Wolfgang Rosenstiel, Enkelejda Kasneci, Eberhard Karls Universität Tübingen, Sand 14, Tübingen, Germany, {wolfgang.fuhl, gjergji.kasneci, wolfgang.rosenstiel, enkelejda.kasneci}@uni-tuebingen.de
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | An implementation for TensorFlow (Abadi et al. 2016) and Torch (Collobert, Bengio, and Mariéthoz 2002) is also planned since those are currently the most popular frameworks.
Open Datasets | Yes | The LeNet-5 model was used in the comparison on the MNIST (LeCun et al. 1998) dataset... The ResNet-34 was used for the comparison on the CIFAR10 (Krizhevsky and Hinton 2009) dataset... we used landmark regression. Therefore, we compared the decision trees with convolutions on the 300W (Zhu and Ramanan 2012) dataset...
Dataset Splits | No | The paper explicitly states the training and test set sizes for MNIST (60,000 training, 10,000 test), CIFAR10 (50,000 training, 10,000 test), and 300W (3,148 training, 689 test), but does not state a separate validation split (see the data-loading sketch after this table).
Hardware Specification | Yes | For LeNet-5 we used a desktop PC with an Intel i5-4570 CPU (3.2 GHz), 16 GB DDR4 RAM, an NVIDIA GTX 1050Ti GPU with 4 GB RAM, and a Windows 7 64-bit operating system. The second hardware setup was used for the ResNet models since those require more GPU RAM. Therefore, we used a server with an Intel i9-9900K CPU (3.6 GHz), 64 GB DDR4 RAM, two RTX 2080 Ti GPUs with 11.2 GB RAM each, and a Windows 8.1 64-bit operating system.
Software Dependencies | No | We implemented the decision tree layer in C++ on the CPU and in CUDA on the GPU. The implementation was integrated into the DLIB (King 2009) framework, which uses CUDNN functions.
Experiment Setup | Yes | Training parameters for MNIST: We used the Adam optimizer (Kingma and Ba 2014) with the first momentum set to 0.9 and the second momentum set to 0.999. Weight decay was set to 5 × 10^-4 for the convolutions and to 10^-8 for the decision trees. The batch size was set to 400, and each batch was always balanced in terms of available classes. ... The initial learning rate was set to 10^-2 and reduced by a factor of 10 every 100 epochs until it reached 10^-4. At a learning rate of 10^-4 we continued training for an additional 1000 epochs and selected the best result. For data augmentation we used random noise in the range of 0-30% of the image resolution. (See the training-configuration sketch after this table.)
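
The data-loading sketch referenced above is a minimal illustration, assuming torchvision rather than the authors' own DLIB pipeline: the standard MNIST and CIFAR10 train/test splits already match the sizes the paper reports, and no separate validation set is carved out.

```python
# Assumption: torchvision splits, not the authors' DLIB data pipeline.
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

# Standard train/test splits as distributed with the datasets.
mnist_train = datasets.MNIST("data", train=True, download=True, transform=to_tensor)
mnist_test = datasets.MNIST("data", train=False, download=True, transform=to_tensor)
cifar_train = datasets.CIFAR10("data", train=True, download=True, transform=to_tensor)
cifar_test = datasets.CIFAR10("data", train=False, download=True, transform=to_tensor)

print(len(mnist_train), len(mnist_test))   # 60000 10000, as reported in the paper
print(len(cifar_train), len(cifar_test))   # 50000 10000, as reported in the paper
# No validation subset is split off; the paper does not describe one either.
```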
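
The training-configuration sketch below restates the reported MNIST schedule in code. It assumes PyTorch and a hypothetical stand-in model; the authors' actual implementation is C++/CUDA inside DLIB, and the decision-tree layer itself is not reproduced here, so the second parameter group merely illustrates where its lower weight decay would apply.

```python
# Assumption: PyTorch sketch of the reported hyperparameters, not the authors' DLIB code.
import torch

model = torch.nn.Sequential(                 # hypothetical stand-in, not LeNet-5 or the tree layer
    torch.nn.Conv2d(1, 6, 5), torch.nn.ReLU(),
    torch.nn.Flatten(), torch.nn.Linear(6 * 24 * 24, 10),
)
conv_params = list(model[0].parameters())    # convolution parameters: weight decay 5e-4
tree_params = list(model[3].parameters())    # stand-in for the decision-tree layer: weight decay 1e-8

# Adam with first/second momentum (betas) 0.9 and 0.999, batch size 400 in the paper.
optimizer = torch.optim.Adam(
    [
        {"params": conv_params, "weight_decay": 5e-4},
        {"params": tree_params, "weight_decay": 1e-8},
    ],
    lr=1e-2, betas=(0.9, 0.999),
)
# Learning rate divided by 10 every 100 epochs: 1e-2 -> 1e-3 -> 1e-4.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.1)
```

Per the reported schedule, training would continue for an additional 1000 epochs once the learning rate reaches 10^-4, keeping the best result; class-balanced batching and the 0-30% noise augmentation are not shown in this sketch.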