AutoEncoder by Forest

Authors: Ji Feng, Zhi-Hua Zhou

AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that, compared with DNN-based auto-encoders, eForest is able to obtain lower reconstruction error with fast training speed, while the model itself is reusable and damage-tolerable. ... Experiments, Image Reconstruction: We evaluate the performance of eForest in both the supervised and unsupervised settings.
Researcher Affiliation | Academia | Ji Feng, Zhi-Hua Zhou. National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China; Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210023, China. {fengj, zhouzh}@lamda.nju.edu.cn
Pseudocode | Yes | Algorithm 1: Forward Encoding ... Algorithm 2: Calculate MCR ... Algorithm 3: Backward Decoding
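The paper gives these three routines only as pseudocode. Below is a minimal sketch of the same idea, assuming a fitted scikit-learn forest and using its tree internals (forest.apply, tree_.children_left/right, feature, threshold); the function names and the midpoint-based reconstruction rule are illustrative assumptions, not the authors' implementation.

```python
# Sketch of eForest's forward encoding, MCR computation, and backward decoding
# on top of scikit-learn tree internals. Illustrative only; not the authors' code.
import numpy as np

def encode(forest, X):
    """Algorithm 1 (Forward Encoding): represent each x by its leaf index in every tree."""
    return forest.apply(X)  # shape (n_samples, n_trees)

def _path_rule(tree, leaf):
    """Per-attribute lower/upper bounds along the root-to-leaf path of one tree."""
    left, right = tree.children_left, tree.children_right
    parent = {child: (node, child == left[node])
              for node in range(tree.node_count)
              for child in (left[node], right[node]) if child != -1}
    lo, hi = {}, {}
    node = leaf
    while node in parent:
        node, went_left = parent[node]
        f, t = tree.feature[node], tree.threshold[node]
        if went_left:  # sklearn routes x[f] <= threshold to the left child
            hi[f] = min(hi.get(f, np.inf), t)
        else:
            lo[f] = max(lo.get(f, -np.inf), t)
    return lo, hi

def decode(forest, codes, n_features, default=0.5):
    """Algorithms 2-3: intersect all per-tree path rules into the Maximal-Compatible
    Rule, then reconstruct each attribute, here as the midpoint of its interval."""
    X_rec = np.full((len(codes), n_features), default, dtype=float)
    for i, leaves in enumerate(codes):
        lo = np.full(n_features, -np.inf)
        hi = np.full(n_features, np.inf)
        for est, leaf in zip(forest.estimators_, leaves):
            l, h = _path_rule(est.tree_, int(leaf))
            for f, v in l.items():
                lo[f] = max(lo[f], v)
            for f, v in h.items():
                hi[f] = min(hi[f], v)
        bounded = np.isfinite(lo) & np.isfinite(hi)
        X_rec[i, bounded] = (lo[bounded] + hi[bounded]) / 2.0
    return X_rec
```

With a fitted forest (for example, one of the configurations sketched after the Experiment Setup row below), decode(forest, encode(forest, X), X.shape[1]) yields the rule-based reconstruction.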
Open Source Code | No | The paper refers to existing implementations and documentation (e.g., Keras) for comparison models but does not provide any link or explicit statement about releasing the source code for the proposed eForest model.
Open Datasets | Yes | We use the MNIST dataset (LeCun et al. 1998), which consists of 60,000 grayscale 28×28 images for training and 10,000 for testing. We also use the CIFAR-10 dataset (Krizhevsky 2009), a more complex dataset consisting of 50,000 colored 32×32 images for training and 10,000 colored images for testing. ... Concretely, we used the IMDB dataset (Maas et al. 2011), which contains 25,000 documents for training and 25,000 documents for testing.
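All three datasets are public. One plausible way to load them, assuming the Keras dataset helpers (the paper compares against Keras models but does not state how the data were obtained), is:

```python
# Hypothetical data loading for the three datasets named above.
from tensorflow.keras.datasets import mnist, cifar10, imdb

(x_tr, y_tr), (x_te, y_te) = mnist.load_data()      # 60,000 train / 10,000 test, 28x28 grayscale
(c_tr, yc_tr), (c_te, yc_te) = cifar10.load_data()  # 50,000 train / 10,000 test, 32x32 color
(d_tr, yd_tr), (d_te, yd_te) = imdb.load_data()     # 25,000 train / 25,000 test documents

# Flatten images so each instance is one attribute vector for the forest;
# IMDB items are variable-length word-index sequences and would need
# vectorization (not specified in the paper) before being fed to a forest.
x_tr = x_tr.reshape(len(x_tr), -1) / 255.0
c_tr = c_tr.reshape(len(c_tr), -1) / 255.0
```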
Dataset Splits | No | The paper specifies training and testing set sizes for the MNIST, CIFAR-10, and IMDB datasets, but does not explicitly mention or quantify a separate validation set split.
Hardware Specification | Yes | We implement eForest on a single Intel KNL-7250 (3 TFLOPS peak), and achieved a 67.7× speedup for training 1,000 trees in an unsupervised setting, compared with a serial implementation. For a comparison, we trained the corresponding MLPs, CNN-AEs and SWWAEs with the same configurations as in the previous sections on one Titan-X GPU (6 TFLOPS peak).
Software Dependencies | No | The paper mentions the Keras documentation for the CNN-AE implementation but does not provide specific version numbers for Keras, Python, or any other software libraries used.
Experiment Setup | Yes | Concretely, for the supervised eForest, each non-terminal node randomly selects √d attributes in the input space and picks the best possible split for information gain; for the unsupervised eForest, each non-terminal node randomly picks one attribute and makes a random split. In our experiments we simply grow the trees to pure leaves, or terminate when there are only two instances in a node. We evaluate eForest containing 500 trees or 1,000 trees. ... For a vanilla CNN-AE ... ReLUs are used for activation and log-loss is used as the training objective. During training, dropout is set to 0.25 per layer.
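As one concrete reading of the quoted setup, the two forest variants could be configured with scikit-learn as sketched below. The estimator choices, criterion="entropy" as a stand-in for information gain, and min_samples_split=3 as an interpretation of "terminate when there are only two instances in a node" are assumptions, since the paper does not name its implementation; the CNN-AE baseline is described only at the level of ReLU activations, a log-loss objective, and dropout of 0.25 per layer.

```python
# Hypothetical forest configurations matching the quoted setup (not the authors' code).
from sklearn.ensemble import RandomForestClassifier, RandomTreesEmbedding

# Supervised eForest: each split considers sqrt(d) random attributes and takes the
# best information-gain split; trees grow until leaves are pure or a node holds
# only two instances (interpreted here as min_samples_split=3).
supervised = RandomForestClassifier(
    n_estimators=1000, criterion="entropy", max_features="sqrt",
    min_samples_split=3, n_jobs=-1, random_state=0)

# Unsupervised eForest: completely-random trees, i.e. each split picks one random
# attribute and a random threshold.
unsupervised = RandomTreesEmbedding(
    n_estimators=1000, max_depth=None, min_samples_split=3,
    n_jobs=-1, random_state=0)

# Usage sketch: fit, then read off the 1,000-dimensional leaf-index encodings.
# supervised.fit(x_tr, y_tr); unsupervised.fit(x_tr)
# codes = supervised.apply(x_te)
```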