Stacking Deep Set Networks and Pooling by Quantiles

Authors: Zhuojun Chen, Xinghua Zhu, Dongzhe Su, Justin C. I. Chuang

ICML 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We demonstrate the effectiveness of our approach in a variety of tasks, showing that quantile pooling can outperform both max and average pooling in each of their respective strengths. ... Experiments are conducted to reveal the capability of different pooling methods in a variety of tasks." |
| Researcher Affiliation | Academia | "Zhuojun Chen, Xinghua Zhu, Dongzhe Su, Justin C. I. Chuang (ASTRI, Hong Kong, China). Correspondence to: Zhuojun Chen <georgechen@astri.org>." |
| Pseudocode | Yes | "Algorithm 1 Maximum Envelopes" |
| Open Source Code | No | The paper does not contain an explicit statement about releasing the source code for the described methodology or a direct link to a code repository. |
| Open Datasets | Yes | "We explore quantile pooling on ModelNet40 (Wu et al., 2015), a benchmark dataset with 40 classes of 3D objects." |
| Dataset Splits | No | The paper describes training configurations (batch size, learning rate, number of steps) and network architecture details, but does not explicitly state the train/validation/test dataset splits (e.g., percentages or sample counts) for any of the tasks. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions optimizers (AdamW, Adam) and activation functions (ReLU), but does not provide specific software library names with version numbers (e.g., PyTorch 1.9, TensorFlow 2.x) that were used for implementation. |
| Experiment Setup | Yes | "Detailed experimental settings are provided in Appendix A.11. ... Number of residual layers is set to 3, with a hidden dimension of 64. We use a batch size of 1000 ... a learning rate of 0.0001 and weight decay of 0.1. Learning rate is decayed by a factor of 0.5 every 600 steps. ... For the experiments in this paper, unless otherwise specified, we fix q at 0.95 and ϵ at 0.05. ... For Θ_g^(l), we use a three-layer MLP with the same hidden dimension D, while for Θ^(l), we employ another three-layer MLP with a hidden dimension of D/2 in the middle layer. ... All MLPs come with batch normalization and ReLU activation following each linear layer." |
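
The Experiment Setup row above quotes enough hyperparameters to sketch what a quantile-pooling deep-set block might look like in code. The following is a minimal PyTorch-style sketch, not the authors' implementation: the name `QuantilePoolBlock`, the residual mixing scheme, and the way `theta_g` and `theta` are wired around the pooling step are assumptions for illustration, and the quoted ϵ = 0.05 is not modeled here since its role is not specified in the excerpt. Only the quoted numbers (q = 0.95, 3 residual layers, hidden dimension 64, three-layer MLPs with batch normalization and ReLU, a D/2 middle layer, AdamW with learning rate 1e-4, weight decay 0.1, and a 0.5 decay every 600 steps) come from the report.

```python
# Hedged sketch (not the paper's released code): a deep-set style block that
# replaces max/mean pooling with quantile pooling over the set dimension.
import torch
import torch.nn as nn


def mlp(dims):
    """Stack Linear -> BatchNorm1d -> ReLU for each consecutive pair of dims."""
    layers = []
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        layers += [nn.Linear(d_in, d_out), nn.BatchNorm1d(d_out), nn.ReLU()]
    return nn.Sequential(*layers)


class QuantilePoolBlock(nn.Module):
    """Illustrative block: per-element MLP, quantile pooling, residual mixing."""

    def __init__(self, dim=64, q=0.95):
        super().__init__()
        self.q = q
        # Theta_g^(l): three-layer MLP with the same hidden dimension D.
        self.theta_g = mlp([dim, dim, dim, dim])
        # Theta^(l): three-layer MLP with hidden dimension D/2 in the middle layer.
        self.theta = mlp([dim, dim // 2, dim // 2, dim])

    def forward(self, x):
        # x: (batch, set_size, dim) -- embeddings of the set elements.
        b, n, d = x.shape
        h = self.theta_g(x.reshape(b * n, d)).reshape(b, n, d)
        # Quantile pooling over the set dimension: q close to 1 behaves like
        # max pooling, q = 0.5 is median pooling; the quoted setting is q = 0.95.
        pooled = torch.quantile(h, self.q, dim=1)  # (batch, dim)
        # Mix the pooled summary back into each element, residual-style
        # (an assumption about how the stacked blocks are composed).
        mixed = self.theta((h + pooled.unsqueeze(1)).reshape(b * n, d))
        return x + mixed.reshape(b, n, d)


# Illustrative training configuration mirroring the quoted hyperparameters;
# the dataset pipeline and batch size of 1000 are omitted.
blocks = nn.Sequential(*[QuantilePoolBlock(dim=64, q=0.95) for _ in range(3)])
optimizer = torch.optim.AdamW(blocks.parameters(), lr=1e-4, weight_decay=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=600, gamma=0.5)
```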