Stacking Deep Set Networks and Pooling by Quantiles
Authors: Zhuojun Chen, Xinghua Zhu, Dongzhe Su, Justin C. I. Chuang
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of our approach in a variety of tasks, showing that quantile pooling can outperform both max and average pooling in each of their respective strengths. ... Experiments are conducted to reveal the capability of different pooling methods in a variety of tasks. |
| Researcher Affiliation | Academia | Zhuojun Chen¹, Xinghua Zhu¹, Dongzhe Su¹, Justin C. I. Chuang¹ (¹ASTRI, Hong Kong, China). Correspondence to: Zhuojun Chen <georgechen@astri.org>. |
| Pseudocode | Yes | Algorithm 1 Maximum Envelopes |
| Open Source Code | No | The paper does not contain an explicit statement about releasing the source code for the described methodology or a direct link to a code repository. |
| Open Datasets | Yes | We explore quantile pooling on ModelNet40 (Wu et al., 2015), a benchmark dataset with 40 classes of 3D objects. |
| Dataset Splits | No | The paper describes training configurations (batch size, learning rate, number of steps) and network architecture details, but does not explicitly state the train/validation/test dataset splits (e.g., percentages or sample counts) for any of the tasks. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions optimizers (AdamW, Adam) and activation functions (ReLU), but does not provide specific software library names with version numbers (e.g., PyTorch 1.9, TensorFlow 2.x) that were used for implementation. |
| Experiment Setup | Yes | Detailed experimental settings are provided in Appendix A.11. ... Number of residual layers is set to 3, with a hidden dimension of 64. We use a batch size of 1000 ... a learning rate of 0.0001 and weight decay of 0.1. Learning rate is decayed by a factor of 0.5 every 600 steps. ... For the experiments in this paper, unless otherwise specified, we fix q at 0.95 and ϵ at 0.05... For Θ_g^(l), we use a three-layer MLP with the same hidden dimension D, while for Θ^(l), we employ another three-layer MLP with a hidden dimension of D/2 in the middle layer. ... All MLPs come with batch normalization and ReLU activation following each linear layer. (Illustrative sketches of the pooling operation and this training configuration follow the table.) |
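The quoted setup fixes the pooling quantile at q = 0.95. For orientation only, below is a minimal sketch of quantile pooling over the set dimension in PyTorch; the `QuantilePool` class, the tensor shapes, and the plain `torch.quantile` call are illustrative assumptions, not the authors' implementation (the paper formulates its method via Algorithm 1 and the ϵ parameter).

```python
# Minimal sketch of quantile pooling over a set of element embeddings.
# Assumes a Deep Sets-style encoder producing (batch, set_size, features);
# QuantilePool and its use of torch.quantile are illustrative, not the
# paper's exact formulation (which also involves a smoothing parameter ϵ).
import torch
import torch.nn as nn


class QuantilePool(nn.Module):
    """Permutation-invariant pooling by a per-feature quantile.

    q = 1.0 recovers max pooling; q = 0.5 gives the per-feature median,
    which behaves more like average pooling on symmetric distributions.
    """

    def __init__(self, q: float = 0.95):
        super().__init__()
        self.q = q

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, set_size, features) -> (batch, features)
        return torch.quantile(x, self.q, dim=1)


if __name__ == "__main__":
    pool = QuantilePool(q=0.95)
    x = torch.randn(8, 100, 64)    # 8 sets, 100 elements each, 64-dim features
    print(pool(x).shape)           # torch.Size([8, 64])
```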
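Similarly, the training settings quoted in the experiment-setup row (AdamW, learning rate 0.0001, weight decay 0.1, learning rate halved every 600 steps, MLP blocks of Linear → BatchNorm → ReLU) could be wired up roughly as follows; `make_mlp`, the chosen dimensions, and the stand-alone `model` are placeholders for illustration, not the authors' network.

```python
# Rough sketch of the quoted training configuration; make_mlp and the
# dimensions below are assumptions for illustration only.
import torch
import torch.nn as nn


def make_mlp(dim_in: int, dim_hidden: int, dim_out: int) -> nn.Sequential:
    # Three linear layers, each followed by batch normalization and ReLU,
    # matching the MLP description quoted in the experiment-setup row.
    return nn.Sequential(
        nn.Linear(dim_in, dim_hidden), nn.BatchNorm1d(dim_hidden), nn.ReLU(),
        nn.Linear(dim_hidden, dim_hidden), nn.BatchNorm1d(dim_hidden), nn.ReLU(),
        nn.Linear(dim_hidden, dim_out), nn.BatchNorm1d(dim_out), nn.ReLU(),
    )


model = make_mlp(dim_in=64, dim_hidden=64, dim_out=64)  # hidden dimension D = 64
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.1)
# Learning rate decayed by a factor of 0.5 every 600 steps, assuming
# scheduler.step() is called once per optimization step.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=600, gamma=0.5)
```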