Polynomial Width is Sufficient for Set Representation with High-dimensional Features
Authors: Peihao Wang, Shenghao Yang, Shu Li, Zhangyang Wang, Pan Li
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Appendix A, we run numerical experiments to verify our argument. Fig. 2 demonstrates the polynomial dependence between the set size, feature dimension, and the minimal latent embedding dimension to achieve a small approximation error. See more details in Appendix A. ... To verify our theoretical claim, we conducted proof-of-concept experiments. Similar to Wagstaff et al. (2019), we train a Deep Sets model with ϕ and ρ parameterized by neural networks to fit a function that takes the median over a vector-valued set according to the lexicographical order. Specifically, the input features are sampled from a uniform distribution, ϕ is chosen as one linear layer followed by a SiLU activation function (Elfwing et al., 2018), and ρ is a two-layer fully-connected network with ReLU activation. During the experiment, we vary the input size, dimension, and hidden dimension of ϕ, and record the final training error (RMSE) after the network converges. The critical width L is taken at the point where the RMSE first drops below 10% above the minimum value for this set size. The relationship between L and N, D is plotted in Fig. 2. We observe that log(L) grows linearly with log(N) and log(D) instead of exponentially, which validates our theoretical claim. (A hedged code sketch of this setup follows the table.) |
| Researcher Affiliation | Academia | University of Texas at Austin; Georgia Tech; University of Waterloo; Purdue University |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described in this paper. |
| Open Datasets | No | The input data are generated synthetically ('the input features are sampled from a uniform distribution'); the paper neither uses nor provides access information (link, DOI, repository, or citation) for a publicly available dataset. |
| Dataset Splits | No | The paper reports only the final training error (RMSE) on synthetically sampled data and does not provide dataset split information (exact percentages, sample counts, or citations to predefined splits) needed for reproduction. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions activation functions but does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | No | The paper describes some architectural choices for ϕ and ρ, but it lacks specific experimental setup details such as concrete hyperparameter values (e.g., learning rate, batch size, number of epochs) or optimizer configurations. |
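
For reference, the quoted proof-of-concept setup could be reconstructed along the following lines. This is a minimal PyTorch sketch under stated assumptions, not the authors' released code: the optimizer, learning rate, batch size, step count, and the lower-median convention for even set sizes are not given in the paper (exactly the gap flagged in the Experiment Setup row), so the values below are placeholders.

```python
import torch
import torch.nn as nn


class DeepSets(nn.Module):
    """Deep Sets: rho(sum_i phi(x_i)), with phi = one linear layer + SiLU
    and rho = a two-layer ReLU MLP, as described in the quoted passage."""

    def __init__(self, d_in: int, width: int, d_out: int):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(d_in, width), nn.SiLU())
        self.rho = nn.Sequential(
            nn.Linear(width, width), nn.ReLU(), nn.Linear(width, d_out)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, N, D); sum-pool over the set axis, then decode.
        return self.rho(self.phi(x).sum(dim=1))


def lexicographic_median(x: torch.Tensor) -> torch.Tensor:
    """Median element of each set under lexicographic order.
    ASSUMPTION: the lower median is used when the set size is even."""
    batch, n, d = x.shape
    out = torch.empty(batch, d)
    for b in range(batch):
        rows = sorted(x[b].tolist())  # Python sorts nested lists lexicographically
        out[b] = torch.tensor(rows[(n - 1) // 2])
    return out


def train_rmse(n_set, d_feat, width, steps=2000, lr=1e-3, batch=128):
    """Train to (approximate) convergence and return the final training RMSE.
    ASSUMPTION: optimizer, learning rate, batch size, and step count are not
    reported in the paper; the values here are placeholders."""
    model = DeepSets(d_feat, width, d_feat)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        x = torch.rand(batch, n_set, d_feat)  # uniform input features
        loss = torch.sqrt(nn.functional.mse_loss(model(x), lexicographic_median(x)))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return loss.item()


def critical_width(n_set, d_feat, widths):
    """Smallest width whose RMSE falls below 110% of the best RMSE in the
    sweep (the paper's 'below 10% above the minimum' rule)."""
    errs = {w: train_rmse(n_set, d_feat, w) for w in widths}
    threshold = 1.1 * min(errs.values())
    return min(w for w, e in errs.items() if e <= threshold)
```

A call such as `critical_width(n_set=16, d_feat=4, widths=[2, 4, 8, 16, 32, 64])` would then yield one point of the log(L) versus log(N), log(D) relationship plotted in the paper's Fig. 2.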