Defining Neural Network Architecture through Polytope Structures of Datasets

Authors: Sangmin Lee, Abbas Mammadov, Jong Chul Ye

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In Section 3, we studied the relationship between the dataset geometry and neural network architectures. In this section, we provide two empirical results: 1) gradient descent indeed converges to the networks we unveil, and 2) we can investigate the geometric features of high-dimensional real-world datasets through our proposed algorithm. [...] Our empirical results are presented in Table 1. Each column in the table corresponds to a class in the dataset, and each row presents the type of the class. The values in the table denote the number of polytopes and their faces (we use the notation a+b to denote two polytopes with a and b faces, respectively).
Researcher Affiliation | Academia | (1) Department of Mathematical Science, KAIST, Daejeon, Korea; (2) School of Computing, KAIST, Daejeon, Korea; (3) Kim Jaechul Graduate School of AI, KAIST, Daejeon, Korea.
Pseudocode | Yes | Algorithm 1: Compressing algorithm; Algorithm 2: Extracting a polytope-basis cover from a three-layer ReLU network; Algorithm 3: Extracting a polytope-basis cover from a trained two-layer ReLU network; Algorithm 4: An efficient algorithm for finding a polytope-basis cover. (A hedged sketch of the polytope-membership construction these algorithms revolve around appears after the table.)
Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository for the methodology described.
Open Datasets | Yes | We conclude this section by providing theoretical insights into the convergence behavior of gradient descent. In Appendix D, utilizing our explicit construction of neural networks, we construct an explicit path along which the loss strictly decreases to zero (the global minimum) when the network is initialized close to the target polytope (see Theorem D.3). The specific conditions governing the initialization region are described in terms of the distribution of the dataset along the convex polytope.
Dataset Splits | No | The paper mentions using 'training and test sets' for evaluation but does not specify any validation split percentages, absolute counts, or a method for creating a validation set.
Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU models, CPU types, memory) used for running the experiments.
Software Dependencies | No | The paper mentions 'PyTorch' in Appendix C.1 but does not provide a specific version number or other software dependencies with their versions.
Experiment Setup | Yes | We evaluate the performance for two loss functions: the mean squared error (MSE) loss and the binary cross-entropy (BCE) loss. For the BCE loss, we applied SIG (sigmoid) on the last layer. [...] We utilize Algorithm 1 to identify a single polytope with minimal width, achieving an accuracy greater than 99.9% on the noised dataset. [...] Here we use λ_bias = 5 and λ = (1, 10). (A minimal training-loop sketch matching this setup also follows the table.)
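
The paper's pseudocode is not reproduced in this report. As a rough, hedged illustration of the construction the listed algorithms revolve around, the sketch below builds a two-layer ReLU network whose hidden units correspond to the faces of a convex polytope {x : Ax ≤ b}, so that the network softly indicates polytope membership. The class name PolytopeNet, the face parameters, and the use of two scaling factors (loosely mirroring the quoted λ = (1, 10)) are illustrative assumptions, not the authors' construction.

```python
import torch
import torch.nn as nn

class PolytopeNet(nn.Module):
    """Hypothetical sketch: a two-layer ReLU network that softly indicates
    membership in a convex polytope {x : A x <= b}, one hidden unit per face."""

    def __init__(self, A: torch.Tensor, b: torch.Tensor, lam=(1.0, 10.0)):
        super().__init__()
        n_faces, dim = A.shape
        self.fc = nn.Linear(dim, n_faces)   # one ReLU unit per polytope face
        with torch.no_grad():
            self.fc.weight.copy_(A)         # face normals a_i
            self.fc.bias.copy_(-b)          # fc(x) = a_i . x - b_i
        self.lam1, self.lam2 = lam          # two scaling factors (assumed roles)

    def forward(self, x):
        # Total constraint violation: zero iff x lies inside the polytope.
        violation = torch.relu(self.fc(x)).sum(dim=-1)
        return torch.sigmoid(self.lam1 * (1.0 - self.lam2 * violation))

# Example: the unit square in R^2 as an intersection of four half-spaces.
A = torch.tensor([[1., 0.], [-1., 0.], [0., 1.], [0., -1.]])
b = torch.tensor([1., 0., 1., 0.])
net = PolytopeNet(A, b)
# ~sigmoid(1) ~ 0.73 for the inside point, ~0 for the outside point;
# increasing lam[0] sharpens the inside response toward 1.
print(net(torch.tensor([[0.5, 0.5], [2.0, 2.0]])))
```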
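The quoted experiment setup evaluates an MSE loss and a BCE loss with a sigmoid applied to the last layer. Below is a minimal training-loop sketch under that reading; the model, optimizer, learning rate, and synthetic data are placeholders, since the paper (as quoted) does not specify them.

```python
import torch
import torch.nn as nn

def train(model, X, y, loss_name="bce", epochs=200, lr=1e-2):
    """Hypothetical loop: MSE on raw outputs, or BCE with a sigmoid
    applied to the last layer, mirroring the quoted setup."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        out = model(X).squeeze(-1)
        if loss_name == "bce":
            loss = nn.functional.binary_cross_entropy(torch.sigmoid(out), y)
        else:  # "mse"
            loss = nn.functional.mse_loss(out, y)
        loss.backward()
        opt.step()
    return model

# Toy usage: a two-layer ReLU network on synthetic binary labels.
torch.manual_seed(0)
X = torch.randn(256, 2)
y = (X.norm(dim=1) < 1.0).float()   # inside/outside a disk
model = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))
train(model, X, y, loss_name="bce")
```

For numerical stability one would typically prefer binary_cross_entropy_with_logits over an explicit sigmoid; the explicit sigmoid is kept here only to match the quoted "SIG on the last layer" wording.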