Defining Neural Network Architecture through Polytope Structures of Datasets

Authors: Sangmin Lee, Abbas Mammadov, Jong Chul Ye

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In Section 3, we studied the relationship between the dataset geometry and neural network architectures. In this section, we provide two empirical results: 1) gradient descent indeed converges to the networks we unveil, and 2) we can investigate the geometric features of high-dimensional real-world datasets through our proposed algorithm. [...] Our empirical results are presented in Table 1. Each column in the table corresponds to a class in the dataset, and each row presents the type of the class. The values in the table denote the number of polytopes and their faces (we use the notation a+b to denote two polytopes with a and b faces, respectively).
Researcher Affiliation | Academia | (1) Department of Mathematical Science, KAIST, Daejeon, Korea; (2) School of Computing, KAIST, Daejeon, Korea; (3) Kim Jaechul Graduate School of AI, KAIST, Daejeon, Korea.
Pseudocode | Yes | Algorithm 1: Compressing algorithm; Algorithm 2: Extracting a polytope-basis cover from a three-layer ReLU network; Algorithm 3: Extracting a polytope-basis cover from a trained two-layer ReLU network; Algorithm 4: An efficient algorithm for finding a polytope-basis cover. (A hedged sketch of the polytope-membership construction these algorithms revolve around appears after the table.)
Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository for the methodology described.
Open Datasets | Yes | We conclude this section by providing theoretical insights into the convergence behavior of gradient descent. In Appendix D, utilizing our explicit construction of neural networks, we construct an explicit path along which the loss strictly decreases to zero (the global minimum) when the network is initialized close to the target polytope (see Theorem D.3). The specific conditions governing the initialization region are described in terms of the distribution of the dataset along the convex polytope.
Dataset Splits | No | The paper mentions using 'training and test sets' for evaluation but does not specify any validation split percentages, absolute counts, or a method for creating a validation set.
Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU models, CPU types, memory) used for running the experiments.
Software Dependencies | No | The paper mentions 'PyTorch' in Appendix C.1 but does not provide a specific version number or other software dependencies with their versions.
Experiment Setup | Yes | We evaluate the performance for two loss functions: the mean squared error (MSE) loss and the binary cross-entropy (BCE) loss. For the BCE loss, we applied SIG (sigmoid) on the last layer. [...] We utilize Algorithm 1 to identify a single polytope with minimal width, achieving an accuracy greater than 99.9% on the noised dataset. [...] Here we use λ_bias = 5 and λ = (1, 10). (A minimal training-loop sketch matching this setup also follows the table.)
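
The paper's pseudocode is not reproduced in this report. As a rough, hedged illustration of the construction the listed algorithms revolve around, the sketch below builds a two-layer ReLU network whose hidden units correspond to the faces of a convex polytope {x : Ax ≤ b}, so that the network softly indicates polytope membership. The class name PolytopeNet, the face parameters, and the use of two scaling factors (loosely mirroring the quoted λ = (1, 10)) are illustrative assumptions, not the authors' construction.

```python
import torch
import torch.nn as nn

class PolytopeNet(nn.Module):
    """Hypothetical sketch: a two-layer ReLU network that softly indicates
    membership in a convex polytope {x : A x <= b}, one hidden unit per face."""

    def __init__(self, A: torch.Tensor, b: torch.Tensor, lam=(1.0, 10.0)):
        super().__init__()
        n_faces, dim = A.shape
        self.fc = nn.Linear(dim, n_faces)   # one ReLU unit per polytope face
        with torch.no_grad():
            self.fc.weight.copy_(A)         # face normals a_i
            self.fc.bias.copy_(-b)          # fc(x) = a_i . x - b_i
        self.lam1, self.lam2 = lam          # two scaling factors (assumed roles)

    def forward(self, x):
        # Total constraint violation: zero iff x lies inside the polytope.
        violation = torch.relu(self.fc(x)).sum(dim=-1)
        return torch.sigmoid(self.lam1 * (1.0 - self.lam2 * violation))

# Example: the unit square in R^2 as an intersection of four half-spaces.
A = torch.tensor([[1., 0.], [-1., 0.], [0., 1.], [0., -1.]])
b = torch.tensor([1., 0., 1., 0.])
net = PolytopeNet(A, b)
# ~sigmoid(1) ~ 0.73 for the inside point, ~0 for the outside point;
# increasing lam[0] sharpens the inside response toward 1.
print(net(torch.tensor([[0.5, 0.5], [2.0, 2.0]])))
```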
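The quoted experiment setup evaluates an MSE loss and a BCE loss with a sigmoid applied to the last layer. Below is a minimal training-loop sketch under that reading; the model, optimizer, learning rate, and synthetic data are placeholders, since the paper (as quoted) does not specify them.

```python
import torch
import torch.nn as nn

def train(model, X, y, loss_name="bce", epochs=200, lr=1e-2):
    """Hypothetical loop: MSE on raw outputs, or BCE with a sigmoid
    applied to the last layer, mirroring the quoted setup."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        out = model(X).squeeze(-1)
        if loss_name == "bce":
            loss = nn.functional.binary_cross_entropy(torch.sigmoid(out), y)
        else:  # "mse"
            loss = nn.functional.mse_loss(out, y)
        loss.backward()
        opt.step()
    return model

# Toy usage: a two-layer ReLU network on synthetic binary labels.
torch.manual_seed(0)
X = torch.randn(256, 2)
y = (X.norm(dim=1) < 1.0).float()   # inside/outside a disk
model = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))
train(model, X, y, loss_name="bce")
```

For numerical stability one would typically prefer binary_cross_entropy_with_logits over an explicit sigmoid; the explicit sigmoid is kept here only to match the quoted "SIG on the last layer" wording.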