A Global Geometric Analysis of Maximal Coding Rate Reduction
Authors: Peng Wang, Huikang Liu, Druv Pai, Yaodong Yu, Zhihui Zhu, Qing Qu, Yi Ma
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To validate our theoretical findings, we conduct extensive experiments on both synthetic and real data sets. |
| Researcher Affiliation | Academia | 1Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor 2Antai College of Economics and Management, Shanghai Jiao Tong University, Shanghai 3Department of Electrical Engineering and Computer Science, University of California, Berkeley 4Department of Computer Science and Engineering, The Ohio State University, Columbus 5Institute of Data Science, University of Hong Kong. |
| Pseudocode | No | The paper does not include pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper states 'All codes are implemented in Python mainly using NumPy and PyTorch' but does not provide a specific link or explicit statement about the release of its source code. |
| Open Datasets | Yes | In this subsection, we conduct numerical experiments on the image datasets MNIST (LeCun et al., 1998) and CIFAR-10 (Krizhevsky et al., 2009) to provide evidence that our theory also applies to deep networks. |
| Dataset Splits | No | The paper describes training setups and parameters but does not explicitly mention or specify a validation dataset split (e.g., percentages or counts for a validation set). |
| Hardware Specification | Yes | All of our experiments are executed on a computing server equipped with NVIDIA A40 GPUs. |
| Software Dependencies | No | The paper states 'All codes are implemented in Python mainly using NumPy and PyTorch' but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | In this experiment, we set the parameters in Problem (5) as follows: the dimension of features d = 100, the number of classes K = 4, the number of samples in each class is m1 = 30, m2 = 70, m3 = 40, m4 = 60, the regularization parameter λ = 0.1, and the quantization error ϵ = 0.5. ... We fix the learning rate of GD as 10^-1 in the training. We terminate the algorithm when the gradient norm at some iterate is less than 10^-5. ... For the Adam settings, we use a momentum of 0.9, a full-batch size, and a dynamically adaptive learning rate initialized with 5e-3, modulated by a Cosine Annealing learning rate scheduler (Loshchilov & Hutter, 2016). We terminate the algorithm when it reaches 3000 epochs. |
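
The "Open Datasets" row cites MNIST and CIFAR-10. A minimal sketch of loading these datasets with torchvision follows; the transforms and batch size are illustrative assumptions, as the paper only states that the code uses Python, NumPy, and PyTorch.

```python
# Minimal sketch: loading the two open datasets named in the paper
# (MNIST and CIFAR-10) via torchvision. Transform and batch size are
# assumptions for illustration, not values taken from the paper.
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

mnist_train = datasets.MNIST(root="./data", train=True, download=True, transform=to_tensor)
cifar_train = datasets.CIFAR10(root="./data", train=True, download=True, transform=to_tensor)

mnist_loader = DataLoader(mnist_train, batch_size=256, shuffle=True)
cifar_loader = DataLoader(cifar_train, batch_size=256, shuffle=True)
```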
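The "Experiment Setup" row lists the synthetic-data parameters (d = 100, K = 4, class sizes 30/70/40/60, λ = 0.1, ϵ = 0.5) and the Adam settings (full batch, learning rate 5e-3, cosine annealing, 3000 epochs). The sketch below assembles those quoted values into a runnable PyTorch loop. The objective written here is the standard coding-rate-reduction (MCR²) objective with an assumed λ-weighted Frobenius-norm regularizer; the exact form of the paper's Problem (5) may differ, so this is an illustrative sketch rather than the authors' implementation.

```python
# Hedged sketch of the synthetic MCR^2 experiment setup quoted in the table.
# The objective form (rate reduction minus a Frobenius regularizer) is an
# assumption; the optimizer settings follow the quoted Adam configuration.
import torch

d, K = 100, 4
class_sizes = [30, 70, 40, 60]
m = sum(class_sizes)
lam, eps = 0.1, 0.5

# Random features, one column per sample; class membership is fixed.
Z = torch.randn(d, m, requires_grad=True)
labels = torch.repeat_interleave(torch.arange(K), torch.tensor(class_sizes))

def coding_rate(Z, eps):
    # R(Z) = 1/2 * logdet(I + d / (m * eps^2) * Z Z^T)
    d, m = Z.shape
    return 0.5 * torch.logdet(torch.eye(d) + (d / (m * eps**2)) * Z @ Z.T)

def objective(Z):
    # Whole-set coding rate minus class-conditional rates,
    # minus a lambda-weighted Frobenius regularizer (assumed form).
    total = coding_rate(Z, eps)
    for k in range(K):
        Zk = Z[:, labels == k]
        total -= (Zk.shape[1] / m) * coding_rate(Zk, eps)
    return total - lam * Z.pow(2).sum()

# Adam settings quoted in the row: full batch, lr 5e-3, cosine annealing,
# 3000 epochs. Adam's default beta1 = 0.9 matches the stated momentum.
optimizer = torch.optim.Adam([Z], lr=5e-3, betas=(0.9, 0.999))
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=3000)

for epoch in range(3000):
    optimizer.zero_grad()
    loss = -objective(Z)  # maximize the rate reduction
    loss.backward()
    optimizer.step()
    scheduler.step()
```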