A Global Geometric Analysis of Maximal Coding Rate Reduction

Authors: Peng Wang, Huikang Liu, Druv Pai, Yaodong Yu, Zhihui Zhu, Qing Qu, Yi Ma

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To validate our theoretical findings, we conduct extensive experiments on both synthetic and real data sets.
Researcher Affiliation | Academia | (1) Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor; (2) Antai College of Economics and Management, Shanghai Jiao Tong University, Shanghai; (3) Department of Electrical Engineering and Computer Science, University of California, Berkeley; (4) Department of Computer Science and Engineering, The Ohio State University, Columbus; (5) Institute of Data Science, University of Hong Kong.
Pseudocode | No | The paper does not include pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper states 'All codes are implemented in Python mainly using NumPy and PyTorch' but does not provide a link to the code or an explicit statement about its release.
Open Datasets | Yes | In this subsection, we conduct numerical experiments on the image datasets MNIST (LeCun et al., 1998) and CIFAR-10 (Krizhevsky et al., 2009) to provide evidence that our theory also applies to deep networks.
Dataset Splits | No | The paper describes training setups and parameters but does not specify a validation split (e.g., percentages or sample counts for a validation set).
Hardware Specification | Yes | All of our experiments are executed on a computing server equipped with NVIDIA A40 GPUs.
Software Dependencies | No | The paper states 'All codes are implemented in Python mainly using NumPy and PyTorch' but does not provide version numbers for these software components.
Experiment Setup | Yes | In this experiment, we set the parameters in Problem (5) as follows: the dimension of features d = 100, the number of classes K = 4, the number of samples in each class is m1 = 30, m2 = 70, m3 = 40, m4 = 60, the regularization parameter λ = 0.1, and the quantization error ϵ = 0.5. ... We fix the learning rate of GD as 10^-1 in the training. We terminate the algorithm when the gradient norm at some iterate is less than 10^-5. ... For the Adam settings, we use a momentum of 0.9, a full-batch size, and a dynamically adaptive learning rate initialized with 5e-3, modulated by a Cosine Annealing learning rate scheduler (Loshchilov & Hutter, 2016). We terminate the algorithm when it reaches 3000 epochs.
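The experiment setup quoted above maps naturally onto a short script. Below is a minimal sketch, not the authors' released code, that optimizes the features directly with the Adam settings listed (momentum 0.9, full batch, initial learning rate 5e-3, cosine-annealing schedule, 3000 epochs). It assumes the standard coding-rate-reduction (MCR^2) objective of Yu et al. (2020) and keeps feature columns on the unit sphere by renormalization; the exact form of the paper's Problem (5), including how the regularization parameter λ = 0.1 enters, is not reproduced here.

import torch
import torch.nn.functional as F

torch.manual_seed(0)

d, eps = 100, 0.5                       # feature dimension and quantization error from the row above
class_sizes = [30, 70, 40, 60]          # m_1, ..., m_4
m = sum(class_sizes)                    # total number of samples (200)
labels = torch.repeat_interleave(torch.arange(len(class_sizes)), torch.tensor(class_sizes))

def coding_rate(Z, eps):
    # R(Z) = 1/2 * logdet(I + d/(n * eps^2) * Z Z^T) for Z of shape (d, n)
    dim, n = Z.shape
    return 0.5 * torch.logdet(torch.eye(dim) + (dim / (n * eps ** 2)) * Z @ Z.T)

def coding_rate_reduction(Z, labels, eps):
    # Delta R = R(Z) - sum_k (m_k / m) * R(Z_k): the MCR^2 objective to be maximized
    n = Z.shape[1]
    rate = coding_rate(Z, eps)
    for k in labels.unique():
        Z_k = Z[:, labels == k]
        rate = rate - (Z_k.shape[1] / n) * coding_rate(Z_k, eps)
    return rate

# Optimize the feature matrix directly; columns are renormalized inside the objective
# so the features stay on the unit sphere (a simplification of the paper's constraint).
Z = F.normalize(torch.randn(d, m), dim=0).requires_grad_(True)

# Adam settings quoted above: momentum 0.9, full batch, initial lr 5e-3,
# cosine-annealing schedule, 3000 epochs.
opt = torch.optim.Adam([Z], lr=5e-3, betas=(0.9, 0.999))
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=3000)

for epoch in range(3000):
    opt.zero_grad()
    loss = -coding_rate_reduction(F.normalize(Z, dim=0), labels, eps)
    loss.backward()
    opt.step()
    sched.step()

For the plain GD run described in the same row, one would instead use torch.optim.SGD([Z], lr=0.1) without a scheduler and stop once the gradient norm falls below 1e-5.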