Minimum Description Length and Generalization Guarantees for Representation Learning
Authors: Milad Sefidgaran, Abdellatif Zaidi, Piotr Krasnowski
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical simulations illustrate the advantages of such well-chosen priors over classical priors used in IB. The results shown in Fig. 2 indicate that the model trained using our priors achieves better (2.5%) performance in terms of both generalization error and population risk. We consider CIFAR10 [KH09] image classification using a small CNN-based encoder and a linear decoder. |
| Researcher Affiliation | Collaboration | Milad Sefidgaran, Abdellatif Zaidi, Piotr Krasnowski: Paris Research Center, Huawei Technologies France; Abdellatif Zaidi: Université Gustave Eiffel, France |
| Pseudocode | No | The paper does not contain any sections explicitly labeled "Pseudocode" or "Algorithm", nor any visually structured algorithm blocks. |
| Open Source Code | Yes | The code used in the experiments is available at https://github.com/PiotrKrasnowski/MDL_and_Generalization_Guarantees_for_Representation_Learning. |
| Open Datasets | Yes | We consider CIFAR10 [KH09] image classification using a small CNN-based encoder and a linear decoder. |
| Dataset Splits | Yes | The full dataset was split into a training set with 50,000 labeled images and a validation set with 10,000 labeled images, all of them of size 32 × 32 × 3. |
| Hardware Specification | Yes | Our prediction model was trained using PyTorch [PGM+19] and a GPU Tesla P100 with CUDA 11.0. |
| Software Dependencies | Yes | Our prediction model was trained using PyTorch [PGM+19] and a GPU Tesla P100 with CUDA 11.0. The Adam optimizer [KB15] (β1 = 0.5, β2 = 0.999) was used with an initial learning rate of 10⁻⁴ and an exponential decay of 0.97. |
| Experiment Setup | Yes | The Adam optimizer [KB15] (β1 = 0.5, β2 = 0.999) was used with an initial learning rate of 10⁻⁴ and an exponential decay of 0.97. The batch size was equal to 128 throughout the whole experiment. During the training phase, we jointly trained the encoder and the decoder parts for 200 epochs. (A minimal sketch of this configuration follows the table.) |
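
The quoted setup can be collected into a short training sketch. The snippet below is a minimal, hypothetical reconstruction of the reported configuration (CIFAR10 with its standard 50,000/10,000 split, Adam with β1 = 0.5, β2 = 0.999, initial learning rate 10⁻⁴, exponential learning-rate decay 0.97, batch size 128, 200 epochs). The encoder/decoder architectures and data transforms are placeholders assumed for illustration, not the authors' exact models or their MDL-based training objective.

```python
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms

# CIFAR10: standard 50,000 training / 10,000 validation images of size 32x32x3.
transform = transforms.ToTensor()
train_set = torchvision.datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
val_set = torchvision.datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

# Placeholder small CNN encoder and linear decoder (hypothetical, not the paper's architecture).
encoder = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.Flatten(),
)
decoder = nn.Linear(32 * 8 * 8, 10)  # linear classifier over the 10 CIFAR10 classes

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(encoder, decoder).to(device)

# Reported hyperparameters: Adam (beta1=0.5, beta2=0.999), lr 1e-4, exponential decay 0.97.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.5, 0.999))
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.97)
criterion = nn.CrossEntropyLoss()

# Joint training of encoder and decoder for 200 epochs.
for epoch in range(200):
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```

The loss here is a plain cross-entropy classifier objective; reproducing the paper's results would additionally require the authors' prior-based regularization, available in their released code repository linked above.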