Gradients as Features for Deep Representation Learning
Authors: Fangzhou Mu, Yingyu Liang, Yin Li
ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our method is evaluated across a number of representation-learning tasks on several datasets and using different network architectures. Strong results are obtained in all settings, and are well-aligned with our theoretical insights. Our experimental results are organized into two parts. We first perform ablation studies to understand the representation power of the gradient features. Next, we evaluate our method on three representation-learning tasks: learning deep generative models, self-supervised learning using a pretext task, and transfer learning from ImageNet. |
| Researcher Affiliation | Academia | Fangzhou Mu, Yingyu Liang Department of Computer Sciences University of Wisconsin-Madison {fmu, yliang}@cs.wisc.edu Yin Li Departments of Biostatistics & Computer Sciences University of Wisconsin-Madison yin.li@wisc.edu |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | Project webpage at http://pages.cs.wisc.edu/~fmu/gradfeat20. The project webpage states: "Code will be released soon". |
| Open Datasets | Yes | We train a BiGAN on CIFAR-10 (Krizhevsky et al., 2009)... We use the PyTorch (Paszke et al., 2017) distribution of ImageNet pre-trained ResNet18 (He et al., 2016) as the base network for VOC07 (Everingham et al., 2010) object classification. Datasets used: SVHN, CIFAR-10, CIFAR-100, VOC07, COCO2014. |
| Dataset Splits | Yes | For the SVHN and CIFAR-10/100 experiments, we train the models for 80K iterations with initial learning rate 1e-3, halved every 20K iterations. For the VOC07 and COCO2014 experiments, we train the models for 50 epochs with initial learning rate 1e-3, halved every 20 epochs. We train on the trainval split of VOC07 and the train split of COCO2014 for object classification, and report the mean average precision (mAP) scores on their respective test and val splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types) used for running its experiments. |
| Software Dependencies | No | We use the PyTorch (Paszke et al., 2017) distribution of ImageNet pre-trained ResNet18... All models are trained with the Adam optimizer (Kingma & Ba, 2015). Although software names are mentioned and cited, specific version numbers for these software dependencies are not provided. |
| Experiment Setup | Yes | We train the models for 80K iterations with initial learning rate 1e-3, halved every 20K iterations. For the VOC07 and COCO2014 experiments, we train the models for 50 epochs with initial learning rate 1e-3, halved every 20 epochs. All models are trained with the Adam optimizer (Kingma & Ba, 2015) with batch size 64, β1 = 0.5, β2 = 0.999 and weight decay 1e-6. In addition to Adam, we also use the SGD optimizer with weight decay 5e-5, momentum 0.9 and the same learning rate schedule for fine-tuning, and we report the better result between the two runs. |
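
As a reading aid, the sketch below expresses the optimizer and learning-rate schedule quoted in the "Experiment Setup" row for the SVHN and CIFAR-10/100 runs in PyTorch. The model and data here are placeholder stand-ins (a small linear classifier on random tensors), not the paper's networks or datasets; only the hyperparameters come from the paper.

```python
import torch
import torch.nn as nn

# Placeholder model and data; the actual networks and datasets are those of the paper.
model = nn.Linear(512, 10)                    # hypothetical classifier head

# Adam with beta1 = 0.5, beta2 = 0.999, weight decay 1e-6 (as reported in the paper).
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1e-3,                                  # initial learning rate
    betas=(0.5, 0.999),
    weight_decay=1e-6,
)
# Initial learning rate 1e-3, halved every 20K iterations (SVHN / CIFAR-10/100 schedule).
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20_000, gamma=0.5)
criterion = nn.CrossEntropyLoss()

for iteration in range(80_000):               # 80K training iterations in total
    x = torch.randn(64, 512)                  # batch size 64 (dummy inputs here)
    y = torch.randint(0, 10, (64,))
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    scheduler.step()                          # schedule stepped per iteration
```

For the VOC07 and COCO2014 experiments, the same schedule is stated in epochs (50 epochs, halved every 20 epochs), and the paper additionally reports fine-tuning with SGD (momentum 0.9, weight decay 5e-5, same learning rate schedule), keeping the better of the two runs.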