Representation Learning Beyond Linear Prediction Functions

Authors: Ziping Xu, Ambuj Tewari

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our theoretical results imply that simpler tasks generalize better. Though our theoretical results are shown for the global minimizer of empirical risks, their qualitative predictions still hold true for gradient-based optimization algorithms as verified by our simulations on deep neural networks. In this section, we use simulated environments to evaluate the actual performance of representation learning on DNNs trained with gradient-based optimization methods.
Researcher Affiliation | Academia | Ziping Xu, Department of Statistics, University of Michigan, zipingxu@umich.edu; Ambuj Tewari, Department of Statistics, University of Michigan, tewaria@umich.edu
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code is provided in the supplemental material.
Open Datasets | No | The paper mentions 'ImageNet' but does not provide concrete access information such as a link, DOI, repository, or a formal citation with authors and year for any dataset used.
Dataset Splits | No | The paper refers to 'n_so random samples' and 'n_ta random samples' but does not provide specific percentages, counts, or predefined splits for training, validation, or test sets in the main text. It defers training details to the supplementary materials.
Hardware Specification | No | The paper states that hardware details are in the supplementary materials but does not provide any specific hardware specifications (such as GPU models or CPU types) in the main text.
Software Dependencies | No | The paper mentions using 'Adam with default parameters' but does not provide specific version numbers for Python or any other software libraries or dependencies.
Experiment Setup | Yes | The first K layers are the shared representation. The source task is a multivariate regression problem with output dimension p and K_so layers following the representation. The target task is a single-output regression problem with K_ta layers following the representation. We used the same number of units for all the layers, which we denote by n_u. A representation is first trained on the source task using n_so random samples and is then fixed for the target task, which is trained on n_ta random samples. In contrast, the baseline method trains the target task directly on the same n_ta samples without the pretrained network. We use Adam with default parameters for all training. We use MSE (Mean Squared Error) to evaluate the performance under different settings.
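
To make the setup in the last row concrete, below is a minimal sketch of the pretrain-then-transfer procedure it describes. This is not the authors' code (which is in the supplemental material): PyTorch is an assumption, the helper `stack`, all sizes (d, p, n_u, K, K_so, K_ta, n_so, n_ta), and the random stand-in data are illustrative choices, and a faithful evaluation would compute MSE on held-out test samples rather than the final training loss reported here.

```python
import torch
import torch.nn as nn


def stack(in_dim, width, depth, out_dim=None):
    """`depth` fully connected ReLU layers of width `width`, optionally
    followed by a linear output layer of size `out_dim`."""
    layers, d = [], in_dim
    for _ in range(depth):
        layers += [nn.Linear(d, width), nn.ReLU()]
        d = width
    if out_dim is not None:
        layers.append(nn.Linear(d, out_dim))
    return nn.Sequential(*layers)


def train(model, X, Y, epochs=200):
    """Full-batch training with Adam (default parameters) and MSE loss.
    Returns the final training MSE."""
    opt = torch.optim.Adam(p for p in model.parameters() if p.requires_grad)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(X), Y)
        loss.backward()
        opt.step()
    return loss.item()


# Illustrative sizes only (not values from the paper).
d, n_u, p = 20, 64, 5        # input dim, units per layer, source output dim
K, K_so, K_ta = 2, 2, 2      # shared / source-head / target-head layer counts
n_so, n_ta = 5000, 100       # source and target sample sizes

# Random stand-ins for the paper's simulated source and target data.
X_so, Y_so = torch.randn(n_so, d), torch.randn(n_so, p)
X_ta, y_ta = torch.randn(n_ta, d), torch.randn(n_ta, 1)

# 1) Pretrain the shared representation (first K layers) on the source task:
#    multivariate regression with output dimension p and K_so layers after
#    the representation.
rep = stack(d, n_u, K)
source_head = stack(n_u, n_u, K_so - 1, out_dim=p)
train(nn.Sequential(rep, source_head), X_so, Y_so)

# 2) Freeze the pretrained representation and fit a fresh target head
#    (single-output regression, K_ta layers) on the n_ta target samples.
for param in rep.parameters():
    param.requires_grad = False
target_head = stack(n_u, n_u, K_ta - 1, out_dim=1)
transfer_mse = train(nn.Sequential(rep, target_head), X_ta, y_ta)

# 3) Baseline: train an identical architecture directly on the same n_ta
#    samples, without the pretrained representation.
baseline = nn.Sequential(stack(d, n_u, K), stack(n_u, n_u, K_ta - 1, out_dim=1))
baseline_mse = train(baseline, X_ta, y_ta)

print(f"transfer MSE: {transfer_mse:.4f}  |  baseline MSE: {baseline_mse:.4f}")
```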