Towards Understanding Hierarchical Learning: Benefits of Neural Representations
Authors: Minshuo Chen, Yu Bai, Jason D. Lee, Tuo Zhao, Huan Wang, Caiming Xiong, Richard Socher
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This paper provides theoretical results on the benefits of neural representations in deep learning. We show that using a neural network as a representation function can achieve improved sample complexity over the raw input in a neural quadratic model, and also show such a gain is not present if the model is instead linearized. We believe these results provide new understanding of hierarchical learning in deep neural networks. For future work, it would be of interest to study whether deeper representation functions are even more beneficial than shallower ones, or what happens when the representation is fine-tuned together with the trainable network. |
| Researcher Affiliation | Collaboration | Minshuo Chen (Georgia Tech), Yu Bai (Salesforce Research), Jason D. Lee (Princeton University), Tuo Zhao (Georgia Tech), Huan Wang (Salesforce Research), Caiming Xiong (Salesforce Research), Richard Socher (Salesforce Research). Emails: {mchen393, tourzhao}@gatech.edu, jasonlee@princeton.edu, {yu.bai, huan.wang, cxiong, rsocher}@salesforce.com |
| Pseudocode | Yes | Algorithm 1: Learning with Neural Representations (Quad-Neural method). Input: labeled data S_n, unlabeled data S̃_{n₀}, initializations V ∈ R^{D×d}, b ∈ R^D, W_0 ∈ R^{m×D}, parameters (λ, …). Step 1: Construct the model f^Q_W(x) = (1/(2√m)) Σ_{r=1}^m a_r σ''(w_{0,r}ᵀ h(x)) (w_rᵀ h(x))² (Quad-Neural), where h(x) = Σ̂_b^{-1/2} g(x) is the neural representation (4), using S̃_{n₀} to estimate the covariance. Step 2: Find a second-order stationary point Ŵ of the regularized empirical risk on the data S_n: L̂_λ(W) := (1/n) Σ_{i=1}^n ℓ(f^Q_W(x_i), y_i) + λ ‖W‖_{2,4}⁴. (A hedged code sketch of this procedure appears below the table.) |
| Open Source Code | No | The paper does not contain any statement about releasing source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | No | We consider the standard supervised learning task, in which we receive n i.i.d. training samples S_n = {(x_i, y_i)}_{i=1}^n from some data distribution D, where x ∈ X is the input and y ∈ Y is the label. In this paper, we assume that X = S^{d−1} ⊂ R^d (the unit sphere) so that inputs have unit norm ‖x‖_2 = 1. This describes a theoretical data setup and assumptions, not a specific publicly available dataset with access information. |
| Dataset Splits | No | The paper does not provide specific details about train/validation/test dataset splits, as it focuses on theoretical analysis rather than empirical experimentation with concrete datasets. |
| Hardware Specification | No | The paper is theoretical and does not describe experimental procedures that would require specific hardware. No hardware specifications were mentioned. |
| Software Dependencies | No | The paper focuses on theoretical analysis and does not describe specific software implementations. Therefore, no software dependencies with version numbers are provided. |
| Experiment Setup | No | The paper is theoretical and does not detail an experimental setup with specific hyperparameters, model initialization, or training schedules. |
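Below is a minimal NumPy sketch of Algorithm 1 (Quad-Neural) as summarized in the Pseudocode row. Several choices are assumptions not fixed by the paper excerpt above: the representation g(x) = relu(Vx + b) is a random one-hidden-layer net, the loss is logistic, plain gradient descent stands in for finding a second-order stationary point, and all sizes and the synthetic target are made up for illustration.

```python
# Hedged sketch of the Quad-Neural procedure: whiten a fixed neural representation
# using unlabeled data, then train a quadratic model on top with a ||W||_{2,4}^4 penalty.
import numpy as np

rng = np.random.default_rng(0)
d, D, m = 20, 200, 100        # input dim, representation width, trainable width (hypothetical)
n, n0 = 500, 2000             # labeled / unlabeled sample sizes (hypothetical)
lam, lr, steps = 1e-3, 1e-1, 300

def sample_inputs(k):
    """Inputs on the unit sphere S^{d-1}, matching the paper's assumption ||x||_2 = 1."""
    X = rng.normal(size=(k, d))
    return X / np.linalg.norm(X, axis=1, keepdims=True)

# Fixed (untrained) representation layer g, plus random init for the trainable layer.
V = rng.normal(size=(D, d)) / np.sqrt(d)
b = 0.1 * rng.normal(size=D)
W0 = rng.normal(size=(m, D)) / np.sqrt(D)
a = rng.choice([-1.0, 1.0], size=m)

def g(X):
    return np.maximum(X @ V.T + b, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigma_dd(z):
    """Second derivative of a smooth (softplus-like) activation -- an illustrative choice."""
    s = sigmoid(z)
    return s * (1.0 - s)

# Step 1: estimate Cov[g(x)] from unlabeled data and whiten: h(x) = Sigma^{-1/2} g(x).
G = g(sample_inputs(n0))
Sigma = G.T @ G / n0 + 1e-6 * np.eye(D)            # small ridge for invertibility
eigval, Q = np.linalg.eigh(Sigma)
Sigma_inv_half = Q @ np.diag(eigval ** -0.5) @ Q.T

def h(X):
    return g(X) @ Sigma_inv_half

# Quadratic model: f_W(x) = (1/(2*sqrt(m))) * sum_r a_r * sigma''(<w0_r, h(x)>) * <w_r, h(x)>^2.
def f_quad(W, H):
    coef = a * sigma_dd(H @ W0.T)                  # (k, m): a_r * sigma''(<w0_r, h>)
    return (coef * (H @ W.T) ** 2).sum(axis=1) / (2.0 * np.sqrt(m))

# Synthetic labeled data (the paper assumes a general distribution; this target is made up).
X_lab = sample_inputs(n)
y = np.sign(X_lab[:, 0] * X_lab[:, 1] + 1e-3)
H_lab = h(X_lab)

# Step 2: gradient descent on the regularized empirical logistic risk
#   (1/n) sum_i log(1 + exp(-y_i f_W(x_i))) + lam * sum_r ||w_r||_2^4.
W = 0.01 * rng.normal(size=(m, D))
for t in range(steps):
    coef = a * sigma_dd(H_lab @ W0.T)              # (n, m)
    pre = H_lab @ W.T                              # (n, m): <w_r, h(x_i)>
    f = (coef * pre ** 2).sum(axis=1) / (2.0 * np.sqrt(m))
    dloss_df = -y * sigmoid(-y * f)                # derivative of the logistic loss in f
    # d f / d w_r = (1/sqrt(m)) * coef_r * <w_r, h> * h
    grad = (dloss_df[:, None] * coef * pre).T @ H_lab / (n * np.sqrt(m))
    grad += 4.0 * lam * np.sum(W ** 2, axis=1, keepdims=True) * W
    W -= lr * grad

acc = np.mean(np.sign(f_quad(W, H_lab)) == y)
print(f"training accuracy of the Quad-Neural sketch: {acc:.2f}")
```

The whitening step corresponds to the covariance estimation from unlabeled data in Step 1 of Algorithm 1; swapping gradient descent for a method that certifiably reaches a second-order stationary point (as the theory requires) would not change the overall structure of the sketch.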