Towards Understanding Hierarchical Learning: Benefits of Neural Representations

Authors: Minshuo Chen, Yu Bai, Jason D. Lee, Tuo Zhao, Huan Wang, Caiming Xiong, Richard Socher

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This paper provides theoretical results on the benefits of neural representations in deep learning. We show that using a neural network as a representation function can achieve improved sample complexity over the raw input in a neural quadratic model, and also show that such a gain is not present if the model is instead linearized. We believe these results provide new understanding of hierarchical learning in deep neural networks. For future work, it would be of interest to study whether deeper representation functions are even more beneficial than shallower ones, or what happens when the representation is fine-tuned together with the trainable network.
Researcher Affiliation | Collaboration | Minshuo Chen (Georgia Tech), Yu Bai (Salesforce Research), Jason D. Lee (Princeton University), Tuo Zhao (Georgia Tech), Huan Wang (Salesforce Research), Caiming Xiong (Salesforce Research), Richard Socher (Salesforce Research). Contact: {mchen393, tourzhao}@gatech.edu, jasonlee@princeton.edu, {yu.bai, huan.wang, cxiong, rsocher}@salesforce.com
Pseudocode | Yes | Algorithm 1: Learning with Neural Representations (Quad-Neural method). Input: labeled data $S_n$, unlabeled data $\tilde{S}_{n_0}$, initializations $V \in \mathbb{R}^{D \times d}$, $b \in \mathbb{R}^{D}$, $W_0 \in \mathbb{R}^{m \times D}$, parameters $(\lambda, \cdot)$. Step 1: Construct the model $f^{Q}_{W}(x) = \frac{1}{2\sqrt{m}} \sum_{r=1}^{m} \sigma''(w_{0,r}^{\top} h(x))\,(w_r^{\top} h(x))^2$ (Quad-Neural), where $h(x) = \hat{\Sigma}^{-1/2} g(x)$ is the neural representation of Eq. (4), with the covariance estimated from $\tilde{S}_{n_0}$. Step 2: Find a second-order stationary point $\hat{W}$ of the regularized empirical risk on the data $S_n$: $\hat{L}_{\lambda}(W) := \frac{1}{n}\sum_{i=1}^{n} \ell(f^{Q}_{W}(x_i), y_i) + \lambda \|W\|_{2,4}^{4}$. (An illustrative Python sketch of this procedure appears after the table.)
Open Source Code | No | The paper does not contain any statement about releasing source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets | No | We consider the standard supervised learning task, in which we receive $n$ i.i.d. training samples $S_n = \{(x_i, y_i)\}_{i=1}^{n}$ from some data distribution $\mathcal{D}$, where $x \in \mathcal{X}$ is the input and $y \in \mathcal{Y}$ is the label. In this paper, we assume that $\mathcal{X} = \mathbb{S}^{d-1} \subset \mathbb{R}^{d}$ (the unit sphere) so that inputs have unit norm $\|x\|_2 = 1$. This describes a theoretical data setup and assumptions, not a specific publicly available dataset with access information. (A short sampling sketch for this setup appears after the table.)
Dataset Splits | No | The paper does not provide specific details about train/validation/test dataset splits, as it focuses on theoretical analysis rather than empirical experimentation with concrete datasets.
Hardware Specification | No | The paper is theoretical and does not describe experimental procedures that would require specific hardware. No hardware specifications were mentioned.
Software Dependencies | No | The paper focuses on theoretical analysis and does not describe specific software implementations. Therefore, no software dependencies with version numbers are provided.
Experiment Setup | No | The paper is theoretical and does not detail an experimental setup with specific hyperparameters, model initialization, or training schedules.
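
As a complement to the Pseudocode row, below is a minimal NumPy sketch of the Quad-Neural procedure as reconstructed there. It is not the authors' implementation: the one-layer representation g(x) = softplus(Vx + b) with random (V, b) (chosen so that the second derivative of the activation exists), the squared loss, the small ridge term added when whitening the covariance, plain gradient descent in place of a genuine second-order stationary-point solver, and all hyperparameter defaults are illustrative assumptions, and helper names such as fit_quad_neural are hypothetical.

import numpy as np

def softplus(z):
    # numerically stable softplus: log(1 + exp(z))
    return np.log1p(np.exp(-np.abs(z))) + np.maximum(z, 0.0)

def softplus_dd(z):
    # second derivative of softplus: sigmoid(z) * (1 - sigmoid(z))
    s = 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))
    return s * (1.0 - s)

def representation(X, V, b, Sigma_inv_sqrt):
    """h(x) = Sigma_hat^{-1/2} g(x), with g(x) = softplus(V x + b)."""
    return softplus(X @ V.T + b) @ Sigma_inv_sqrt.T

def fit_quad_neural(X, y, X_unlab, D=64, m=128, lam=1e-3, lr=1e-2, steps=2000, seed=0):
    n, d = X.shape
    rng = np.random.default_rng(seed)
    V = rng.normal(size=(D, d)) / np.sqrt(d)   # random representation weights
    b = rng.normal(size=D)
    # Step 1a: estimate the second-moment (covariance) matrix of g(x) from
    # unlabeled data and form the whitening transform Sigma_hat^{-1/2}.
    G = softplus(X_unlab @ V.T + b)
    Sigma = G.T @ G / G.shape[0] + 1e-6 * np.eye(D)   # ridge keeps it invertible
    evals, evecs = np.linalg.eigh(Sigma)
    Sigma_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    H = representation(X, V, b, Sigma_inv_sqrt)        # (n, D) features h(x_i)
    # Step 1b: quadratic model around a random initialization W_0; the
    # coefficients sigma''(w_{0,r}^T h(x_i)) do not depend on the trained W.
    W0 = rng.normal(size=(m, D)) / np.sqrt(D)
    C = softplus_dd(H @ W0.T)                          # (n, m)
    W = rng.normal(size=(m, D)) * 1e-2
    # Step 2: gradient descent on the regularized empirical risk
    # (1/n) sum_i (f^Q_W(x_i) - y_i)^2 + lam * sum_r ||w_r||_2^4.
    for _ in range(steps):
        Z = H @ W.T                                    # (n, m), entries w_r^T h(x_i)
        f = 0.5 / np.sqrt(m) * np.sum(C * Z ** 2, axis=1)
        resid = f - y
        grad = (2.0 / (n * np.sqrt(m))) * (resid[:, None] * C * Z).T @ H
        grad += 4.0 * lam * np.sum(W ** 2, axis=1, keepdims=True) * W
        W -= lr * grad

    def predict(X_new):
        H_new = representation(X_new, V, b, Sigma_inv_sqrt)
        C_new = softplus_dd(H_new @ W0.T)
        return 0.5 / np.sqrt(m) * np.sum(C_new * (H_new @ W.T) ** 2, axis=1)

    return W, predict

For instance, with unit-norm inputs X and X_unlab and labels y, W, predict = fit_quad_neural(X, y, X_unlab) returns the trained weights together with a predictor that can be evaluated on held-out inputs; the sampling sketch below shows one way to generate such inputs.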
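
The Open Datasets row quotes the paper's only data assumption, namely inputs lying on the unit sphere S^{d-1}. The following tiny sketch shows one standard way to generate synthetic inputs consistent with that assumption (normalized isotropic Gaussians are uniform on the sphere); it is an illustration only, since the paper prescribes no dataset.

import numpy as np

def sample_unit_sphere(n, d, seed=0):
    """Draw n inputs uniformly from the unit sphere S^{d-1} in R^d."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(n, d))                 # isotropic Gaussian directions
    return Z / np.linalg.norm(Z, axis=1, keepdims=True)

X = sample_unit_sphere(n=1000, d=20)
assert np.allclose(np.linalg.norm(X, axis=1), 1.0)   # ||x||_2 = 1 for every input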