Risk Bounds for Transferring Representations With and Without Fine-Tuning
Authors: Daniel McNamara, Maria-Florina Balcan
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results motivate a practical approach to weight transfer, which we validate with experiments. We show that learning algorithms motivated by our theoretical results can help to overcome a scarcity of labeled target task data. |
| Researcher Affiliation | Academia | ¹The Australian National University and Data61, Canberra, ACT, Australia. ²Carnegie Mellon University, Pittsburgh, PA, USA. Correspondence to: Daniel McNamara <daniel.mcnamara@anu.edu.au>. |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not provide concrete access to its own source code for the methodology described. |
| Open Datasets | Yes | The MNIST and 20 Newsgroups datasets are available at http://yann.lecun.com/exdb/mnist and http://qwone.com/~jason/20Newsgroups respectively. |
| Dataset Splits | No | The paper mentions the labeled sample sizes `m_S` and `m_T` but does not explicitly provide the percentages or counts of the training, validation, and test splits needed to fully reproduce the data partitioning. |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models, memory amounts, or processor types) used for running experiments are provided. |
| Software Dependencies | No | The paper mentions general tools or approaches (e.g., "Google word2vec package", "conjugate gradient optimization") but does not list specific software dependencies with version numbers needed for replication. |
| Experiment Setup | Yes | We use λ_1^(1) = λ_2^(2) = λ := 1, λ_1^(2) = λ_2^(1) = 0, m_T = 500 and the sigmoid activation function. For MNIST we use raw pixel intensities, a 784-50-1 network and m_S = 50000. For NEWSGROUPS we use TF-IDF weighted counts of the most frequent words, a 2000-50-1 network and m_S = 15000. We use conjugate gradient optimization with 200 iterations. |
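To make the reported setup concrete, below is a minimal sketch of such an experiment: a 784-50-1 sigmoid network with per-layer L2 penalties, trained by conjugate gradient for 200 iterations via SciPy. This is not the authors' code (none is available); the squared loss, the reading of the λ's as per-layer regularization coefficients (with the superscript taken as a task index), and every function and variable name below are assumptions made for illustration.

```python
# Hypothetical sketch of the reported setup: a 784-50-1 sigmoid network
# with per-layer L2 penalties (lam1, lam2), trained by conjugate gradient
# for 200 iterations. Squared loss and all names are illustrative
# assumptions, not the authors' code.
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def unpack(theta, d_in, d_hid):
    # Flat parameter vector -> (hidden-layer weights, output weights).
    W1 = theta[: d_in * d_hid].reshape(d_hid, d_in)
    W2 = theta[d_in * d_hid :].reshape(1, d_hid)
    return W1, W2

def loss_and_grad(theta, X, y, lam1, lam2, d_in, d_hid):
    """Regularized squared loss and its gradient (backpropagation)."""
    n = X.shape[0]
    W1, W2 = unpack(theta, d_in, d_hid)
    H = sigmoid(X @ W1.T)                 # (n, d_hid) hidden activations
    yhat = sigmoid(H @ W2.T).ravel()      # (n,) network outputs
    err = yhat - y
    f = np.mean(err ** 2) + lam1 * np.sum(W1 ** 2) + lam2 * np.sum(W2 ** 2)
    dz2 = (2.0 / n) * err * yhat * (1 - yhat)     # output pre-activation grad
    gW2 = dz2 @ H + 2 * lam2 * W2.ravel()         # (d_hid,)
    dz1 = (dz2[:, None] * W2) * H * (1 - H)       # hidden pre-activation grad
    gW1 = dz1.T @ X + 2 * lam1 * W1               # (d_hid, d_in)
    return f, np.concatenate([gW1.ravel(), gW2])

def train(X, y, d_hid=50, lam1=1.0, lam2=1.0, seed=0):
    d_in = X.shape[1]                     # 784 for raw MNIST pixels
    theta0 = np.random.default_rng(seed).normal(
        scale=0.1, size=d_in * d_hid + d_hid)
    res = minimize(loss_and_grad, theta0,
                   args=(X, y, lam1, lam2, d_in, d_hid),
                   jac=True, method="CG", options={"maxiter": 200})
    return unpack(res.x, d_in, d_hid)
```

In the weight-transfer setting the paper studies, one would call `train` on the m_S source examples, then reuse the hidden-layer weights `W1` on the target task, either frozen as a fixed representation or as the initialization for fine-tuning on the m_T = 500 labeled target examples; which layers receive the λ = 1 versus λ = 0 penalty on each task follows the assignments in the table above.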