Risk Bounds for Transferring Representations With and Without Fine-Tuning

Authors: Daniel McNamara, Maria-Florina Balcan

ICML 2017

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results motivate a practical approach to weight transfer, which we validate with experiments. We show that learning algorithms motivated by our theoretical results can help to overcome a scarcity of labeled target task data. |
| Researcher Affiliation | Academia | ¹The Australian National University and Data61, Canberra, ACT, Australia; ²Carnegie Mellon University, Pittsburgh, PA, USA. Correspondence to: Daniel McNamara <daniel.mcnamara@anu.edu.au>. |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not provide concrete access to its own source code for the methodology described. |
| Open Datasets | Yes | The MNIST and 20 Newsgroups datasets are available at http://yann.lecun.com/exdb/mnist and http://qwone.com/~jason/20Newsgroups respectively (see the loading sketch after this table). |
| Dataset Splits | No | The paper gives `m_S` and `m_T` for the numbers of labeled source and target points, but does not explicitly provide the percentages or counts of the training, validation, and test splits needed to fully reproduce the data partitioning. |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models, memory amounts, or processor types) used for running experiments are provided. |
| Software Dependencies | No | The paper mentions general tools or approaches (e.g., the Google word2vec package, conjugate gradient optimization) but does not list specific software dependencies with version numbers needed for replication. |
| Experiment Setup | Yes | We use λ_1^(1) = λ_2^(2) = λ := 1, λ_1^(2) = λ_2^(1) = 0, m_T = 500 and the sigmoid activation function. For MNIST we use raw pixel intensities, a 784-50-1 network and m_S = 50000. For NEWSGROUPS we use TF-IDF weighted counts of the most frequent words, a 2000-50-1 network and m_S = 15000. We use conjugate gradient optimization with 200 iterations. (A configuration sketch follows this table.) |
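Since both datasets are public, the Open Datasets row can be made concrete with a short loading sketch. This is a minimal sketch, not the authors' pipeline: it assumes the scikit-learn mirrors of MNIST and 20 Newsgroups are acceptable stand-ins for the linked pages, and it applies a 2000-word TF-IDF featurization to mirror the Experiment Setup row.

```python
# Hedged loading sketch: fetches public mirrors of the two datasets via
# scikit-learn (an assumption; the paper links the original dataset pages).
from sklearn.datasets import fetch_openml, fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer

# MNIST: raw pixel intensities scaled to [0, 1], 784 features per image.
mnist = fetch_openml("mnist_784", version=1, as_frame=False)
X_mnist = mnist.data / 255.0
y_mnist = mnist.target.astype(int)

# 20 Newsgroups: TF-IDF weighted counts of the 2000 most frequent words,
# matching the feature construction quoted in the Experiment Setup row.
news = fetch_20newsgroups(subset="all")
X_news = TfidfVectorizer(max_features=2000).fit_transform(news.data)
y_news = news.target
```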
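The Experiment Setup row pins down the architecture and optimizer: a 784-50-1 sigmoid network fit by conjugate gradient for 200 iterations on m_T = 500 target points. Below is a minimal sketch of that configuration, assuming a logistic (cross-entropy) loss and a plain L2 penalty with λ = 1 as a stand-in for the paper's four λ terms, whose exact role in the objective is not reproduced here. The data is synthetic and the helper names (`unpack`, `loss_and_grad`) are ours.

```python
# Minimal configuration sketch: 784-50-1 sigmoid network, conjugate
# gradient with 200 iterations (scipy's CG as an assumed stand-in for
# the authors' optimizer). Objective is a placeholder, not the paper's.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
d_in, d_hid, m_T, lam = 784, 50, 500, 1.0

# Synthetic placeholder target-task data (swap in MNIST pixels/labels).
X = rng.standard_normal((m_T, d_in))
y = rng.integers(0, 2, m_T).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def unpack(theta):
    """Split the flat parameter vector into the two weight layers."""
    W1 = theta[: d_in * d_hid].reshape(d_in, d_hid)
    w2 = theta[d_in * d_hid :]
    return W1, w2

def loss_and_grad(theta):
    W1, w2 = unpack(theta)
    H = sigmoid(X @ W1)            # hidden layer, sigmoid activations
    p = sigmoid(H @ w2)            # output probability
    eps = 1e-12                    # avoid log(0)
    nll = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    reg = lam * (np.sum(W1 ** 2) + np.sum(w2 ** 2)) / (2 * m_T)
    err = (p - y) / m_T                    # d(nll)/d(output logit)
    g_w2 = H.T @ err + lam * w2 / m_T
    g_H = np.outer(err, w2) * H * (1 - H)  # backprop through the sigmoid
    g_W1 = X.T @ g_H + lam * W1 / m_T
    return nll + reg, np.concatenate([g_W1.ravel(), g_w2])

theta0 = 0.01 * rng.standard_normal(d_in * d_hid + d_hid)
res = minimize(loss_and_grad, theta0, jac=True, method="CG",
               options={"maxiter": 200})
```

Supplying the analytic gradient (`jac=True`) keeps CG practical at this parameter count; the NEWSGROUPS variant of the sketch would use d_in = 2000 with the TF-IDF features above.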