Scalable Neural Data Server: A Data Recommender for Transfer Learning

Authors: Tianshi Cao, Sasha (Alexandre) Doubov, David Acuna, Sanja Fidler

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate SNDS on a plethora of real-world tasks and find that data recommended by SNDS improves downstream task performance over baselines. We also demonstrate the scalability of SNDS by showing its ability to select relevant data for transfer outside of the natural image setting.
Researcher Affiliation | Collaboration | University of Toronto (1), Vector Institute (2), NVIDIA (3); {jcao,doubovs,davidj,fidler}@cs.toronto.edu
Pseudocode | No | The paper describes the steps of its proposed method but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | No | This is proprietary information that we will not be releasing at this time.
Open Datasets | Yes | Source data: We simulate data providers with partitions of the Open Images dataset [26]. Public data and expert training: We use the training split of ILSVRC2012 [37], containing 1.2 million images, as the public dataset. Target datasets: We use nine fine-grained classification datasets as target datasets: FGVC-Aircraft [27], Stanford Cars [25], CUB200 [43], Stanford Dogs [24], DTD [13], Flowers102 [31], Food101 [9], Oxford Pets [33], and SUN397 [44].
Dataset Splits | No | The paper mentions using a 'held-out set' and refers to the supplementary material for 'training details' and 'data splits', but it does not explicitly provide specific percentages or counts for train/validation splits in the main text.
Hardware Specification | Yes | Experiments are performed on a Tesla P100 with 12 GB memory in an internal cluster.
Software Dependencies | No | The paper states 'Experiments are implemented using PyTorch [34]' but does not provide a specific version number for PyTorch or any other software dependency.
Experiment Setup | Yes | Experts use ResNet18 as the backbone with an input size of 224x224 and an output dimension of 4 (corresponding to the 4 rotations). We use K = 50 in our experiments... We pre-train on the selected data using supervised learning, and then finetune on the downstream dataset. (See the code sketch below.)
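The Experiment Setup row describes the experts as rotation-prediction networks: a ResNet18 backbone taking 224x224 inputs and producing 4 logits, one per rotation. Below is a minimal PyTorch sketch of such an expert under that reading; since the authors did not release code, the names (RotationExpert, rotate_batch) and the training hyperparameters are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
from torchvision import models

class RotationExpert(nn.Module):
    """ResNet18 backbone with a 4-way head for predicting image rotation (0/90/180/270 degrees)."""
    def __init__(self):
        super().__init__()
        self.backbone = models.resnet18()  # randomly initialised ResNet18
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, 4)  # 4 rotation classes

    def forward(self, x):
        # x: (B, 3, 224, 224) image batch
        return self.backbone(x)

def rotate_batch(images):
    """Build a self-supervised rotation task: rotate each image by a random multiple of 90 degrees."""
    labels = torch.randint(0, 4, (images.size(0),))
    rotated = torch.stack(
        [torch.rot90(img, k=int(k), dims=(1, 2)) for img, k in zip(images, labels)]
    )
    return rotated, labels

# Illustrative single training step for one expert (hyperparameters are assumptions).
expert = RotationExpert()
optimizer = torch.optim.SGD(expert.parameters(), lr=0.1, momentum=0.9)
criterion = nn.CrossEntropyLoss()

images = torch.randn(8, 3, 224, 224)  # stand-in for a batch of 224x224 public images
rotated, labels = rotate_batch(images)
loss = criterion(expert(rotated), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Per the quoted setup, experts of this kind are trained on the public ILSVRC2012 data, and the data recommended by SNDS is then used for supervised pre-training before fine-tuning on the downstream target dataset.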