Scalable Neural Data Server: A Data Recommender for Transfer Learning
Authors: Tianshi Cao, Sasha (Alexandre) Doubov, David Acuna, Sanja Fidler
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate SNDS on a plethora of real world tasks and find that data recommended by SNDS improves downstream task performance over baselines. We also demonstrate the scalability of SNDS by showing its ability to select relevant data for transfer outside of the natural image setting. |
| Researcher Affiliation | Collaboration | University of Toronto¹, Vector Institute², NVIDIA³ {jcao,doubovs,davidj,fidler}@cs.toronto.edu |
| Pseudocode | No | The paper describes the steps of its proposed method but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | This is proprietary information that we will not be releasing at this time. |
| Open Datasets | Yes | Source data We simulate data providers with partitions of the Open Images dataset [26]. Public data and expert training We use the training split of ILSVRC2012 [37], containing 1.2 million images, as the public dataset. Target Datasets We use nine fine-grained classification datasets as target datasets. They are: FGVC-Aircraft [27], Stanford Cars [25], CUB200 [43], Stanford Dogs [24], DTD [13], Flowers102 [31], Food100 [9], Oxford Pets [33], and SUN397 [44]. |
| Dataset Splits | No | The paper mentions using a 'held-out set' and refers to the supplementary material for 'training details' and 'data splits', but it does not explicitly provide the specific percentages or counts for train/validation splits within the main text. |
| Hardware Specification | Yes | Experiments are performed on a Tesla P100 with 12 GB memory in an internal cluster. |
| Software Dependencies | No | The paper states 'Experiments are implemented using PyTorch [34]' but does not provide a specific version number for PyTorch or any other software dependency. |
| Experiment Setup | Yes | Experts use ResNet18 as backbone with an input size of 224x224 and output dimension of 4 (corresponding to the 4 rotations). We use K = 50 in our experiments... We pre-train on the selected data using supervised learning, and then finetune on the downstream dataset. |
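The experiment-setup row notes that each expert has a 4-way output head corresponding to the 4 image rotations, i.e. the standard rotation-prediction pretext task. As a minimal sketch of how such pretext examples are constructed (the function name `make_rotation_batch` and the NumPy-only formulation are assumptions for illustration, not the paper's actual PyTorch pipeline):

```python
import numpy as np

def make_rotation_batch(image):
    """Given an HxWxC (or HxW) image array, return its four rotated
    copies (0, 90, 180, 270 degrees) together with the pretext labels
    0..3 that a 4-way classification head would be trained to predict."""
    rotations = [np.rot90(image, k=k) for k in range(4)]
    labels = list(range(4))
    return rotations, labels

# Tiny 2x2 single-channel example for inspection.
img = np.arange(4).reshape(2, 2)
rots, labels = make_rotation_batch(img)
```

In the paper's setting, batches like this would be fed to a ResNet18 whose final layer has 4 outputs, and the cross-entropy loss on the rotation label provides the self-supervised training signal for each expert.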