Learning multiple visual domains with residual adapters

Authors: Sylvestre-Alvise Rebuffi, Hakan Bilen, Andrea Vedaldi

NeurIPS 2017

Reproducibility assessment (format: Variable: Result, followed by the supporting LLM response):
Research Type: Experimental. Our proposed architectures are thoroughly evaluated empirically (section 5). To this end, our second contribution is to introduce the visual decathlon challenge (fig. 1 and section 4), a new benchmark for multiple-domain learning in image recognition.
Researcher Affiliation: Academia. Sylvestre-Alvise Rebuffi (1), Hakan Bilen (1,2), Andrea Vedaldi (1); (1) Visual Geometry Group, University of Oxford, {srebuffi,hbilen,vedaldi}@robots.ox.ac.uk; (2) School of Informatics, University of Edinburgh.
Pseudocode: No. The paper describes architectural components and processes but does not include any formal pseudocode or algorithm blocks.
Open Source Code: No. We are planning to make the data and an evaluation server public soon.
Open Datasets: Yes. The decathlon challenge combines ten well-known datasets from multiple visual domains: FGVC-Aircraft Benchmark [24]...CIFAR100 [19]...Daimler Mono Pedestrian Classification Benchmark (DPed) [26]...Describable Texture Dataset (DTD) [7]...The German Traffic Sign Recognition (GTSR) Benchmark [36]...Flowers102 [28]...ILSVRC12 (ImNet) [32]...Omniglot [20]...The Street View House Numbers (SVHN) [27]...UCF101 [35].
Dataset Splits: Yes. For each dataset, we specify training, validation and test subsets. ... For the rest, we use 60%, 20% and 20% of the data for training, validation, and test respectively. For ILSVRC12, since the test labels are not available, we use the original validation subset as the test subset and randomly sample a new validation set from their training split.
Hardware Specification: No. The paper does not provide specific details about the hardware used for the experiments (e.g., GPU/CPU models, memory specifications).
Software Dependencies: No. The paper mentions using 'ResNets' but does not specify any software libraries or frameworks with version numbers (e.g., PyTorch, TensorFlow, scikit-learn versions).
Experiment Setup: Yes. In all experiments we choose to use the powerful ResNets [13] as base architectures...we chose the ResNet28 model [40], which consists of three blocks of four residual units. Each residual unit contains 3×3 convolutional, BN and ReLU modules (fig. 2). The network accepts 64×64 images as input, downscales the spatial dimensions by two at each block and ends with a global average pooling and a classifier layer followed by a softmax. We set the number of filters to 64, 128, 256 for these blocks respectively. Each network is optimized to minimize its cross-entropy loss with stochastic gradient descent. The network is run for 80 epochs and the initial learning rate of 0.1 is lowered to 0.01 and then 0.001 gradually.
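The training schedule quoted in the Experiment Setup row (80 epochs, learning rate 0.1 lowered to 0.01 and then 0.001) can be sketched as a simple step decay. The excerpt does not state the epochs at which the rate drops, so the thresholds of 40 and 60 below are assumptions for illustration only:

```python
def learning_rate(epoch, base=0.1, drops=((40, 0.01), (60, 0.001))):
    """Step learning-rate schedule for an 80-epoch run.

    The paper only says the initial rate of 0.1 is lowered to 0.01 and
    then 0.001; the drop epochs (40 and 60) are assumed, not reported.
    """
    lr = base
    for drop_epoch, value in drops:
        if epoch >= drop_epoch:
            lr = value
    return lr
```

Under these assumed milestones, `learning_rate(0)` returns 0.1, `learning_rate(40)` returns 0.01, and `learning_rate(60)` returns 0.001.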
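Similarly, the 60%/20%/20% train/validation/test partition described in the Dataset Splits row can be sketched as a shuffled index split. The random seed and the rounding behaviour are assumptions, not details taken from the paper:

```python
import random

def split_dataset(num_examples, seed=0):
    """Partition example indices into 60% train, 20% validation, 20% test,
    as described in the decathlon setup (the seed is an assumed choice)."""
    indices = list(range(num_examples))
    random.Random(seed).shuffle(indices)
    train_end = int(0.6 * num_examples)
    val_end = int(0.8 * num_examples)
    return indices[:train_end], indices[train_end:val_end], indices[val_end:]
```

For example, `split_dataset(1000)` yields disjoint subsets of 600, 200 and 200 indices that together cover all 1000 examples.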