Self-Supervised Visual Representation Learning from Hierarchical Grouping

Authors: Xiao Zhang, Michael Maire

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments demonstrate that our approach can serve as state-of-the-art generic pre-training, benefiting downstream tasks. We additionally explore applications to semantic region search and video-based object instance tracking."
Researcher Affiliation | Academia | Xiao Zhang, University of Chicago, zhang7@uchicago.edu; Michael Maire, University of Chicago, mmaire@uchicago.edu
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | Yes | "We experiment on datasets of complex scenes, with variable numbers of object instances: PASCAL [14] and COCO [31]. ... The COCO-2014 [31] dataset provides instance and semantic segmentations for 81 foreground object classes on over 80K training images. ... We also benchmark learned embeddings on the DAVIS-2017 [40] dataset ... We instead turn to structured edges (SE) [11], which only leverages the small supervised BSDS [34] for training. ... ImageNet [9]"
Dataset Splits | Yes | "PASCAL provides 1464 and 1449 pixel-wise annotated images for training and validation, respectively. ... We evaluate learned embeddings on the PASCAL val set by training a pixel-wise classifier for semantic segmentation on PASCAL train_aug, set atop frozen features."
Hardware Specification | No | The paper does not provide specific hardware details for its experiments.
Software Dependencies | No | The paper mentions software only by name (e.g., Adam, DeepLabv3) without version numbers for dependencies.
Experiment Setup | Yes | "We use Adam [23] to train our model for 80 epochs with batch size 70. We initialize the learning rate as 1e-2, which is then decayed by 0.1 at epochs 25, 45, and 60. We perform data augmentation including random resized cropping, random horizontal flipping, and color jittering on input images, which are then resized to 224x224 before being fed into the network. For one image, we randomly sample 7 regions and, for each region, sample 10 positive pixels and 5 negative pixels. We use σp = 0.8 for all experiments. In experiments fine-tuning on PASCAL train_aug ... Here, we use SGD with weight decay 5e-4 and momentum 0.9 to optimize the pixel-wise cross-entropy loss for 20K iterations with batch size 20. We randomly crop and resize images to 384x384 patches. The learning rate starts at 0.03 and decays by 0.1 at 10K and 15K iterations."
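The quoted experiment setup describes two step-decay learning-rate schedules. As a minimal sketch (not the authors' code; function names are illustrative), the pre-training schedule (Adam, base lr 1e-2, decayed by 0.1 at epochs 25/45/60) and fine-tuning schedule (SGD, base lr 0.03, decayed by 0.1 at 10K/15K of 20K iterations) can be written as:

```python
def multistep_lr(base_lr, milestones, gamma, step):
    # Step decay: multiply base_lr by gamma once for each milestone passed.
    return base_lr * gamma ** sum(step >= m for m in milestones)

# Pre-training schedule from the quote: Adam, base lr 1e-2,
# decayed by 0.1 at epochs 25, 45, 60 (80 epochs total, batch size 70).
def pretrain_lr(epoch):
    return multistep_lr(1e-2, [25, 45, 60], 0.1, epoch)

# Fine-tuning schedule from the quote: SGD (momentum 0.9, weight decay 5e-4),
# base lr 0.03, decayed by 0.1 at 10K and 15K of 20K iterations, batch size 20.
def finetune_lr(iteration):
    return multistep_lr(0.03, [10_000, 15_000], 0.1, iteration)
```

In a PyTorch setup this corresponds to `torch.optim.lr_scheduler.MultiStepLR` with the respective milestones and `gamma=0.1`.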