Characterizing Generalization under Out-Of-Distribution Shifts in Deep Metric Learning

Authors: Timo Milbich, Karsten Roth, Samarth Sinha, Ludwig Schmidt, Marzyeh Ghassemi, Björn Ommer

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Based on our new benchmark, we conduct a thorough empirical analysis of state-of-the-art DML methods. We find that while generalization tends to consistently degrade with difficulty, some methods are better at retaining performance as the distribution shift increases. (see the evaluation sketch below the table)
Researcher Affiliation | Collaboration | Timo Milbich (LMU Munich & IWR, Heidelberg University, timo.milbich@iwr.uni-heidelberg.de); Karsten Roth (IWR, Heidelberg University, karsten.rh1@gmail.com); Samarth Sinha (University of Toronto, Vector, sinhasam@fb.com); Ludwig Schmidt (University of Washington, schmidt@cs.uw.edu); Marzyeh Ghassemi (MIT, University of Toronto, Vector, mghassem@mit.edu); Björn Ommer (LMU Munich & IWR, Heidelberg University, ommer@uni-heidelberg.de)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code available here: https://github.com/CompVis/Characterizing_Generalization_in_DML
Open Datasets | Yes | We publish our code and train-test splits on three established benchmark sets, CUB200-2011 [68], CARS196 [30] and Stanford Online Products (SOP) [43]. (see the split-construction sketch below the table)
Dataset Splits | No | The paper mentions selecting splits and the number of splits (e.g., "eight total splits were investigated"), but it does not provide specific percentages or sample counts for training, validation, or test splits within the main text. It defers some details to the supplementary material.
Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments (e.g., GPU models, CPU specifications, or memory amounts).
Software Dependencies | No | The paper mentions that "For a complete list of implementation and training details see the supplementary if not explicitly stated in the respective sections," but it does not list specific software dependencies with version numbers in the main text.
Experiment Setup | Yes | Training on CARS196 and CUB200-2011 was done for a maximum of 200 epochs following standard training protocols utilized in [54], while 150 epochs were used for the much larger SOP dataset. Additional training details, if not directly stated in the respective sections, can be found in the supplementary. (see the training-schedule sketch below the table)
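
As a concrete illustration of the class-disjoint train/test protocol referenced in the Open Datasets and Dataset Splits rows, the sketch below builds a single split in which the first half of the classes is used for training and the second half for testing. This mirrors the conventional DML benchmark protocol and is an assumption here; the paper's own released splits are published with the code linked above. The make_class_disjoint_split helper and the (image_path, class_id) sample format are likewise illustrative assumptions.

```python
# Minimal sketch: a class-disjoint train/test split for a DML benchmark
# (e.g. CUB200-2011, CARS196, SOP). The 50/50 class split is the conventional
# DML protocol and an assumption here -- it is NOT the paper's released splits.
from collections import defaultdict

def make_class_disjoint_split(samples):
    """samples: list of (image_path, class_id) pairs. Returns (train, test)."""
    by_class = defaultdict(list)
    for path, cls in samples:
        by_class[cls].append((path, cls))

    classes = sorted(by_class)        # deterministic class ordering
    half = len(classes) // 2
    train = [s for c in classes[:half] for s in by_class[c]]
    test = [s for c in classes[half:] for s in by_class[c]]
    return train, test
```

The released benchmark goes beyond this single fixed partition and provides multiple splits whose train/test distribution shift progressively increases; the sketch above does not attempt to reproduce those.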
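The Experiment Setup row fixes only the epoch budgets: at most 200 epochs on CARS196 and CUB200-2011 and 150 on the larger SOP dataset. The snippet below records those budgets in a small configuration map and shows how a training loop might consume them; the dataset keys and the train_one_epoch callable are hypothetical placeholders, not the authors' training code.

```python
# Minimal sketch: per-dataset maximum epoch budgets as stated in the setup.
# Only the epoch counts come from the paper; everything else is assumed.
MAX_EPOCHS = {
    "cub200-2011": 200,   # stated maximum for CUB200-2011
    "cars196": 200,       # stated maximum for CARS196
    "sop": 150,           # Stanford Online Products uses the shorter budget
}

def train(dataset_name, model, train_loader, train_one_epoch):
    """train_one_epoch: hypothetical callable (model, loader) -> None."""
    for _ in range(MAX_EPOCHS[dataset_name]):
        train_one_epoch(model, train_loader)
    return model
```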
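Finally, the Research Type row summarises the empirical finding that retrieval performance degrades as the train/test distribution shift grows. The sketch below shows one way to trace that degradation by evaluating a trained embedding model on a sequence of test splits of increasing difficulty. Recall@1 is assumed as the metric (the standard DML retrieval measure), and embed and the per-split data are hypothetical placeholders rather than the paper's evaluation pipeline.

```python
# Minimal sketch: tracking retrieval quality across test splits of increasing
# train/test distribution shift. Recall@1 and the `embed` function are
# assumptions for illustration, not the paper's exact evaluation pipeline.
import numpy as np

def recall_at_1(embeddings, labels):
    """Nearest-neighbour Recall@1 over L2-normalised embeddings."""
    labels = np.asarray(labels)
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = x @ x.T
    np.fill_diagonal(sims, -np.inf)   # a query must not retrieve itself
    nearest = sims.argmax(axis=1)
    return float((labels[nearest] == labels).mean())

def evaluate_across_splits(splits, embed):
    """splits: iterable of (name, test_images, test_labels); embed: model fn."""
    scores = {name: recall_at_1(embed(images), labels)
              for name, images, labels in splits}
    return scores                     # scores are expected to drop as shift grows
```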