Mathematical Justification of Hard Negative Mining via Isometric Approximation Theorem

Authors: Albert Xu, Jhih-Yi Hsieh, Bhaskar Vundurthy, Nithya Kemp, Eliana Cohen, Lu Li, Howie Choset

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments performed on the Market-1501 and Stanford Online Products datasets with various network architectures corroborate our theoretical findings, indicating that network collapse tends to happen when the batch size is too large or embedding dimension is too small." (Batch-hard triplet mining, the mechanism behind this finding, is sketched after the table.)
Researcher Affiliation | Academia | Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15232, USA
Pseudocode | No | No pseudocode or algorithm blocks were found.
Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described.
Open Datasets | Yes | "Experiments performed on the Market-1501 and Stanford Online Products datasets with various network architectures corroborate our theoretical findings..." "Our experiments with the person re-identification dataset (Market-1501 Zheng et al. (2015))..." "We further support our predictions via experiments spanning three additional datasets (SOP Oh Song et al. (2016), CARS Krause et al. (2013), and CUB200 Wah et al. (2011))"
Dataset Splits | No | The paper describes batch sampling methods and training steps, but it does not give specific train/validation/test splits (percentages or counts) or point to standard splits for reproducibility beyond citing the datasets themselves.
Hardware Specification | No | The paper alludes to 'GPU specifications' and 'hardware constraints' (e.g., memory warnings) but never names the specific hardware used for the experiments (GPU or CPU models, memory amounts).
Software Dependencies | No | The paper does not name specific software dependencies or their version numbers.
Experiment Setup | Yes | "Here, we use a fixed embedding dimension of d = 128, train until step 40,000, and repeat each trial 3 times. The batch size P and K are varied on a grid P ∈ {2, 4, 8, 18} and K ∈ {2, 4, 8, 16, 32} for a total of 20 combinations." ... "we first fix P = 8 and K = 4 for each batch and vary the embedding dimension d from 4 to 1024. The network architecture and number of training steps are the same as the previous experiment (Figure 5 (a))."
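
For context on the finding quoted in the Research Type row: the paper studies triplet loss with batch-hard mining, where each anchor in a P x K batch is paired with its farthest in-batch positive and closest in-batch negative. The numpy sketch below is our own illustration of that standard loss, not the authors' code (the paper releases none); the function name and margin value are assumptions. Under network collapse all embeddings coincide, every mined distance is zero, and the loss pins at the margin.

```python
import numpy as np

# Minimal sketch of batch-hard triplet mining (the mining strategy the
# paper analyzes). All names and the margin value are our assumptions;
# a batch is assumed to hold P identities with K samples each.

def batch_hard_triplet_loss(emb: np.ndarray, labels: np.ndarray,
                            margin: float = 0.2) -> float:
    """For each anchor, take its hardest positive (farthest same-label
    embedding) and hardest negative (closest different-label embedding)."""
    # Pairwise Euclidean distances between all embeddings in the batch.
    diff = emb[:, None, :] - emb[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1) + 1e-12)

    same = labels[:, None] == labels[None, :]          # positive-pair mask
    pos = np.where(same, dist, -np.inf).max(axis=1)    # hardest positive
    neg = np.where(~same, dist, np.inf).min(axis=1)    # hardest negative

    # Hinge on the margin; collapse shows up as all distances near zero
    # and the loss stuck at `margin`.
    return np.maximum(pos - neg + margin, 0.0).mean()

# Toy usage: a batch with P = 2 identities, K = 3 samples, d = 4 dimensions.
rng = np.random.default_rng(0)
emb = rng.normal(size=(6, 4))
labels = np.array([0, 0, 0, 1, 1, 1])
print(batch_hard_triplet_loss(emb, labels))
```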
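
The quoted Experiment Setup row also maps directly onto a reproducible sweep. The sketch below enumerates the two experiments exactly as quoted; `run_trial` is a hypothetical placeholder for one training run, and the powers-of-two schedule for d is an assumption, since the quote gives only the endpoints 4 and 1024.

```python
from itertools import product

# Sketch of the sweep quoted above, using only the quoted values.
EMBED_DIM = 128                  # fixed d for the batch-size grid
TRAIN_STEPS = 40_000             # "train until step 40,000"
NUM_TRIALS = 3                   # "repeat each trial 3 times"
P_VALUES = [2, 4, 8, 18]         # identities per batch, as quoted
K_VALUES = [2, 4, 8, 16, 32]     # samples per identity, as quoted

def run_trial(p: int, k: int, d: int, steps: int) -> None:
    """Hypothetical placeholder for one run; effective batch size is P * K."""
    print(f"P={p:>2} K={k:>2} batch={p * k:>3} d={d:>4} steps={steps}")

# Experiment 1: grid over P x K (4 * 5 = 20 combinations), 3 trials each.
for p, k in product(P_VALUES, K_VALUES):
    for _ in range(NUM_TRIALS):
        run_trial(p, k, EMBED_DIM, TRAIN_STEPS)

# Experiment 2: fix P = 8, K = 4 and vary d from 4 to 1024. The quote gives
# only the endpoints; a powers-of-two schedule is assumed here, and 3 trials
# are assumed to match Experiment 1.
for d in [4, 8, 16, 32, 64, 128, 256, 512, 1024]:
    for _ in range(NUM_TRIALS):
        run_trial(8, 4, d, TRAIN_STEPS)
```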