Mathematical Justification of Hard Negative Mining via Isometric Approximation Theorem

Authors: Albert Xu, Jhih-Yi Hsieh, Bhaskar Vundurthy, Nithya Kemp, Eliana Cohen, Lu Li, Howie Choset

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments performed on the Market-1501 and Stanford Online Products datasets with various network architectures corroborate our theoretical findings, indicating that network collapse tends to happen when the batch size is too large or embedding dimension is too small." (Batch-hard triplet mining, the mechanism behind this finding, is sketched after the table.)
Researcher Affiliation | Academia | Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15232, USA
Pseudocode | No | No pseudocode or algorithm blocks were found.
Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described.
Open Datasets | Yes | "Experiments performed on the Market-1501 and Stanford Online Products datasets with various network architectures corroborate our theoretical findings..." "Our experiments with the person re-identification dataset (Market-1501 Zheng et al. (2015))..." "We further support our predictions via experiments spanning three additional datasets (SOP Oh Song et al. (2016), CARS Krause et al. (2013), and CUB200 Wah et al. (2011))"
Dataset Splits | No | The paper describes batch sampling methods and training steps, but it does not give specific train/validation/test splits (percentages or counts) or point to standard splits for reproducibility beyond citing the datasets themselves.
Hardware Specification | No | The paper alludes to 'GPU specifications' and 'hardware constraints' (e.g., memory warnings) but never names the specific hardware used for the experiments (GPU or CPU models, memory amounts).
Software Dependencies | No | The paper does not name specific software dependencies or their version numbers.
Experiment Setup | Yes | "Here, we use a fixed embedding dimension of d = 128, train until step 40,000, and repeat each trial 3 times. The batch size P and K are varied on a grid P ∈ {2, 4, 8, 18} and K ∈ {2, 4, 8, 16, 32} for a total of 20 combinations." ... "we first fix P = 8 and K = 4 for each batch and vary the embedding dimension d from 4 to 1024. The network architecture and number of training steps are the same as the previous experiment (Figure 5 (a))."
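
For context on the finding quoted in the Research Type row: the paper studies triplet loss with batch-hard mining, where each anchor in a P x K batch is paired with its farthest in-batch positive and closest in-batch negative. The numpy sketch below is our own illustration of that standard loss, not the authors' code (the paper releases none); the function name and margin value are assumptions. Under network collapse all embeddings coincide, every mined distance is zero, and the loss pins at the margin.

```python
import numpy as np

# Minimal sketch of batch-hard triplet mining (the mining strategy the
# paper analyzes). All names and the margin value are our assumptions;
# a batch is assumed to hold P identities with K samples each.

def batch_hard_triplet_loss(emb: np.ndarray, labels: np.ndarray,
                            margin: float = 0.2) -> float:
    """For each anchor, take its hardest positive (farthest same-label
    embedding) and hardest negative (closest different-label embedding)."""
    # Pairwise Euclidean distances between all embeddings in the batch.
    diff = emb[:, None, :] - emb[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1) + 1e-12)

    same = labels[:, None] == labels[None, :]          # positive-pair mask
    pos = np.where(same, dist, -np.inf).max(axis=1)    # hardest positive
    neg = np.where(~same, dist, np.inf).min(axis=1)    # hardest negative

    # Hinge on the margin; collapse shows up as all distances near zero
    # and the loss stuck at `margin`.
    return np.maximum(pos - neg + margin, 0.0).mean()

# Toy usage: a batch with P = 2 identities, K = 3 samples, d = 4 dimensions.
rng = np.random.default_rng(0)
emb = rng.normal(size=(6, 4))
labels = np.array([0, 0, 0, 1, 1, 1])
print(batch_hard_triplet_loss(emb, labels))
```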
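
The quoted Experiment Setup row also maps directly onto a reproducible sweep. The sketch below enumerates the two experiments exactly as quoted; `run_trial` is a hypothetical placeholder for one training run, and the powers-of-two schedule for d is an assumption, since the quote gives only the endpoints 4 and 1024.

```python
from itertools import product

# Sketch of the sweep quoted above, using only the quoted values.
EMBED_DIM = 128                  # fixed d for the batch-size grid
TRAIN_STEPS = 40_000             # "train until step 40,000"
NUM_TRIALS = 3                   # "repeat each trial 3 times"
P_VALUES = [2, 4, 8, 18]         # identities per batch, as quoted
K_VALUES = [2, 4, 8, 16, 32]     # samples per identity, as quoted

def run_trial(p: int, k: int, d: int, steps: int) -> None:
    """Hypothetical placeholder for one run; effective batch size is P * K."""
    print(f"P={p:>2} K={k:>2} batch={p * k:>3} d={d:>4} steps={steps}")

# Experiment 1: grid over P x K (4 * 5 = 20 combinations), 3 trials each.
for p, k in product(P_VALUES, K_VALUES):
    for _ in range(NUM_TRIALS):
        run_trial(p, k, EMBED_DIM, TRAIN_STEPS)

# Experiment 2: fix P = 8, K = 4 and vary d from 4 to 1024. The quote gives
# only the endpoints; a powers-of-two schedule is assumed here, and 3 trials
# are assumed to match Experiment 1.
for d in [4, 8, 16, 32, 64, 128, 256, 512, 1024]:
    for _ in range(NUM_TRIALS):
        run_trial(8, 4, d, TRAIN_STEPS)
```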