Mathematical Justification of Hard Negative Mining via Isometric Approximation Theorem
Authors: Albert Xu, Jhih-Yi Hsieh, Bhaskar Vundurthy, Nithya Kemp, Eliana Cohen, Lu Li, Howie Choset
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments performed on the Market-1501 and Stanford Online Products datasets with various network architectures corroborate our theoretical findings, indicating that network collapse tends to happen when the batch size is too large or embedding dimension is too small. |
| Researcher Affiliation | Academia | Robotics Institute Carnegie Mellon University Pittsburgh, PA 15232, USA |
| Pseudocode | No | No pseudocode or algorithm blocks were found. |
| Open Source Code | No | The paper does not provide a link to, or repository for, source code implementing the described methodology. |
| Open Datasets | Yes | Experiments performed on the Market-1501 and Stanford Online Products datasets with various network architectures corroborate our theoretical findings... Our experiments with the person re-identification dataset (Market-1501 Zheng et al. (2015)) ... We further support our predictions via experiments spanning three additional datasets (SOP Oh Song et al. (2016), CARS Krause et al. (2013), and CUB200 Wah et al. (2011)) |
| Dataset Splits | No | The paper describes batch sampling methods and training steps, but does not provide specific train/validation/test dataset splits (percentages or counts) or reference standard splits for reproducibility beyond citing the datasets. |
| Hardware Specification | No | The paper alludes to 'GPU specifications' and 'hardware constraints' (e.g., memory warnings), but does not name the specific hardware (GPU or CPU models, memory amounts) used for the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | Here, we use a fixed embedding dimension of d = 128, train until step 40,000, and repeat each trial 3 times. The batch size P and K are varied on a grid P ∈ {2, 4, 8, 16} and K ∈ {2, 4, 8, 16, 32} for a total of 20 combinations. ... we first fix P = 8 and K = 4 for each batch and vary the embedding dimension d from 4 to 1024. The network architecture and number of training steps are the same as the previous experiment (Figure 5 (a)). (Hedged sketches of the mining loss and this grid follow below.) |
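Since the paper ships no code, the following is a minimal PyTorch sketch of batch-hard triplet mining, the hard-negative-mining strategy the paper's theory addresses. The function name, the margin value, and the distance choice (Euclidean) are illustrative assumptions, not details confirmed by the paper.

```python
import torch

def batch_hard_triplet_loss(embeddings: torch.Tensor, labels: torch.Tensor,
                            margin: float = 0.2) -> torch.Tensor:
    """Batch-hard mining: for each anchor, take the farthest same-identity
    sample as the positive and the nearest different-identity sample as the
    negative, then apply the standard triplet margin loss.
    (Margin 0.2 is an assumed placeholder, not a value from the paper.)"""
    dist = torch.cdist(embeddings, embeddings)          # (N, N) pairwise Euclidean distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)   # (N, N) same-identity mask
    hardest_pos = dist.masked_fill(~same, float("-inf")).amax(dim=1)  # farthest positive per anchor
    hardest_neg = dist.masked_fill(same, float("inf")).amin(dim=1)    # nearest negative per anchor
    return torch.relu(hardest_pos - hardest_neg + margin).mean()
```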
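And here is a hypothetical driver for the grid quoted in the Experiment Setup row: fixed d = 128, 40,000 training steps, 3 repeats per cell, P and K swept over the stated values. `train_one_trial` is a placeholder interface, not from the paper, and the power-of-two dimension sweep is an assumption; the quote only gives the endpoints 4 and 1024.

```python
from itertools import product

P_VALUES = (2, 4, 8, 16)       # identities per batch
K_VALUES = (2, 4, 8, 16, 32)   # images per identity

def run_pk_grid(train_one_trial):
    # 4 x 5 = 20 (P, K) combinations, 3 repeats each, as reported.
    for P, K in product(P_VALUES, K_VALUES):
        for trial in range(3):
            train_one_trial(num_identities=P, images_per_identity=K,
                            embed_dim=128, steps=40_000, seed=trial)

def run_dim_sweep(train_one_trial):
    # Fixed P = 8, K = 4; embedding dimension varied from 4 to 1024.
    # Intermediate values are assumed powers of two (not stated in the quote).
    for d in (4, 8, 16, 32, 64, 128, 256, 512, 1024):
        for trial in range(3):
            train_one_trial(num_identities=8, images_per_identity=4,
                            embed_dim=d, steps=40_000, seed=trial)
```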