Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Volume-Aware Distance for Robust Similarity Learning

Authors: Shuo Chen, Chen Gong, Jun Li, Jian Yang

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments to evaluate the performance of our proposed method using real-world datasets. We first conduct ablation studies to reveal the usefulness of our newly introduced block/regularizer. Then we compare our proposed learning algorithm with existing state-of-the-art models in both the supervised metric learning and unsupervised contrastive learning tasks. Both the training and test processes are implemented on PyTorch (Paszke et al., 2019) with Tesla V100 GPUs, where the regularization parameter λ is set to 0.5. The dimensionality m and the parameter γ in Eq. (5) are set to 512 and 0.2, respectively. We use a 128-dimensional hidden layer for our H(·) in Eq. (4). The hyper-parameters of compared methods are set to the recommended values according to their original papers.
Researcher Affiliation | Academia | (1) School of Intelligence Science and Technology, Nanjing University, China; (2) School of Computer Science and Technology, Nanjing University of Science and Technology, China. Correspondence to: Shuo Chen <EMAIL>.
Pseudocode | Yes |
Algorithm 1: Solving Eq. (7) via SGD.
Input: training set X = {x_i}_{i=1}^N; step size η > 0; regularization parameter λ > 0; batch size n ∈ ℕ⁺; randomly initialized φ(0); maximum iteration number T.
For t from 1 to T:
  1) Uniformly pick (n + 1) instances {x_{b_j}}_{j=0}^n from X;
  2) Compute ∇_H L_emp({b_j}_{j=1}^n), ∇_H R_expand({b_j}_{j=1}^n), ∇_φ L_emp({b_j}_{j=1}^n), and ∇_φ R_expand({b_j}_{j=1}^n) according to Eq. (8) and Eq. (9);
  3) Update the learning parameters:
     φ(t) = φ(t−1) − η(∇_φ L_emp + λ ∇_φ R_expand);   (10)
     H(t) = H(t−1) − η(∇_H L_emp + λ ∇_H R_expand);
End
Output: converged φ(T) and H(T).
Open Source Code | No | The paper does not explicitly state that the source code for the methodology described is publicly available, nor does it provide a direct link to a code repository.
Open Datasets | Yes | We conduct experiments to evaluate the performance of our proposed method using real-world datasets. [...] on CAR-196 (Krause et al., 2013), CUB-200 (Welinder et al., 2010), SOP (Oh Song et al., 2016), and In-Shop (Liu et al., 2016). [...] We employ CASIA-WebFace (Yi et al., 2014) as the training set while using AgeDB-30 (Moschoglou et al., 2017), CFP-FP (Sengupta et al., 2016), and MegaFace (Kemelmacher-Shlizerman et al., 2016) as the test sets. [...] We train our method on ImageNet-100 and ImageNet-1K (Russakovsky et al., 2015), and compare it with existing representative approaches including HCL (Robinson et al., 2021), PCL (Li et al., 2021), BYOL (Grill et al., 2020), GCA (Chen et al., 2024b), and INTL (Weng et al., 2024). [...] We also conduct the t-SNE embedding (Van der Maaten & Hinton, 2008) to obtain the 2-dimensional data points to better understand the usefulness of our introduced new component. In Fig. 4, VADSL (w/ measure-head H) can successfully obtain better separability than the baseline result (w/o H), where the results of λ = 0.5 achieve very satisfactory separability. These results clearly demonstrate the crucial role of maintaining the measure-head network H along with the corresponding regularizer R_expand in our approach. [...] We employ ResNet-50 as the backbone and integrate our method with SimCLR (Chen et al., 2020) and SwAV (Caron et al., 2020), yielding the results labeled as VADSL (cluster-free) and VADSL (cluster-used), respectively. [...] In this experiment, we use the STS dataset (Agirre et al., 2016) (including the tasks of STS12, STS13, STS14, STS15, and STS16). Following the approach in SimCSE (Gao et al., 2021), we utilize pre-trained BERT (Devlin et al., 2018) checkpoints and compare our method with InforMin-CL (Chen et al., 2022b), miCSE (Klein & Nabi, 2022), PCL (Li et al., 2021), SCL (Wu et al., 2022b), and ADNCE (Wu et al., 2024). [...]
We further evaluate our method on a challenging graph embedding task using biochemical-molecule data and social-network data, including DD, PTC, IMDB-B, IMDB-M, RDT-B, PROTEINS, NCI1, and MUTAG (Yanardag & Vishwanathan, 2015). We use the representative method InfoGraph (Sun et al., 2020a) as the baseline and perform downstream graph-level classification on these datasets. For evaluation, we fine-tune an SVM (Cortes & Vapnik, 1995) on the learned feature representations using 10-fold cross-validation. [...] We would like to further investigate the transferability of our method on the object detection and instance segmentation tasks. We first pre-train the model (with ResNet-50 backbone) on ImageNet-1K, and then fine-tune the pre-trained backbone on the new dataset. Specifically, we select COCO (Lin et al., 2014) as our target dataset and follow the common setting (as discussed in MoCo-v3 (Chen et al., 2021)) to fine-tune all layers of the pre-trained model over the train2017 set while evaluating the performance on the val2017 set. [...] For the BookCorpus dataset, which includes six sub-tasks: movie review sentiment (MR), product reviews (CR), subjectivity classification (SUBJ), opinion polarity (MPQA), question type classification (TREC), and paraphrase identification (MSRP), we follow the experimental settings in the baseline method quick-thought (QT) (Logeswaran & Lee, 2018) to choose the neighboring sentences as positive pairs.
Dataset Splits | Yes | For evaluation, we fine-tune an SVM (Cortes & Vapnik, 1995) on the learned feature representations using 10-fold cross-validation. The dataset is split into training, test, and validation sets in an 8/1/1 ratio. The accuracy results are reported after 10 runs.
Hardware Specification | Yes | Both the training and test processes are implemented on PyTorch (Paszke et al., 2019) with Tesla V100 GPUs, where the regularization parameter λ is set to 0.5. [...] Here we further provide experiments to record the training time of our method as well as the corresponding baseline method. Specifically, we use two NVIDIA Tesla V100 GPUs to train our method based on SimCLR and SwAV with 100 epochs, respectively.
Software Dependencies | No | The paper mentions "Pytorch (Paszke et al., 2019)" but does not specify a version number for PyTorch or any other software libraries, compilers, or operating systems.
Experiment Setup | Yes | Both the training and test processes are implemented on PyTorch (Paszke et al., 2019) with Tesla V100 GPUs, where the regularization parameter λ is set to 0.5. The dimensionality m and the parameter γ in Eq. (5) are set to 512 and 0.2, respectively. We use a 128-dimensional hidden layer for our H(·) in Eq. (4). The hyper-parameters of compared methods are set to the recommended values according to their original papers. [...] For the supervised task, we adopt different feature encoders (BN-Inception (Ioffe & Szegedy, 2015) for Npair (Sohn, 2016), and ResNet-50 (He et al., 2016) for Proxy Anchor (Kim et al., 2020) and MetricFormer (Yan et al., 2022b)) to assess the performance of our method in metric learning. The results are presented in Tab. 1, where we record the test accuracy of all compared methods on the CAR-196 (Krause et al., 2013) and CUB-200 (Welinder et al., 2010) datasets (with 500 epochs, learning rate = 10⁻³, and batch size = 512). [...] For all methods, we set the batch size to 256 and the embedding size to 512, using the ResNet-50 backbone. [...] The batch sizes are set to 1024 and 512 for the ResNet-50 and ViT-B/16 backbones, respectively.
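The joint update of Eq. (10) in the Algorithm 1 pseudocode quoted above can be sketched as a toy SGD loop. The gradients grad_L_emp and grad_R_expand below are hypothetical quadratic stand-ins, not the paper's actual objectives (those are defined by Eqs. (8) and (9)); only the structure of the update, with λ = 0.5 as in the quoted setup, is illustrated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical quadratic stand-ins for the paper's empirical loss L_emp and
# expansion regularizer R_expand (their real definitions are Eqs. (8)-(9)).
def grad_L_emp(phi, H, batch):
    # pulls phi toward the batch mean and H toward 1.0
    return phi - batch.mean(), H - 1.0

def grad_R_expand(phi, H, batch):
    # simple weight-decay-like penalty on both parameters
    return 0.1 * phi, 0.1 * H

X = rng.normal(size=100)              # toy 1-D "training set"
phi, H = rng.normal(), rng.normal()   # randomly initialized parameters
eta, lam, n, T = 0.1, 0.5, 8, 200     # step size, lambda = 0.5, batch size, iters

for t in range(1, T + 1):
    batch = rng.choice(X, size=n + 1, replace=False)  # 1) pick (n+1) instances
    gL_phi, gL_H = grad_L_emp(phi, H, batch)          # 2) gradients of L_emp
    gR_phi, gR_H = grad_R_expand(phi, H, batch)       #    and of R_expand
    phi -= eta * (gL_phi + lam * gR_phi)              # 3) Eq. (10): update phi
    H   -= eta * (gL_H + lam * gR_H)                  #    Eq. (10): update H
```

With these stand-ins the H update is deterministic and converges to the regularized minimizer 1/1.05 ≈ 0.952, while φ settles near mean(X)/1.05; the point is only that both parameter groups are updated together in each SGD step, as in Algorithm 1.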
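The Dataset Splits row quotes two protocols: an 8/1/1 train/test/validation split and 10-fold cross-validation for the SVM evaluation on learned features. A minimal index-level sketch (the dataset size and variable names are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000                     # hypothetical number of examples
idx = rng.permutation(N)

# 8/1/1 train/test/validation split, as described in the quoted setup.
n_train, n_test = int(0.8 * N), int(0.1 * N)
train = idx[:n_train]
test = idx[n_train:n_train + n_test]
val = idx[n_train + n_test:]

# 10-fold cross-validation indices; the paper fits an SVM on the learned
# feature representations within each fold.
folds = np.array_split(idx, 10)
for k in range(10):
    held_out = folds[k]                                          # evaluation fold
    fit_on = np.concatenate([folds[j] for j in range(10) if j != k])  # fitting folds
    # ... fit SVM on features[fit_on], evaluate on features[held_out]
```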
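For reference, the hyper-parameters scattered across the Experiment Setup quotes can be collected in one place. The dictionary layout and key names below are ours, not taken from the authors' code, and the grouping of the quoted batch sizes by experiment follows the order of the quotes.

```python
# Hyper-parameters quoted in the Experiment Setup row; key names are illustrative.
config = {
    "lambda_reg": 0.5,       # regularization parameter lambda
    "embed_dim_m": 512,      # dimensionality m in Eq. (5)
    "gamma": 0.2,            # parameter gamma in Eq. (5)
    "H_hidden_dim": 128,     # hidden-layer width of the measure-head H, Eq. (4)
    "metric_learning": {     # CAR-196 / CUB-200 runs reported in Tab. 1
        "epochs": 500,
        "learning_rate": 1e-3,
        "batch_size": 512,
    },
    "backbone_batch_size": {  # "batch sizes are set to 1024 and 512 ..."
        "resnet50": 1024,
        "vit_b16": 512,
    },
}
```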