Ada-NETS: Face Clustering via Adaptive Neighbour Discovery in the Structure Space

Authors: Yaohua Wang, Yaobin Zhang, Fangyi Zhang, Senzhang Wang, Ming Lin, YuQi Zhang, Xiuyu Sun

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on multiple public clustering datasets show that Ada-NETS significantly outperforms current state-of-the-art methods, proving its superiority and generalization.
Researcher Affiliation | Collaboration | Yaohua Wang, Yaobin Zhang*, Fangyi Zhang, Ming Lin, YuQi Zhang (Alibaba Group) {xiachen.wyh, zhangyaobin.zyb, zhiyuan.zfy, ming.l, gongyou.zyq}@alibaba-inc.com; Senzhang Wang (Central South University) szwang@csu.edu.cn; Xiuyu Sun (Alibaba Group) xiuyu.sxy@alibaba-inc.com
Pseudocode | No | The paper describes the algorithm using prose and mathematical equations but does not include a clearly labeled pseudocode or algorithm block.
Open Source Code | Yes | Code is available at https://github.com/damo-cv/Ada-NETS.
Open Datasets | Yes | Three datasets are used in the experiments: MS-Celeb-1M (Guo et al., 2016; Deng et al., 2019)... The clothes dataset DeepFashion (Liu et al., 2016)... MSMT17 (Wei et al., 2018)... For a fair comparison, we follow the same protocol and features as VE-GCN (Yang et al., 2019) to divide the dataset evenly into ten parts by identities, and use part 0 as the training set and part 1 to part 9 as the testing set. (An illustrative split sketch follows the table.)
Dataset Splits | No | The paper specifies a training set ('part 0 as the training set') and a testing set ('part 1 to part 9 as the testing set') but does not explicitly mention a separate validation set split.
Hardware Specification | Yes | Empirically, Ada-NETS takes about 18.9 minutes to cluster the part 1 test set (about 584k samples) on 54 E5-2682 v4 CPUs and 8 NVIDIA P100 cards.
Software Dependencies | No | The experiments are conducted with PyTorch (Paszke et al., 2019) and DGL (Wang et al., 2019a). The paper mentions the software used but does not specify version numbers.
Experiment Setup | Yes | The learning rate is initially 0.01 for training the adaptive filter and 0.1 for training the GCN, with cosine annealing. δ = 1 for the Huber loss, β1 = 0.9, β2 = 1.0, λ = 1 for the hinge loss, and β = 0.5 for the Q-value. The SGD optimizer with momentum 0.9 and weight decay 1e-5 is used. k is set to 80, 5, and 40 on MS-Celeb-1M, DeepFashion, and MSMT17, respectively.
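
The Open Datasets row quotes the VE-GCN protocol of dividing the data evenly into ten parts by identities, with part 0 for training and parts 1 to 9 for testing. Below is a minimal sketch of that protocol, assuming integer identity labels and interpreting "evenly by identities" as dealing identities round-robin into ten buckets; the function and variable names are illustrative and not taken from the Ada-NETS repository.

```python
# Hypothetical sketch of an identity-disjoint 10-way split (not the authors' code).
from collections import defaultdict

def split_by_identity(labels, num_parts=10):
    """Group sample indices by identity, then deal identities evenly into parts."""
    id_to_samples = defaultdict(list)
    for idx, identity in enumerate(labels):
        id_to_samples[identity].append(idx)

    parts = [[] for _ in range(num_parts)]
    for part_id, identity in enumerate(sorted(id_to_samples)):
        parts[part_id % num_parts].extend(id_to_samples[identity])
    return parts

# Usage following the quoted protocol: part 0 trains, parts 1-9 test.
# parts = split_by_identity(all_labels)
# train_indices, test_parts = parts[0], parts[1:]
```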
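The Experiment Setup row lists optimizer and loss hyperparameters. The following is a minimal PyTorch sketch of those settings, assuming a standard CosineAnnealingLR schedule; the two modules, the epoch count, and the mapping of λ to the hinge-loss margin are placeholders and assumptions, not the authors' implementation.

```python
# Sketch of the quoted training configuration (placeholder modules, assumed schedule).
import torch
from torch import nn, optim

adaptive_filter = nn.Linear(512, 1)   # placeholder for the adaptive filter
gcn = nn.Linear(512, 256)             # placeholder for the GCN
num_epochs = 20                       # assumed; not stated in the quote

def make_optimizer(model, lr):
    # SGD with momentum 0.9 and weight decay 1e-5, cosine-annealed learning rate.
    opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9, weight_decay=1e-5)
    sched = optim.lr_scheduler.CosineAnnealingLR(opt, T_max=num_epochs)
    return opt, sched

filter_opt, filter_sched = make_optimizer(adaptive_filter, lr=0.01)  # 0.01 for the adaptive filter
gcn_opt, gcn_sched = make_optimizer(gcn, lr=0.1)                     # 0.1 for the GCN

huber = nn.HuberLoss(delta=1.0)            # δ = 1 for the Huber loss
hinge = nn.HingeEmbeddingLoss(margin=1.0)  # λ = 1 mapped to the margin here as an assumption
```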