Minimizing FLOPs to Learn Efficient Sparse Representations

Authors: Biswajit Paria, Chih-Kuan Yeh, Ian E.H. Yen, Ning Xu, Pradeep Ravikumar, Barnabás Póczos

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our experiments show that our approach is competitive to the other baselines and yields a similar or better speed-vs-accuracy tradeoff on practical datasets. We perform an empirical evaluation of our approach on the Megaface dataset (Kemelmacher-Shlizerman et al., 2016), and show that our proposed method successfully learns high-dimensional sparse embeddings that are orders of magnitude faster. We compare our approach to multiple baselines, demonstrating an improved or similar speed-vs-accuracy trade-off."
Researcher Affiliation | Collaboration | Carnegie Mellon University, Moffett AI, Amazon; {bparia,cjyeh,pradeepr,bapoczos}@cs.cmu.edu, ian.yen@moffett.ai, ningxu01@gmail.com
Pseudocode | Yes | "Algorithm 1 Sparse Nearest Neighbour" (see the retrieval sketch after the table)
Open Source Code | Yes | "The implementation is available at https://github.com/biswajitsc/sparse-embed"
Open Datasets | Yes | "We evaluate our proposed approach on a large scale metric learning dataset: the Megaface (Kemelmacher-Shlizerman et al., 2016) used for face recognition. ... we train on a refined version of the MSCeleb-1M (Guo et al., 2016) dataset released by Deng et al. (2018) consisting of 1 million images spanning 85k classes."
Dataset Splits | No | The paper describes training on MSCeleb-1M and evaluating on Megaface/Facescrub, but does not explicitly specify a separate validation split or its details.
Hardware Specification | Yes | "All models were trained on 4 NVIDIA Tesla V-100 GPUs with 16G of memory. ... CPU: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz."
Software Dependencies | No | The paper mentions 'Tensorflow (Abadi et al., 2016)' and 'C++' but does not give exact version numbers for TensorFlow or other key libraries.
Experiment Setup | Yes | "For the Arcloss function, we used the recommended parameters of margin m = 0.5 and temperature s = 64. We trained our models on 4 NVIDIA Tesla V-100 GPUs using SGD with a learning rate of 0.001 and momentum of 0.9. Both architectures were trained for a total of 230k steps, with the learning rate decayed by a factor of 10 after 170k steps. We use a batch size of 256 and 64 per GPU for MobileFaceNet and ResNet respectively. ... The regularization parameter λ for the FLOPs regularizer was varied as 200, 300, 400, 600. ... The PCA dimension is varied as 64, 96, 128, 256. ... For IVF-PQ from the faiss library, the following parameters were fixed: nlist=4096, M=64, nbit=8, and nprobe was varied as 100, 150, 250, 500, 1000." (sketches of the FLOPs regularizer, the training configuration, and the IVF-PQ setup follow the table)
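
For context on the "Sparse Nearest Neighbour" pseudocode noted above, the sketch below illustrates the inverted-index retrieval idea that makes sparse embeddings fast: dot products are accumulated only over dimensions that are active in the query. This is an illustrative reconstruction, not the paper's Algorithm 1 verbatim; the function names and data layout are assumptions.

```python
from collections import defaultdict
import numpy as np

def build_inverted_index(embeddings):
    """Posting list per dimension: (item_id, value) pairs for nonzero entries.

    embeddings: (n_items, dim) array of sparse (mostly-zero) embeddings.
    """
    index = defaultdict(list)
    for item_id, vec in enumerate(embeddings):
        for dim in np.flatnonzero(vec):
            index[dim].append((item_id, vec[dim]))
    return index

def sparse_nearest_neighbour(query, index, n_items):
    """Return the argmax dot-product item, touching only the query's active dims.

    The work done is proportional to the posting-list lengths of the query's
    active dimensions, which is exactly the FLOPs count the paper regularizes.
    """
    scores = np.zeros(n_items)
    for dim in np.flatnonzero(query):
        for item_id, value in index.get(dim, ()):
            scores[item_id] += query[dim] * value
    return int(np.argmax(scores))
```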
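The λ values listed in the experiment setup weight the FLOPs regularizer, which relaxes the expected retrieval cost into a differentiable penalty: the sum over embedding dimensions of the squared mean (absolute) activation across the batch. Below is a minimal TensorFlow sketch under that reading, with the quoted optimizer settings wired in; the exact reduction and loss wiring in the released code may differ, so treat the names and loss composition as assumptions (see the linked repository for the authors' implementation).

```python
import tensorflow as tf

def flops_regularizer(activations):
    """Differentiable FLOPs surrogate: sum_j (mean_i |a_ij|)^2.

    Penalizing this drives per-dimension activation probabilities down and
    spreads activations evenly across dimensions, reducing expected FLOPs.
    activations: [batch_size, embedding_dim] float tensor.
    """
    mean_abs = tf.reduce_mean(tf.abs(activations), axis=0)  # [embedding_dim]
    return tf.reduce_sum(tf.square(mean_abs))

# Training configuration quoted in the table (optimizer wiring is a sketch,
# not the released training script): lr 0.001, momentum 0.9, x10 decay at 170k.
schedule = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    boundaries=[170_000], values=[1e-3, 1e-4])
optimizer = tf.keras.optimizers.SGD(learning_rate=schedule, momentum=0.9)

# total_loss = arcface_loss + lam * flops_regularizer(embeddings),
# with lam swept over {200, 300, 400, 600} as reported above.
```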
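The IVF-PQ baseline parameters map directly onto the faiss API. The sketch below uses the fixed values quoted in the table (nlist=4096, M=64, 8 bits) and varies nprobe; the embedding dimension, metric (L2 here), and random placeholder data are assumptions, since the table does not pin them down.

```python
import numpy as np
import faiss  # pip install faiss-cpu

d = 512        # embedding dimension (placeholder; not stated in the table)
nlist = 4096   # number of coarse IVF clusters (fixed in the paper's setup)
M = 64         # PQ sub-quantizers; d must be divisible by M
nbits = 8      # bits per sub-quantizer code

quantizer = faiss.IndexFlatL2(d)                   # coarse quantizer
index = faiss.IndexIVFPQ(quantizer, d, nlist, M, nbits)

# Placeholder database; a real run needs enough training vectors for k-means.
xb = np.random.rand(100_000, d).astype("float32")
index.train(xb)                                    # fit IVF centroids + PQ codebooks
index.add(xb)

index.nprobe = 100   # swept over 100, 150, 250, 500, 1000 for the speed/accuracy curve
D, I = index.search(xb[:5], 10)                    # top-10 neighbours for 5 queries
```

Raising nprobe scans more inverted lists per query, trading speed for recall, which is how the baseline's speed-vs-accuracy curve in the paper is traced out.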