Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Large-Scale Distributed Learning via Private On-Device LSH
Authors: Tahseen Rabbani, Marco Bornstein, Furong Huang
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we (1) gauge the sensitivity of PGHash and (2) analyze the performance of PGHash and our own DWTA variant (PGHash-D) in training large-scale recommender networks. |
| Researcher Affiliation | Academia | Tahseen Rabbani Department of Computer Science University of Maryland EMAIL Marco Bornstein Department of Computer Science University of Maryland EMAIL Furong Huang Department of Computer Science University of Maryland EMAIL |
| Pseudocode | Yes | Algorithm 1 Distributed PGHash |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code for the described methodology. |
| Open Datasets | Yes | We use three extreme multi-label datasets for training recommender networks: Delicious-200K, Amazon-670K, and Wiki LSHTC-325K. These datasets come from the Extreme Classification Repository [4]. Citation [4] is: "Kush Bhatia, Kunal Dahiya, Himanshu Jain, Anshul Mittal, Yashoteja Prabhu, and Manik Varma. The extreme classification repository: Multi-label datasets and code. URL http://manikvarma.org/downloads/XC/XMLRepository.html, 2016." |
| Dataset Splits | No | The paper mentions 'test accuracy' and 'test sets' but does not provide explicit training, validation, and test dataset splits or specific methodologies for partitioning the data into these subsets. |
| Hardware Specification | Yes | These experiments are run on a cloud cluster using Intel Xeon Silver 4216 processors with 128GB of total memory. |
| Software Dependencies | No | Finally, we train our neural network using TensorFlow. |
| Experiment Setup | Yes | Table 1: Hyper-parameters for Federated Experiments (PGHash and Federated SLIDE). Dataset: Delicious-200K; Algorithm: PGHash; Hash Type: PGHash; LR: 1e-4; Batch Size: 128; Steps per LSH: 1; k: 8; c: 8; Tables: 50; CR: 1. |
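
For readers who want to reuse the quoted hyper-parameters, the single Table 1 row shown above (Delicious-200K, PGHash) could be recorded as a small configuration object. This is a minimal sketch, not code from the paper: the class name `FederatedRunConfig` and all field names are hypothetical, and `k`, `c`, and `cr` simply mirror the column abbreviations in the table without interpreting them.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class FederatedRunConfig:
    """One row of Table 1. Field names are illustrative, not the authors' own."""
    dataset: str
    algorithm: str
    hash_type: str
    learning_rate: float
    batch_size: int
    steps_per_lsh: int
    k: int       # column "k" in Table 1
    c: int       # column "c" in Table 1
    tables: int  # column "Tables" in Table 1
    cr: int      # column "CR" in Table 1


# Values copied from the Delicious-200K / PGHash row quoted above.
delicious_pghash = FederatedRunConfig(
    dataset="Delicious-200K",
    algorithm="PGHash",
    hash_type="PGHash",
    learning_rate=1e-4,
    batch_size=128,
    steps_per_lsh=1,
    k=8,
    c=8,
    tables=50,
    cr=1,
)

if __name__ == "__main__":
    # Print the configuration as a plain dictionary, e.g. for logging.
    print(asdict(delicious_pghash))
```

Freezing the dataclass keeps a recorded configuration from being modified by accident after a run has been logged.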