Toward Understanding Privileged Features Distillation in Learning-to-Rank
Authors: Shuo Yang, Sujay Sanghavi, Holakou Rahmanian, Jan Bakus, Vishwanathan S. V. N.
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we first study PFD empirically on three public ranking datasets and an industrial-scale ranking problem derived from Amazon's logs. We show that PFD outperforms several baselines (no-distillation, pretraining-finetuning, self-distillation, and generalized distillation) on all these datasets. Next, we analyze why and when PFD performs well via both empirical ablation studies and theoretical analysis for linear models. |
| Researcher Affiliation | Collaboration | Shuo Yang, UT Austin (yangshuo@utexas.edu); Sujay Sanghavi, Amazon (sujayrs@amazon.com); Holakou Rahmanian, Amazon (holakou@amazon.com); Jan Bakus, Amazon (jbakus@amazon.com); S. V. N. Vishwanathan, Amazon (vishy@amazon.com) |
| Pseudocode | No | The paper describes the steps of Privileged Features Distillation in Section 3.1 (Step I and Step II), but only in paragraph form; the procedure is not presented as a formally structured pseudocode block or algorithm. (A hedged sketch of the two-step procedure is given after this table.) |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See supplemental material. |
| Open Datasets | Yes | We first evaluate the performance of PFD on three widely used public ranking datasets. Specifically, we use the Set1 from Yahoo! Learn to rank challenge [CC11]; Istella Learning to Rank dataset [DLN+16]; and Microsoft Learning to Rank MSLR-Web30k dataset [QL13]. |
| Dataset Splits | Yes | The validation set is carved out of the training set: 10% of the training data is held out for Yahoo, Istella, and Web30k (see the split sketch after this table). |
| Hardware Specification | No | The paper does not specify any particular hardware used for running its experiments, such as specific GPU models, CPU types, or cloud computing instances. The authors' checklist explicitly states 'Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [No]'. |
| Software Dependencies | No | The paper mentions using PyTorch for implementation and the Adam optimizer, and refers to specific loss functions such as rank BCE and RankNet. However, it does not provide version numbers for PyTorch or for any other software libraries or dependencies used in the experiments. |
| Experiment Setup | Yes | The ranking model is a 5-layer fully connected neural network with hidden dimensions [256, 128, 64, 32, 16]. Adam optimizer [KB14] is used with learning rate 1e-4 and batch size 256. The training lasts for 100 epochs. The best checkpoint (measured by NDCG@8 on the validation set) is used for evaluation. (A configuration sketch follows this table.) |
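
The two-step PFD procedure of Section 3.1 is described only in prose, so here is a minimal PyTorch sketch of it. It assumes a pointwise binary-relevance setup with full-batch updates; the names `mlp`, `pfd`, `x`, `p`, and `y` are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn

def mlp(in_dim, hidden=(256, 128, 64, 32, 16)):
    """Fully connected scorer: five hidden layers plus a scalar scoring head,
    matching the hidden dimensions reported in the paper."""
    layers, d = [], in_dim
    for h in hidden:
        layers += [nn.Linear(d, h), nn.ReLU()]
        d = h
    layers.append(nn.Linear(d, 1))
    return nn.Sequential(*layers)

def pfd(x, p, y, epochs=100, lr=1e-4):
    """x: regular features (n, d_x); p: privileged features (n, d_p);
    y: float relevance labels (n,). Mini-batching is omitted for brevity."""
    bce = nn.BCEWithLogitsLoss()

    # Step I: train a teacher on regular + privileged features.
    teacher = mlp(x.shape[1] + p.shape[1])
    opt = torch.optim.Adam(teacher.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        bce(teacher(torch.cat([x, p], dim=1)).squeeze(1), y).backward()
        opt.step()

    # Step II: train a student on regular features only, fitting the
    # teacher's soft predictions instead of the hard labels.
    with torch.no_grad():
        soft = torch.sigmoid(teacher(torch.cat([x, p], dim=1))).squeeze(1)
    student = mlp(x.shape[1])
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        bce(student(x).squeeze(1), soft).backward()  # soft targets are valid for BCE
        opt.step()
    return student
```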
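For the dataset split, the paper states only that 10% of the training data is held out for validation; whether the sampling is per query group or per example, and the random seed, are not reported. A minimal sketch under the assumption of a uniform random split:

```python
import torch

def split_train_val(num_examples, val_frac=0.10, seed=0):
    """Hold out val_frac of the training indices for validation
    (the 10% split the report describes); seed is an assumption."""
    g = torch.Generator().manual_seed(seed)
    perm = torch.randperm(num_examples, generator=g)
    n_val = int(val_frac * num_examples)
    return perm[n_val:], perm[:n_val]  # (train indices, validation indices)
```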
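Finally, a sketch of the reported training configuration (Adam, learning rate 1e-4, 100 epochs, best checkpoint by validation NDCG@8), assuming the `mlp` scorer above and a binary cross-entropy loss (one of the losses the paper mentions). `train_loader` (yielding batches of 256) and `val_queries` (per-query feature/label pairs) are hypothetical inputs, and the NDCG implementation is a standard exponential-gain variant, not necessarily the authors' exact metric code.

```python
import copy
import torch

def ndcg_at_k(scores, labels, k=8):
    """NDCG@k for one query; scores and labels are 1-D float tensors."""
    order = torch.argsort(scores, descending=True)[:k]
    discounts = torch.log2(torch.arange(2, order.numel() + 2, dtype=torch.float))
    dcg = ((2.0 ** labels[order] - 1) / discounts).sum()
    ideal = torch.argsort(labels, descending=True)[:k]
    idcg = ((2.0 ** labels[ideal] - 1) / discounts[: ideal.numel()]).sum()
    return (dcg / idcg).item() if idcg > 0 else 0.0

def train_with_checkpointing(model, train_loader, val_queries, epochs=100, lr=1e-4):
    """Trains with Adam (lr 1e-4) for 100 epochs and keeps the checkpoint
    with the best validation NDCG@8, per the paper's setup."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    bce = torch.nn.BCEWithLogitsLoss()
    best_ndcg, best_state = -1.0, None
    for _ in range(epochs):
        for x, y in train_loader:  # batches of 256 (feature, label) pairs
            opt.zero_grad()
            bce(model(x).squeeze(1), y).backward()
            opt.step()
        with torch.no_grad():
            val_ndcg = sum(ndcg_at_k(model(xq).squeeze(1), yq)
                           for xq, yq in val_queries) / len(val_queries)
        if val_ndcg > best_ndcg:
            best_ndcg, best_state = val_ndcg, copy.deepcopy(model.state_dict())
    model.load_state_dict(best_state)
    return model
```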