Recommender Systems with Generative Retrieval

Authors: Shashank Rajput, Nikhil Mehta, Anima Singh, Raghunandan Hulikal Keshavan, Trung Vu, Lukasz Heldt, Lichan Hong, Yi Tay, Vinh Tran, Jonah Samost, Maciej Kula, Ed Chi, Maheswaran Sathiamoorthy

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the proposed framework on three public real-world benchmarks from the Amazon Product Reviews dataset [10], containing user reviews and item metadata from May 1996 to July 2014. In particular, we use three categories of the Amazon Product Reviews dataset for the sequential recommendation task: Beauty, Sports and Outdoors, and Toys and Games. We discuss the dataset statistics and pre-processing in Appendix C. Evaluation Metrics: We use top-K Recall (Recall@K) and Normalized Discounted Cumulative Gain (NDCG@K) with K = 5, 10 to evaluate the recommendation performance. Table 1: Performance comparison on sequential recommendation.
Researcher Affiliation | Collaboration | Shashank Rajput (University of Wisconsin-Madison); Nikhil Mehta (Google DeepMind); Anima Singh (Google DeepMind); Raghunandan Keshavan (Google); Trung Vu (Google); Lukasz Heldt (Google); Lichan Hong (Google DeepMind); Yi Tay (Google DeepMind); Vinh Q. Tran (Google); Jonah Samost (Google); Maciej Kula (Google DeepMind); Ed H. Chi (Google DeepMind); Maheswaran Sathiamoorthy (Google DeepMind)
Pseudocode | No | The paper describes the process of RQ-VAE and the sequence-to-sequence model but does not provide explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper mentions using the open-sourced T5X framework [28] and the source code for P5, but provides no statement or link indicating that the code for its own TIGER method is released.
Open Datasets | Yes | We evaluate the proposed framework on three public real-world benchmarks from the Amazon Product Reviews dataset [10], containing user reviews and item metadata from May 1996 to July 2014.
Dataset Splits | Yes | Following the standard evaluation protocol [17, 8], we use the leave-one-out strategy for evaluation. For each item sequence, the last item is used for testing, the item before the last is used for validation, and the rest is used for training. During training, we limit the number of items in a user's history to 20.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory, or specific cloud instance types) used for running its experiments.
Software Dependencies | No | The paper states only: "We use the open-sourced T5X framework [28] to implement our transformer based encoder-decoder architecture." No version numbers or further dependencies are listed.
Experiment Setup | Yes | The encoder has three intermediate layers of size 512, 256 and 128 with ReLU activation, with a final latent representation dimension of 32. To quantize this representation, three levels of residual quantization are performed. For each level, a codebook of cardinality 256 is maintained, where each vector in the codebook has a dimension of 32. When computing the total loss, we use β = 0.25. The RQ-VAE model is trained for 20k epochs to ensure high codebook usage (≥80%). We use the Adagrad optimizer with a learning rate of 0.4 and a batch size of 1024. We use 4 layers each for the transformer-based encoder and decoder models, with 6 self-attention heads of dimension 64 in each layer. We used the ReLU activation function for all the layers. The MLP and input dimensions were set to 1024 and 128, respectively. We used a dropout of 0.1. Overall, the model has around 13 million parameters. We train this model for 200k steps for the Beauty and Sports and Outdoors datasets... We use a batch size of 256. The learning rate is 0.01 for the first 10k steps and then follows an inverse square root decay schedule.
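The quantization step described in the Experiment Setup row (three levels of residual quantization over codebooks of cardinality 256 with 32-dimensional vectors) can be sketched as below. This is an illustrative NumPy sketch of the greedy residual-quantization forward pass only, not the paper's implementation; the function name and random codebooks are assumptions, and the actual RQ-VAE learns its codebooks jointly with the encoder.

```python
import numpy as np

def residual_quantize(latent, codebooks):
    """Greedy residual quantization: at each level, pick the codebook
    vector nearest to the current residual, emit its index as one token
    of the Semantic ID, and subtract the chosen vector from the residual."""
    residual = latent.astype(np.float64).copy()
    semantic_id = []
    for codebook in codebooks:
        dists = np.linalg.norm(codebook - residual, axis=1)
        idx = int(np.argmin(dists))
        semantic_id.append(idx)
        residual = residual - codebook[idx]
    return semantic_id

# Three levels, codebook cardinality 256, vector dimension 32,
# matching the configuration quoted above (codebooks random here).
rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(256, 32)) for _ in range(3)]
latent = rng.normal(size=32)
tokens = residual_quantize(latent, codebooks)  # one index per level
```

Each item is thus represented by a short tuple of codebook indices rather than a single atomic ID, which is what makes generative (token-by-token) retrieval possible downstream.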
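The leave-one-out protocol quoted in the Dataset Splits row (last item for test, second-to-last for validation, remainder for training, with the training history capped at 20 items) can be sketched as follows; the function name and the minimum-length check are assumptions for illustration.

```python
def leave_one_out_split(item_sequence, max_history=20):
    """Split one user's interaction sequence per the leave-one-out
    protocol: last item -> test, second-to-last -> validation,
    the rest -> training, truncated to the most recent `max_history`."""
    if len(item_sequence) < 3:
        # Assumed guard: need at least one item for each split.
        raise ValueError("need at least 3 interactions per user")
    train = item_sequence[:-2][-max_history:]
    valid = item_sequence[-2]
    test = item_sequence[-1]
    return train, valid, test

train, valid, test = leave_one_out_split(list(range(30)))
# train covers the 20 most recent of items 0..27; valid is 28; test is 29.
```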
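The metrics named in the Research Type row, Recall@K and NDCG@K with K = 5, 10, have a simple closed form under leave-one-out evaluation, where exactly one held-out item is relevant per user. A minimal sketch (function names are illustrative):

```python
import math

def recall_at_k(ranked_items, true_item, k):
    """Recall@K with a single relevant item: 1 if the held-out item
    appears in the top-K of the ranked list, else 0."""
    return 1.0 if true_item in ranked_items[:k] else 0.0

def ndcg_at_k(ranked_items, true_item, k):
    """NDCG@K with a single relevant item: DCG is 1/log2(rank + 1) at the
    item's 1-based rank within the top-K, and the ideal DCG is 1."""
    for rank, item in enumerate(ranked_items[:k], start=1):
        if item == true_item:
            return 1.0 / math.log2(rank + 1)
    return 0.0
```

Reported numbers are these per-user values averaged over all test users.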