Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Unifying Generative and Dense Retrieval for Sequential Recommendation
Authors: Liu Yang, Fabian Paischer, Kaveh Hassani, Jiacheng Li, Shuai Shao, Zhang Gabriel Li, Yun He, Xue Feng, Nima Noorshams, Sem Park, Bo Long, Robert D Nowak, Xiaoli Gao, Hamid Eghbalzadeh
TMLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To address this, we compare these two approaches under controlled conditions on academic benchmarks and observe performance gaps: dense retrieval shows stronger ranking performance, while generative retrieval provides greater resource efficiency. ... Experimental Setup and Results ... Table 2: Performance Comparison Across Baseline Methods on Amazon Beauty, Sports, Toys, and Steam Datasets. ... Figure 8: Ablation Results on Recall@10 across Datasets. |
| Researcher Affiliation | Collaboration | Liu Yang (University of Wisconsin-Madison; AI at Meta); Fabian Paischer (ELLIS Unit, LIT AI Lab, Institute for Machine Learning, JKU Linz, Austria; AI at Meta); Kaveh Hassani, Jiacheng Li, Shuai Shao, Zhang Gabriel Li, Yun He, Xue Feng, Nima Noorshams, Sem Park, Bo Long (AI at Meta); Robert D. Nowak (University of Wisconsin-Madison); Xiaoli Gao (AI at Meta); Hamid Eghbalzadeh (AI at Meta) |
| Pseudocode | Yes | Algorithm 1: Inference Process |
| Open Source Code | Yes | Code is available at https://github.com/facebookresearch/liger. |
| Open Datasets | Yes | Amazon Beauty, Sports, and Toys (He & McAuley, 2016): We use the Amazon Review dataset (2014), focusing on three categories: Beauty, Sports and Outdoors, and Toys and Games. Steam (Kang & McAuley, 2018b): The dataset comprises online reviews of video games, from which we extract relevant attributes to construct item embeddings. |
| Dataset Splits | Yes | For dataset splitting, we adopt the leave-one-out strategy following (Kang & McAuley, 2018b; Zhou et al., 2020; Rajput et al., 2024), designating the last item as the test label, the preceding item for validation, and the remainder for training. ... We evaluate LIGER on four datasets, preprocessing them using the standard 5-core filtering method (Zhang et al., 2019; Zhou et al., 2020). |
| Hardware Specification | Yes | All experiments are conducted on a single NVIDIA RTX 4090 GPU with 24GB of memory, which allows us to test up to approximately 1.2 million items fully loaded into GPU memory. |
| Software Dependencies | No | When generating the item text representations, the item attributes are processed using the Sentence-T5 (XXL) model (Ni et al., 2021). For the generative model, we utilize the T5 (Raffel et al., 2020) encoder-decoder model... We use the AdamW optimizer... The paper names specific ML models/architectures and an optimizer, but does not provide version numbers for Python or for underlying libraries such as PyTorch or TensorFlow. |
| Experiment Setup | Yes | The RQ-VAE features three levels of learnable codebooks, each with a dimension of 128 and a cardinality of 256. We use the AdamW optimizer to train the RQ-VAE, setting the learning rate at 0.001 and the weight decay at 0.1. ... For the generative model, we utilize the T5 (Raffel et al., 2020) encoder-decoder model, configuring both the encoder and decoder with 6 layers, an embedding dimension of 128, 6 heads, and a feed-forward network hidden dimension of 1024. The dropout rate is 0.2. ... We use the AdamW optimizer with a learning rate of 0.0003, a weight decay parameter of 0.035, and a cosine learning rate scheduler. ... For the cosine similarity loss calculation, we set the temperature parameter τ = 0.07. |
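The leave-one-out strategy quoted in the Dataset Splits row is simple enough to sketch directly: the last interaction becomes the test label, the second-to-last the validation label, and everything before it the training sequence. This is a minimal illustration of the splitting rule as quoted; the function name and input format are hypothetical, not from the paper's code.

```python
def leave_one_out_split(user_sequence):
    """Split one user's chronological interaction sequence.

    Returns (train_items, val_item, test_item) following the quoted rule:
    last item -> test label, preceding item -> validation, rest -> training.
    """
    # Need at least one item for each of train, validation, and test.
    assert len(user_sequence) >= 3, "need at least 3 interactions"
    return user_sequence[:-2], user_sequence[-2], user_sequence[-1]

# Example: a user who interacted with items 10, 42, 7, 99, 3 in order.
train, val, test = leave_one_out_split([10, 42, 7, 99, 3])
# train = [10, 42, 7], val = 99, test = 3
```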
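The Experiment Setup row describes an RQ-VAE with three levels of codebooks (dimension 128, cardinality 256), which maps each item embedding to a three-digit semantic ID via residual quantization. The sketch below shows the greedy quantization step only, with fixed random codebooks; in the paper the codebooks are learned end-to-end inside the RQ-VAE, so this is an illustration of the mechanism, not the authors' implementation.

```python
import numpy as np

def residual_quantize(x, codebooks):
    """Greedy residual quantization (RQ-VAE-style inference step).

    At each level the nearest codeword is selected and subtracted from the
    running residual; the chosen indices form the item's semantic ID.
    """
    ids, residual = [], x.astype(float)
    for cb in codebooks:                          # cb: (cardinality, dim)
        dists = np.linalg.norm(cb - residual, axis=1)
        idx = int(np.argmin(dists))               # nearest codeword index
        ids.append(idx)
        residual = residual - cb[idx]             # quantization residual
    return ids, residual

# Illustrative setup matching the quoted shapes: 3 levels, cardinality 256,
# dimension 128. Codebooks are random here purely for demonstration.
rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(256, 128)) for _ in range(3)]
ids, res = residual_quantize(rng.normal(size=128), codebooks)
# ids is a 3-digit semantic ID, each digit in [0, 256)
```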
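The setup also quotes a temperature parameter τ = 0.07 for "the cosine similarity loss calculation". The paper excerpt does not spell out the loss form, so the sketch below assumes a standard InfoNCE-style contrastive loss over temperature-scaled cosine similarities, which is the common use of such a temperature; treat the exact formulation as an assumption.

```python
import math

def cosine_sim(a, b):
    # Plain cosine similarity between two dense vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def infonce_loss(query, positive, negatives, tau=0.07):
    """Assumed InfoNCE-style loss over temperature-scaled cosine similarities.

    Lower is better: the positive item's similarity should dominate the
    negatives' once divided by the temperature tau (0.07 as quoted).
    """
    sims = [cosine_sim(query, positive)] + [cosine_sim(query, n) for n in negatives]
    logits = [s / tau for s in sims]
    m = max(logits)                                # log-sum-exp stabilization
    denom = sum(math.exp(l - m) for l in logits)
    return -(logits[0] - m - math.log(denom))

# A query identical to its positive and orthogonal to the negative
# yields a near-zero loss at tau = 0.07.
loss = infonce_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
```

The small temperature sharpens the softmax, so even modest similarity gaps between the positive and the negatives translate into near-zero loss.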