Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Looking into User’s Long-term Interests through the Lens of Conservative Evidential Learning

Authors: Dingrong Wang, Krishna Neupane, Ervine Zheng, Qi Yu

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on multiple real-world dynamic datasets demonstrate the state-of-the-art performance of ECQL and its capability to capture users' long-term interests. In this paper, we propose a novel evidential conservative Q-learning framework (ECQL) that learns an effective and conservative recommendation policy by integrating evidence-based uncertainty and conservative learning. We conduct extensive experiments over four real-world datasets and compare with state-of-the-art baselines to demonstrate the effectiveness of the proposed model."
Researcher Affiliation | Collaboration | "Dingrong Wang (1), Krishna Prasad Neupane (2), Ervine Zheng (3), Qi Yu (1) — (1) Rochester Institute of Technology, (2) Amazon, (3) Samsung Research"
Pseudocode | Yes | "Algorithm 1: Evidential Conservative Q-Learning"
Open Source Code | Yes | "The source code and processed datasets can be accessed here: https://github.com/ritmininglab/ECQL"
Open Datasets | Yes | "We conduct experiments on multiple real-world datasets: Movielens-1M, Movielens-100K, Netflix, and Yahoo! Music. Movielens-1M: This dataset includes 1M ratings provided by 6,040 anonymous users... Movielens-100K: This dataset contains 100,000 ratings from 943 users... Netflix (Bennett et al., 2007): This dataset has around 100 million interactions... Yahoo! Music rating (Dror et al., 2012): The dataset includes approximately 300,000 user-supplied ratings..."
Dataset Splits | Yes | "For training, given a user interaction history H_u, we continuously capture the most recent N items after the current time step into a sliding window W_t... We consider each user an episode for the RL setting and split users into 70% as training users and 30% as test users."
Hardware Specification | Yes | "We implement the experiments based on the PyTorch framework with two A100 GPUs."
Software Dependencies | No | "We implement the experiments based on the PyTorch framework with two A100 GPUs." (only the framework is named; no versions or dependency list are given)
Experiment Setup | Yes | "We set discount factor γ = 1 and set τ = 3 as a threshold to identify whether an item is positive, i.e., whether its ground-truth rating is larger than or equal to the threshold (rating_{u,i} ≥ τ). In testing, the agent may recommend items not interacted with by the user. In such cases, we assign a neutral rating of τ = 3 to those non-interacted items. For training, we conduct 5 RL epochs, each with full training of all training users (episodes). Each epoch is equipped with an annealing λ ranging from 1 to 0.1 to shift the emphasis from exploration to exploitation as knowledge of the training users accumulates. For testing, we conduct only one RL epoch containing all test users, and we use an annealing λ ranging from 0.5 to 0.1 across variable step sizes in the four datasets, which differ in minimum session length."
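The quoted setup combines a few simple mechanisms: a 70%/30% user (episode) split, a sliding window over each user's recent items, a rating threshold τ for positive items, and a linearly annealed λ. A minimal sketch of these pieces is below; all function names are hypothetical illustrations, not the authors' code (which is in the linked repository), and the linear annealing schedule is an assumption since the paper excerpt does not specify the annealing curve.

```python
import random

def split_users(user_ids, train_frac=0.7, seed=0):
    """Split users (episodes) into 70% training / 30% test users."""
    ids = list(user_ids)
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * train_frac)
    return ids[:cut], ids[cut:]

def sliding_window(history, t, n):
    """Most recent n items before time step t (the sliding window W_t)."""
    return history[max(0, t - n):t]

def is_positive(rating, tau=3):
    """An item is positive if its ground-truth rating satisfies rating >= tau."""
    return rating >= tau

def annealed_lambda(step, total_steps, start=1.0, end=0.1):
    """Anneal lambda from `start` to `end` (linear schedule assumed)."""
    frac = step / max(1, total_steps - 1)
    return start + (end - start) * frac
```

For testing, the same `annealed_lambda` would be called with `start=0.5`, matching the 0.5-to-0.1 range quoted above.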