Reinforcement Learning with History Dependent Dynamic Contexts

Authors: Guy Tennenholtz, Nadav Merlis, Lior Shani, Martin Mladenov, Craig Boutilier

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the efficacy of our approach on a recommendation task (using MovieLens data) where user behavior dynamics evolve in response to recommendations. To evaluate the effectiveness of DCZero, we develop a movie recommendation environment based on the MovieLens dataset (Harper and Konstan, 2015).
Researcher Affiliation | Collaboration | Google Research; CREST, ENSAE.
Pseudocode | Yes | Algorithm 1: LDC-UCB; Algorithm 2: Tractable LDC-UCB; Algorithm 3: DCZero
Open Source Code | No | The paper does not include any explicit statement about providing open-source code for the described methodology, nor a link to a code repository.
Open Datasets | Yes | To evaluate the effectiveness of DCZero, we develop a movie recommendation environment based on the MovieLens dataset (Harper and Konstan, 2015).
Dataset Splits | No | All experiments used a horizon of H = 300, M = 6 user classes, A = 6 slate items (changing every reset), and a user embedding dimension of d = 20. The paper mentions dataset usage but does not provide details on train/validation/test splits for reproducibility.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions building on MuZero but does not provide specific software dependencies or their version numbers (e.g., Python, library versions).
Experiment Setup | Yes | All experiments used a horizon of H = 300, M = 6 user classes, A = 6 slate items (changing every reset), and a user embedding dimension of d = 20. We used default parameters for MuZero and applied the same parameters to DCZero. We also vary the values of α on the Attraction Env.
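
The Experiment Setup row pins down the main environment hyperparameters but not the surrounding software stack. Below is a minimal sketch of how those reported values might be collected into a single configuration object; the class and field names are hypothetical illustrations, not the authors' actual code or schema.

from dataclasses import dataclass, field
from typing import List

@dataclass
class DCZeroExperimentConfig:
    """Hypothetical container for the setup values reported in the paper."""
    horizon: int = 300             # H = 300 steps per episode
    num_user_classes: int = 6      # M = 6 user classes
    slate_size: int = 6            # A = 6 slate items, resampled at every reset
    user_embedding_dim: int = 20   # d = 20
    # The paper states MuZero's default parameters were reused for DCZero;
    # they are left as an opaque placeholder here rather than guessed.
    muzero_defaults: dict = field(default_factory=dict)
    # α values swept on the Attraction Env (specific values not listed in this summary).
    alpha_sweep: List[float] = field(default_factory=list)

Even with such a configuration, a reproduction would still need the items marked "No" above (train/validation/test splits, hardware, and library versions) before the experiments could be rerun faithfully.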