Reinforcement Learning with History Dependent Dynamic Contexts
Authors: Guy Tennenholtz, Nadav Merlis, Lior Shani, Martin Mladenov, Craig Boutilier
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the efficacy of our approach on a recommendation task (using MovieLens data) where user behavior dynamics evolve in response to recommendations. To evaluate the effectiveness of DCZero, we develop a movie recommendation environment based on the MovieLens dataset (Harper and Konstan, 2015). |
| Researcher Affiliation | Collaboration | Google Research; CREST, ENSAE. |
| Pseudocode | Yes | Algorithm 1 LDC-UCB; Algorithm 2 Tractable LDC-UCB; Algorithm 3 DCZero |
| Open Source Code | No | The paper does not include any explicit statement about providing open-source code for the described methodology or a link to a code repository. |
| Open Datasets | Yes | To evaluate the effectiveness of DCZero, we develop a movie recommendation environment based on the MovieLens dataset (Harper and Konstan, 2015). |
| Dataset Splits | No | All experiments used a horizon of H = 300, M = 6 user classes, A = 6 slate items (changing every reset), and a user embedding dimension of d = 20. The paper mentions dataset usage but does not provide details on train/validation/test splits for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions building on MuZero but does not provide specific software dependencies or their version numbers (e.g., Python, library versions). |
| Experiment Setup | Yes | All experiments used a horizon of H = 300, M = 6 user classes, A = 6 slate items (changing every reset), and a user embedding dimension of d = 20. We used default parameters for MuZero and applied the same parameters to DCZero. We also vary the values of α on the Attraction Env. |
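For anyone attempting a reproduction, the reported setup values can be pinned down in a single configuration object. The sketch below is illustrative only. The class and field names (`ExperimentConfig`, `horizon`, `slate_size`, and so on) are hypothetical and do not come from the paper. It records only the values stated above. The MuZero defaults and the specific α values used on the Attraction Env are not reported, so they are deliberately left out.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ExperimentConfig:
    """Reported experiment settings; all field names are hypothetical."""
    horizon: int = 300          # H: episode horizon
    num_user_classes: int = 6   # M: number of user classes
    slate_size: int = 6         # A: slate items, changing at every reset
    embedding_dim: int = 20     # d: user embedding dimension
    dataset: str = "MovieLens"  # environment built on MovieLens (Harper and Konstan, 2015)

    def __post_init__(self):
        # Reject non-positive sizes if settings are varied from the reported ones.
        for name in ("horizon", "num_user_classes", "slate_size", "embedding_dim"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive")


config = ExperimentConfig()
print(asdict(config))
# prints {'horizon': 300, 'num_user_classes': 6, 'slate_size': 6, 'embedding_dim': 20, 'dataset': 'MovieLens'}
```

A frozen dataclass keeps the reported values from being mutated accidentally during a run. Any unreported setting would have to be added as an explicit, documented assumption.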