Fast Offline Policy Optimization for Large Scale Recommendation
Authors: Otmane Sakhi, David Rohde, Alexandre Gilotte
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that our algorithm is an order of magnitude faster than naive approaches yet produces equally good policies. |
| Researcher Affiliation | Collaboration | Criteo AI Lab, Paris, France; CREST-ENSAE, IPP, Palaiseau, France. {o.sakhi, d.rohde, a.gilotte}@criteo.com |
| Pseudocode | Yes | Algorithm 1: Fast Offline Policy Learning |
| Open Source Code | Yes | The source code to reproduce the results has all implementation details: https://github.com/criteo-research/fopo |
| Open Datasets | Yes | We chose two collaborative filtering datasets with large catalog sizes to validate our approach: the Twitch dataset (Rappaz, McAuley, and Aberer 2021) and the Goodreads user-books interaction dataset (Wan and McAuley 2018; Wan et al. 2019) |
| Dataset Splits | No | The paper mentions splitting the dataset into a train and test split, but does not explicitly detail a separate validation split or its proportions. |
| Hardware Specification | No | The paper mentions experiments were run on "CPU and GPU devices" and discusses speedups on "CPU machines" and with "GPU", but it does not specify any particular models (e.g., NVIDIA A100, Intel Xeon). |
| Software Dependencies | No | The paper mentions using "Pytorch", "Adam optimizer", "HNSW algorithm", and "FAISS library" but does not specify version numbers for any of these software components. |
| Experiment Setup | Yes | We opt for the Adam optimizer (Kingma and Ba 2014) with a batch size of 32 and a learning rate of 10^-4 for all the experiments with the Twitch dataset and 5·10^-5 for the Goodreads dataset. ... We plot the results of these runs on Figure 2. We observe that, even if REINFORCE has a much bigger time complexity per iteration (scales linearly on the catalog size), it does not outperform the optimization routines suggested by our approach. ... We fix K = 256 and S = 1000. ... embedding dimension set to L = 1000 ... The algorithms were run on the datasets considered for 50 epochs |
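The quoted setup highlights the paper's central complexity claim: a REINFORCE update must score every item in the catalog, while the proposed routine touches only K = 256 sampled items per step. A minimal back-of-the-envelope sketch of that per-iteration gap (the catalog size below is hypothetical, not a figure from the paper; K = 256 and batch size 32 are from the quoted setup):

```python
# Rough per-iteration cost comparison: a full-softmax REINFORCE step
# evaluates a score for every catalog item, whereas a sampled update
# evaluates scores only for K sampled items.

def per_iteration_cost(items_touched: int, batch_size: int = 32) -> int:
    """Score evaluations per optimization step (constants ignored)."""
    return batch_size * items_touched

P = 1_000_000  # hypothetical catalog size (not from the paper)
K = 256        # sampled items per update, as reported in the setup

full_cost = per_iteration_cost(P)       # REINFORCE: linear in catalog size
sampled_cost = per_iteration_cost(K)    # sampled update: independent of P

speedup = full_cost / sampled_cost
print(f"approximate per-iteration speedup: {speedup:.0f}x")
```

For a million-item catalog this ratio is P / K ≈ 3900, consistent with the order-of-magnitude speedups the abstract claims, though the paper's measured wall-clock gains also depend on the HNSW/FAISS sampling machinery and hardware.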