Deep Reinforcement Learning of Marked Temporal Point Processes
Authors: Utkarsh Upadhyay, Abir De, Manuel Gomez Rodriguez
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply our methodology to two different applications in personalized teaching and viral marketing and, using data gathered from Duolingo and Twitter, we show that it may be able to find interventions to help learners and marketers achieve their goals more effectively than alternatives. |
| Researcher Affiliation | Academia | Utkarsh Upadhyay MPI-SWS utkarshu@mpi-sws.org Abir De MPI-SWS ade@mpi-sws.org Manuel Gomez-Rodriguez MPI-SWS manuelgr@mpi-sws.org |
| Pseudocode | Yes | Algorithm 1: Returns the next action time |
| Open Source Code | Yes | To facilitate research in temporal point processes within the reinforcement learning community at large, we are releasing an open-source implementation of our method in TensorFlow as well as synthetic and real-world data used in our experiments. |
| Open Datasets | Yes | To facilitate research in temporal point processes within the reinforcement learning community at large, we are releasing an open-source implementation of our method in TensorFlow as well as synthetic and real-world data used in our experiments. |
| Dataset Splits | No | The paper describes training and testing procedures, including dividing data into a training set and a test set (in Section 5.1), but does not explicitly mention a separate validation set or specific percentage/count splits for training, validation, and testing. |
| Hardware Specification | No | The paper does not specify any hardware details such as CPU, GPU models, or cloud computing resources used for the experiments. |
| Software Dependencies | No | The paper mentions "TensorFlow" but does not specify its version number or any other software dependencies with their respective versions. |
| Experiment Setup | Yes | More specifically, on iteration i, we build a batch of b reviewing (or studying) sequences of time length T, where we sample the student's recalls from the student model every time our policy π generates a reviewing event and compute the reward at the end of each sequence. Here, the reward is the sampled recall at test time T + , which is a natural performance measure for the goal stated in the problem definition. |
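
The Pseudocode row quotes "Algorithm 1: Returns the next action time". As a rough illustration only (not a reproduction of the paper's Algorithm 1), the next event time of a point-process policy with an exponential-form intensity λ(t) = exp(c + w·(t − t_last)) can be drawn in closed form by inverting the compensator at an Exp(1) draw; the constants `c` and `w` below are hypothetical stand-ins for whatever the policy network outputs.

```python
import numpy as np

def sample_next_action_time(t_last, c, w, rng=None):
    """Inverse-transform sampling of the next event time from an intensity
    lambda(t) = exp(c + w * (t - t_last)) for t >= t_last.

    The compensator Lambda(t) = (exp(c) / w) * (exp(w * (t - t_last)) - 1)
    is equated to an Exp(1) draw (the standard time-rescaling trick) and solved for t.
    """
    rng = rng or np.random.default_rng()
    exp_c = np.exp(c)
    target = rng.exponential(1.0)        # equals -log(u) with u ~ Uniform(0, 1)
    if abs(w) < 1e-12:                   # constant intensity exp(c): plain exponential gap
        return t_last + target / exp_c
    inner = 1.0 + w * target / exp_c
    if inner <= 0.0:                     # decaying intensity has finite total mass:
        return np.inf                    # this draw produces no further event
    return t_last + np.log(inner) / w
```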
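The Experiment Setup row describes sampling batches of b reviewing sequences of length T, drawing the student's recall whenever the policy emits a reviewing event, and using the recall at test time as the episode reward. The sketch below is a heavily simplified, hypothetical version of that loop: a constant-intensity policy λ = exp(θ), a toy student model, and a REINFORCE update with a mean baseline. The released TensorFlow code implements the authors' actual RNN-parameterized policy and objective; none of the constants or the toy student model here come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

T, DELTA = 10.0, 1.0                 # study window and test lag (illustrative values)
N0, DECAY, COST = 0.5, 0.7, 0.02     # toy student model: base forgetting rate, per-review boost, review cost

def run_episode(theta):
    """Sample one reviewing sequence from a constant-intensity policy
    lambda = exp(theta) on [0, T], then score it with a toy student model."""
    rate = np.exp(theta)
    n_reviews = rng.poisson(rate * T)                  # homogeneous Poisson event count
    forgetting = N0 * DECAY ** n_reviews               # each review slows forgetting
    recall = float(rng.random() < np.exp(-forgetting * DELTA))  # sampled recall at test time
    reward = recall - COST * n_reviews                 # terminal reward, minus a small event cost
    score = n_reviews - rate * T                       # d/dtheta log p(sequence | theta)
    return reward, score

theta, lr, batch = 0.0, 0.05, 64
for it in range(500):
    rewards, scores = zip(*(run_episode(theta) for _ in range(batch)))
    rewards, scores = np.array(rewards), np.array(scores)
    baseline = rewards.mean()                          # simple variance-reduction baseline
    theta += lr * np.mean((rewards - baseline) * scores)   # REINFORCE update

print(f"learned reviewing rate: {np.exp(theta):.3f} events per unit time")
```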