Deep Reinforcement Learning of Marked Temporal Point Processes
Authors: Utkarsh Upadhyay, Abir De, Manuel Gomez Rodriguez
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply our methodology to two different applications in personalized teaching and viral marketing and, using data gathered from Duolingo and Twitter, we show that it may be able to find interventions to help learners and marketers achieve their goals more effectively than alternatives. |
| Researcher Affiliation | Academia | Utkarsh Upadhyay MPI-SWS utkarshu@mpi-sws.org Abir De MPI-SWS ade@mpi-sws.org Manuel Gomez-Rodriguez MPI-SWS manuelgr@mpi-sws.org |
| Pseudocode | Yes | Algorithm 1: Returns the next action time |
| Open Source Code | Yes | To facilitate research in temporal point processes within the reinforcement learning community at large, we are releasing an open-source implementation of our method in TensorFlow as well as synthetic and real-world data used in our experiments. |
| Open Datasets | Yes | To facilitate research in temporal point processes within the reinforcement learning community at large, we are releasing an open-source implementation of our method in TensorFlow as well as synthetic and real-world data used in our experiments. |
| Dataset Splits | No | The paper describes training and testing procedures, including dividing data into a training set and a test set (in Section 5.1), but does not explicitly mention a separate validation set or specific percentage/count splits for training, validation, and testing. |
| Hardware Specification | No | The paper does not specify any hardware details such as CPU, GPU models, or cloud computing resources used for the experiments. |
| Software Dependencies | No | The paper mentions "TensorFlow" but does not specify its version number or any other software dependencies with their respective versions. |
| Experiment Setup | Yes | More specifically, on iteration i, we build a batch of b reviewing (or studying) sequences of time length T, where we sample the student's recalls from the student model every time our policy π generates a reviewing event and compute the reward at the end of each sequence. Here, the reward is the sampled recall at test time T + , which is a natural performance measure for the goal stated in the problem definition. |
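
The Pseudocode row quotes "Algorithm 1: Returns the next action time". As a rough illustration only (not a reproduction of the paper's Algorithm 1), the next event time of a point-process policy with an exponential-form intensity λ(t) = exp(c + w·(t − t_last)) can be drawn in closed form by inverting the compensator at an Exp(1) draw; the constants `c` and `w` below are hypothetical stand-ins for whatever the policy network outputs.

```python
import numpy as np

def sample_next_action_time(t_last, c, w, rng=None):
    """Inverse-transform sampling of the next event time from an intensity
    lambda(t) = exp(c + w * (t - t_last)) for t >= t_last.

    The compensator Lambda(t) = (exp(c) / w) * (exp(w * (t - t_last)) - 1)
    is equated to an Exp(1) draw (the standard time-rescaling trick) and solved for t.
    """
    rng = rng or np.random.default_rng()
    exp_c = np.exp(c)
    target = rng.exponential(1.0)        # equals -log(u) with u ~ Uniform(0, 1)
    if abs(w) < 1e-12:                   # constant intensity exp(c): plain exponential gap
        return t_last + target / exp_c
    inner = 1.0 + w * target / exp_c
    if inner <= 0.0:                     # decaying intensity has finite total mass:
        return np.inf                    # this draw produces no further event
    return t_last + np.log(inner) / w
```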
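The Experiment Setup row describes sampling batches of b reviewing sequences of length T, drawing the student's recall whenever the policy emits a reviewing event, and using the recall at test time as the episode reward. The sketch below is a heavily simplified, hypothetical version of that loop: a constant-intensity policy λ = exp(θ), a toy student model, and a REINFORCE update with a mean baseline. The released TensorFlow code implements the authors' actual RNN-parameterized policy and objective; none of the constants or the toy student model here come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

T, DELTA = 10.0, 1.0                 # study window and test lag (illustrative values)
N0, DECAY, COST = 0.5, 0.7, 0.02     # toy student model: base forgetting rate, per-review boost, review cost

def run_episode(theta):
    """Sample one reviewing sequence from a constant-intensity policy
    lambda = exp(theta) on [0, T], then score it with a toy student model."""
    rate = np.exp(theta)
    n_reviews = rng.poisson(rate * T)                  # homogeneous Poisson event count
    forgetting = N0 * DECAY ** n_reviews               # each review slows forgetting
    recall = float(rng.random() < np.exp(-forgetting * DELTA))  # sampled recall at test time
    reward = recall - COST * n_reviews                 # terminal reward, minus a small event cost
    score = n_reviews - rate * T                       # d/dtheta log p(sequence | theta)
    return reward, score

theta, lr, batch = 0.0, 0.05, 64
for it in range(500):
    rewards, scores = zip(*(run_episode(theta) for _ in range(batch)))
    rewards, scores = np.array(rewards), np.array(scores)
    baseline = rewards.mean()                          # simple variance-reduction baseline
    theta += lr * np.mean((rewards - baseline) * scores)   # REINFORCE update

print(f"learned reviewing rate: {np.exp(theta):.3f} events per unit time")
```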