Learning Temporal Point Processes via Reinforcement Learning

Authors: Shuang Li, Shuai Xiao, Shixiang Zhu, Nan Du, Yao Xie, Le Song

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conducted experiments on various synthetic and real sequences of event data and showed that our approach outperforms the state-of-the-art regarding both data description and computational efficiency." Section 6 (Experiments): "We evaluate our algorithm by comparing with state-of-the-arts on both synthetic and real datasets."
Researcher Affiliation | Collaboration | Shuang Li¹, Shuai Xiao², Shixiang Zhu¹, Nan Du³, Yao Xie¹, and Le Song¹,²; ¹Georgia Institute of Technology, ²Ant Financial, ³Google Brain
Pseudocode | Yes | Algorithm 1 (RLPP): Mini-batch Reinforcement Learning for Learning Point Processes; a heavily simplified illustrative sketch of such a mini-batch loop appears after this table.
Open Source Code | No | The paper links open-source code for the baseline methods WGANTPP and RMTPP in footnotes (https://github.com/xiaoshuai09/Wasserstein-Learning-For-Point-Process and https://github.com/dunan/NeuralPointProcess), but does not provide access to the source code for its own proposed method, RLPP.
Open Datasets | Yes | "Medical Information Mart for Intensive Care III (MIMIC-III) contains de-identified clinical visit records from 2001 to 2012 for more than 40,000 patients."
Dataset Splits | No | The paper does not explicitly provide dataset split information (e.g., percentages or sample counts for train/validation/test sets). It describes the batch size used for training but not how the data are partitioned.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models or memory specifications) used for running the experiments.
Software Dependencies | No | The paper mentions software such as TensorFlow and the GPy package but does not specify version numbers.
Experiment Setup | Yes | "The policy in our method RLPP is parameterized as an LSTM with 64 hidden neurons, and π(a|Θ(h)) is chosen to be an exponential distribution. The batch size is 32 (the numbers of sampled sequences L and M in Algorithm 1 are both 32), and the learning rate is 1e-3. We use the Gaussian kernel k(t, t′) = exp(−‖t − t′‖²/σ²) for the reward function. The kernel bandwidth σ is estimated using the median trick based on the observations [13]."
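
The quoted setup depends on two concrete ingredients: a Gaussian kernel over event times and a bandwidth σ chosen by the median trick. The snippet below is a minimal sketch of both, assuming scalar event times; the function names are illustrative and not taken from the authors' code.

```python
import numpy as np

def median_bandwidth(event_times):
    """Median trick: set sigma to the median pairwise distance among the
    observed event times (the heuristic the paper cites as [13])."""
    t = np.asarray(event_times, dtype=float)
    i, j = np.triu_indices(len(t), k=1)  # pairs with i < j, so self-distances are excluded
    return np.median(np.abs(t[i] - t[j]))

def gaussian_kernel(t, t_prime, sigma):
    """Gaussian kernel k(t, t') = exp(-||t - t'||^2 / sigma^2) used in the reward function."""
    return np.exp(-((t - t_prime) ** 2) / sigma ** 2)

# Usage on a toy event sequence.
events = np.array([0.4, 1.1, 1.9, 3.2, 4.0])
sigma = median_bandwidth(events)
print(sigma, gaussian_kernel(events[0], events[1], sigma))
```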
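
For intuition about the mini-batch reinforcement-learning loop named in the Pseudocode row, the following is a heavily simplified, hypothetical sketch rather than the paper's Algorithm 1: the LSTM policy is replaced by a single learnable constant intensity, and the kernel-based reward is replaced by a crude event-count comparison, so only the REINFORCE-style mini-batch update pattern is illustrated. All identifiers (sample_sequence, log_rate, etc.) are invented for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 10.0  # observation window [0, T]

def sample_sequence(rate):
    """Sample a homogeneous Poisson event sequence on [0, T] with the given rate."""
    times, t = [], 0.0
    while True:
        t += rng.exponential(1.0 / rate)
        if t > T:
            return np.array(times)
        times.append(t)

# "Real" data drawn from an unknown ground-truth intensity of 2.0.
real_seqs = [sample_sequence(2.0) for _ in range(256)]

log_rate = 0.0        # policy parameter (log intensity), i.e. an initial rate of 1.0
lr, batch = 0.02, 32  # learning rate and mini-batch size (32, as in the quoted setup)

for it in range(300):
    rate = np.exp(log_rate)
    fake = [sample_sequence(rate) for _ in range(batch)]                      # L rollouts
    real = [real_seqs[i] for i in rng.integers(len(real_seqs), size=batch)]   # M real sequences
    # Crude sequence-level reward: generated event counts should match real ones.
    rewards = np.array([-abs(len(f) - len(r)) for f, r in zip(fake, real)], dtype=float)
    rewards -= rewards.mean()                         # baseline for variance reduction
    # Score function of the exponential policy: d log p(seq) / d log_rate = n - rate * T.
    score = np.array([len(f) - rate * T for f in fake])
    log_rate += lr * float(np.mean(rewards * score))  # REINFORCE ascent step

print("learned rate:", np.exp(log_rate))              # should drift toward roughly 2.0
```

In the paper itself, the rollouts come from the LSTM policy described in the Experiment Setup row, and the reward is built from the Gaussian kernel sketched above over generated and observed event times.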