Learning Population-Level Diffusions with Generative RNNs

Authors: Tatsunori Hashimoto, David Gifford, Tommi Jaakkola

ICML 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the approach in the context of uncovering complex cellular dynamics known as the epigenetic landscape from existing biological assays.
Researcher Affiliation | Academia | Tatsunori B. Hashimoto (THASHIM@MIT.EDU), David K. Gifford (DKG@MIT.EDU), Tommi S. Jaakkola (TOMMI@CSAIL.MIT.EDU)
Pseudocode | No | The paper describes its equations and algorithms in prose, but it does not contain a formally labeled pseudocode block or algorithm figure.
Open Source Code | Yes | We implement the entire method in Theano, and code is available at https://github.com/thashim/population-diffusions.
Open Datasets | Yes | In (Klein et al., 2015) an initially stable embryonic stem cell population (termed D0 for day 0) begins to differentiate after removal of LIF (leukemia inhibitory factor), and single-cell RNA-seq measurements are made at two, four, and seven days after LIF removal.
Dataset Splits | No | The paper uses the D0 and D7 data to predict D4 gene expression for the RNA-seq experiment, implying a division of the data, and it mentions "training log-likelihoods" for pre-training. However, it does not state specific percentages, counts, or a split methodology (e.g., an "80/10/10 split" or a "standard train/test split" from a citation) that would allow the splits to be reproduced.
Hardware Specification | No | The paper provides no details about the hardware used for the experiments, such as CPU/GPU models, memory, or cloud instance types.
Software Dependencies | No | The paper mentions implementing the method "in Theano" and using "Adagrad" for optimization, but it gives no version numbers for these or any other software dependencies, which would be necessary for reproducibility.
Experiment Setup | Yes | In practice, we set t to be 0.1, which gives at least ten time-steps between observations in our experiments, and we find anywhere from five to a hundred time-steps between observations to be sufficient. ... We solve this optimization problem with contrastive divergence (Hinton, 2002), using the first-order Euler scheme in Eq. 9 to generate negative samples. ... These stochastic gradients are then used in Adagrad to optimize Ψ (Duchi et al., 2011). Step-size is selected by grid search (see section S.3 for other parameter settings). σ is assumed known in the simulations, and fixed to the observed marginal variance for the RNA-seq data.
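The two numerical ingredients named in the Experiment Setup row can be sketched in a few lines. Below is a minimal illustration of (a) a first-order Euler (Euler–Maruyama) scheme for simulating a diffusion, as used to generate negative samples, and (b) the Adagrad update. The drift function, dimensions, and step sizes here are illustrative placeholders, not the paper's actual model or settings:

```python
import numpy as np

def euler_maruyama(x0, drift, sigma, dt=0.1, n_steps=10, rng=None):
    """Simulate dX = drift(X) dt + sigma dW with a first-order Euler scheme.

    x0:    (n_particles, dim) initial population sample
    drift: callable mapping an (n, dim) array to (n, dim) drift vectors
    sigma: diffusion scale (assumed scalar and known, as in the paper)
    """
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x + drift(x) * dt + sigma * np.sqrt(dt) * noise
    return x

def adagrad_step(params, grad, accum, lr=0.1, eps=1e-8):
    """One Adagrad update: scale the step by accumulated squared gradients."""
    accum = accum + grad ** 2
    return params - lr * grad / (np.sqrt(accum) + eps), accum

# Toy usage: particles relaxing toward the origin under a quadratic potential,
# with dt = 0.1 and ten steps between observations (as in the quoted setup).
pop0 = np.ones((5, 2))
pop1 = euler_maruyama(pop0, drift=lambda x: -x, sigma=0.1)

# A single Adagrad step on a dummy 3-parameter vector.
psi, accum = adagrad_step(np.zeros(3), np.array([1.0, -2.0, 0.5]), np.zeros(3))
```

Contrastive divergence would plug samples like `pop1` in as negative examples when forming the stochastic gradient that `adagrad_step` then consumes; that outer loop is omitted here.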