Learning Population-Level Diffusions with Generative RNNs
Authors: Tatsunori Hashimoto, David Gifford, Tommi Jaakkola
ICML 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the approach in the context of uncovering complex cellular dynamics known as the epigenetic landscape from existing biological assays. |
| Researcher Affiliation | Academia | Tatsunori B. Hashimoto (thashim@mit.edu), David K. Gifford (dkg@mit.edu), Tommi S. Jaakkola (tommi@csail.mit.edu) |
| Pseudocode | No | The paper describes mathematical equations and algorithms in prose, but it does not contain a formally labeled pseudocode block or algorithm figure. |
| Open Source Code | Yes | We implement the entire method in Theano, and code is available at https://github.com/thashim/population-diffusions. |
| Open Datasets | Yes | In (Klein et al., 2015) an initially stable embryonic stem cell population (termed D0 for day 0) begins to differentiate after removal of LIF (leukemia inhibitory factor) and single-cell RNA-seq measurements are made at two, four, and seven days after LIF removal. |
| Dataset Splits | No | The paper describes using the D0 and D7 data to predict D4 gene expression for the RNA-seq experiment, implying a division of the data, and it mentions "training log-likelihoods" for pre-training. However, it does not explicitly state percentages, counts, or a methodology for splitting data into training, validation, or test sets in a way that allows the splits to be reproduced (e.g., "80/10/10 split" or "standard train/test split from citation"). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as CPU/GPU models, memory, or cloud instance types. |
| Software Dependencies | No | The paper mentions implementing the method "in Theano" and using "Adagrad" for optimization, but it does not provide specific version numbers for these or any other software dependencies, which would be necessary for reproducibility. |
| Experiment Setup | Yes | In practice, we set Δt to be 0.1, which gives at least ten time-steps between observations in our experiments, and we find anywhere from five to one hundred time-steps between observations to be sufficient. ... We solve this optimization problem with contrastive divergence (Hinton, 2002) using the first-order Euler scheme in Eq. 9 to generate negative samples. ... These stochastic gradients are then used in Adagrad to optimize Ψ (Duchi et al., 2011). Step-size is selected by grid search (see section S.3 for other parameter settings). σ is assumed known in the simulations, and fixed to the observed marginal variance for the RNA-seq data. |
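The experiment-setup row refers to a first-order Euler scheme for simulating the diffusion between observation times (e.g., Δt = 0.1, ten or more steps between observations). As a point of reference for what that discretization involves, here is a minimal, generic Euler–Maruyama sketch in NumPy. This is an illustration only, not the paper's Theano code; the drift function, step count, and Ornstein–Uhlenbeck example are hypothetical stand-ins.

```python
import numpy as np

def euler_maruyama(drift, x0, sigma, dt=0.1, n_steps=10, seed=None):
    """Simulate samples of the SDE dX = drift(X) dt + sigma dW with a
    first-order Euler scheme (generic sketch, not the paper's implementation).

    x0: array of shape (n_samples, dim) holding the initial population.
    Returns the population after n_steps Euler steps of size dt.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        # Deterministic drift term plus Brownian increment scaled by sqrt(dt).
        x = x + drift(x) * dt + sigma * np.sqrt(dt) * noise
    return x

# Hypothetical example: Ornstein-Uhlenbeck drift pulling a population of
# 500 two-dimensional samples toward the origin.
samples = euler_maruyama(lambda x: -x, x0=np.ones((500, 2)), sigma=0.5, seed=0)
```

In a contrastive-divergence setup like the one the paper describes, simulated populations of this kind would serve as negative samples; here the scheme is shown only to make the "time-steps between observations" setting concrete.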