Active Learning based Structural Inference
Authors: Aoran Wang, Jun Pang
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate ALaSI on various large datasets, including simulated systems and real-world networks, to demonstrate that ALaSI outperforms previous methods in precisely inferring the existence of connections in large systems under either supervised or unsupervised learning. We show with extensive experiments that ALaSI can infer the directed connections of dynamical systems with up to 1.5K agents. We test ALaSI on seven different large dynamical systems, including simulated networks and real-world gene regulatory networks (GRNs). |
| Researcher Affiliation | Academia | 1. Faculty of Science, Technology and Medicine, University of Luxembourg, Luxembourg; 2. Institute for Advanced Studies, University of Luxembourg, Luxembourg. Correspondence to: Aoran Wang <aoran.wang@uni.lu>, Jun Pang <jun.pang@uni.lu>. |
| Pseudocode | Yes | We describe the pipeline of ALaSI in Figure 1 and Algorithm 2 in the appendix. Algorithm 2: Pipeline of ALaSI; Algorithm 3: PID Algorithm in ALaSI; Algorithm 4: Pipeline of learning in ALaSI; Algorithm 5: The Multi-layer perceptron. |
| Open Source Code | Yes | We will make the implementation public on GitHub. We will include the code of ALaSI, and the procedures for accessing the dataset we used in this work. We attach our pseudocode and implementation as the supplementary document to this paper. |
| Open Datasets | Yes | We first test our framework on physical simulations of spring systems, which is also mentioned in (Kipf et al., 2018). Moreover, we collect three real-world GRNs from literature, namely single cell dataset of embryonic stem cells (ESC) (Biase et al., 2014), a cutoff of Escherichia coli microarray data (E. coli) (Jozefczuk et al., 2010), and a cutoff of Staphylococcus aureus microarray data (S. aureus) (Marbach et al., 2012). |
| Dataset Splits | Yes | We collect the trajectories and randomly group them into three sets for training, validation and testing with the ratio of 8:2:2, respectively (Section C.7, 'Spring Datasets'). We likewise randomly group the trajectories of gene expressions into three sets for training, validation and testing with the ratio of 8:2:2, respectively (Section C.7, 'GRN Datasets'). A hedged sketch of such a split appears below the table. |
| Hardware Specification | Yes | We run experiments of ALaSI on a single NVIDIA Tesla V100 SXM2 graphics card, which has 32 GB of graphics memory and 5120 NVIDIA CUDA cores. We ran these methods on four NVIDIA Tesla V100 SXM2 graphics cards, with a batch size of 128. |
| Software Dependencies | No | The paper mentions key software components like 'PyTorch (Paszke et al., 2019)' and 'Scikit-learn (Pedregosa et al., 2011)' and references GitHub repositories for implementations of Deep InfoMax and the baselines. However, it does not provide specific version numbers for PyTorch, scikit-learn, or other software libraries used in the experimental setup in a direct statement within the text (e.g., 'PyTorch 1.9'). The years in the citations imply a general version range but not a precise version. A version-logging sketch that would close this gap appears below the table. |
| Experiment Setup | Yes | During training, we set the batch size to 64 for datasets with fewer than 100 agents and to 16 for those with 100 or more agents. We train our ALaSI model for 500 epochs on each updated label pool on every dataset. For all of the experiments, we train ALaSI with a learning rate of 0.0005. We have the following hyper-parameters: initial sample size m, query size K, number of epochs E, number of selection rounds N, variance σ of L_dc, weights α, β, γ, ξ in the hybrid loss, and proportion of rank in PID η. We used grid search for the rough values of these hyper-parameters, and show them in Table 4. A training-configuration sketch appears below the table. |
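
The 8:2:2 split quoted under Dataset Splits can be illustrated with a short sketch. This is a minimal reconstruction, assuming trajectories are stored as an indexable sequence; the function name `split_trajectories`, the fixed seed, and the use of NumPy are illustrative choices, not taken from the ALaSI code.

```python
import numpy as np

def split_trajectories(trajectories, ratios=(8, 2, 2), seed=0):
    """Randomly split trajectories into train/val/test with the given ratio.

    An 8:2:2 ratio corresponds to 8/12, 2/12, and 2/12 of the trajectories.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(trajectories))
    total = sum(ratios)
    n_train = len(trajectories) * ratios[0] // total
    n_val = len(trajectories) * ratios[1] // total
    train = [trajectories[i] for i in idx[:n_train]]
    val = [trajectories[i] for i in idx[n_train:n_train + n_val]]
    test = [trajectories[i] for i in idx[n_train + n_val:]]
    return train, val, test
```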
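
Because the paper names PyTorch and scikit-learn without versions, a snippet like the one below, run once in the experiment environment, would record the missing version information. This is a suggested reproducibility practice, not something the paper itself provides.

```python
# Log the exact library versions used in the experiments.
import sys

import sklearn
import torch

print("Python:", sys.version.split()[0])
print("PyTorch:", torch.__version__)
print("scikit-learn:", sklearn.__version__)
print("CUDA available:", torch.cuda.is_available())
```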
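
The Experiment Setup row can likewise be condensed into a configuration sketch. The batch-size rule, the 500 epochs per updated label pool, and the 0.0005 learning rate come from the quoted text; the optimizer (Adam), the model, the dataset, and the MSE loss are placeholders standing in for the paper's actual hybrid loss with weights α, β, γ, ξ.

```python
import torch

def make_training_config(num_agents):
    """Batch-size rule and schedule as reported in the paper."""
    batch_size = 64 if num_agents < 100 else 16
    return {"batch_size": batch_size, "epochs": 500, "lr": 5e-4}

def train_on_label_pool(model, dataset, num_agents):
    """One round of training on an updated label pool (sketch)."""
    cfg = make_training_config(num_agents)
    loader = torch.utils.data.DataLoader(
        dataset, batch_size=cfg["batch_size"], shuffle=True
    )
    optimizer = torch.optim.Adam(model.parameters(), lr=cfg["lr"])
    for _ in range(cfg["epochs"]):
        for inputs, targets in loader:
            optimizer.zero_grad()
            # Placeholder loss; the paper's hybrid loss with weights
            # alpha, beta, gamma, xi would be computed here instead.
            loss = torch.nn.functional.mse_loss(model(inputs), targets)
            loss.backward()
            optimizer.step()
```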