Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
GoTube: Scalable Statistical Verification of Continuous-Depth Models
Authors: Sophie A. Gruenbacher, Mathias Lechner, Ramin Hasani, Daniela Rus, Thomas A. Henzinger, Scott A. Smolka, Radu Grosu
Pages: 6755-6764
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that GoTube substantially outperforms state-of-the-art verification tools in terms of the size of the initial ball, speed, time-horizon, task completion, and scalability on a large set of experiments. |
| Researcher Affiliation | Academia | TU Wien; IST Austria; CSAIL MIT; Stony Brook University |
| Pseudocode | Yes | Algorithm 1: GoTube |
| Open Source Code | Yes | Code / Appendix: https://github.com/DatenVorsprung/GoTube |
| Open Datasets | No | The paper names standard benchmarks like 'CartPole-v1' and classical dynamical systems. While these are commonly understood to be public, the paper does not provide concrete access information (specific links, DOIs, or formal citations with authors/year) for these datasets/environments within the text. |
| Dataset Splits | No | The paper does not provide specific training/validation/test dataset splits. The experiments are conducted on continuous dynamical systems and neural models, where the concept of data splits as in typical supervised learning tasks does not directly apply. |
| Hardware Specification | Yes | We run our evaluations on a standard workstation machine setup (12 vCPUs, 64GB memory) equipped with a single GPU for a per-run timeout of 1 hour (except for runtimes reported in Figure 4). |
| Software Dependencies | No | The paper mentions 'JAX' as an implementation tool but does not provide specific version numbers for JAX or any other software dependencies. It also refers to 'advanced automatic differential toolboxes'. |
| Experiment Setup | No | The paper mentions some general settings like 'μ = 1.1 as the tightness factor' and '99% confidence level' but does not provide a comprehensive experimental setup, including specific hyperparameters (e.g., learning rates, batch sizes), model initialization details, or other system-level training settings. |