Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

CTSketch: Compositional Tensor Sketching for Scalable Neurosymbolic Learning

Authors: Seewon Choi, Alaia Solko-Breslin, Rajeev Alur, Eric Wong

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate CTSketch on benchmarks from the neurosymbolic learning literature, including some designed for evaluating scalability. Our results show that CTSketch pushes neurosymbolic learning to new scales that were previously unattainable, with neural predictors obtaining high accuracy on tasks with one thousand inputs, despite supervision only on the final output.
Researcher Affiliation	Academia	Seewon Choi (University of Pennsylvania), Alaia Solko-Breslin (University of Pennsylvania), Rajeev Alur (University of Pennsylvania), Eric Wong (University of Pennsylvania)
Pseudocode	Yes	A. Pseudocode. Algorithm 1: CTSketch training algorithm
Open Source Code	Yes	Code is available at https://github.com/alaiasolkobreslin/CTSketch
Open Datasets	Yes	MNIST Sum. We consider the problem of computing the sum of handwritten digits (sum_n) from the MNIST dataset [19]. Multi-digit Addition. We use the Multi-digit MNISTAdd task (add_n), originally proposed by [25]. Visual Sudoku. We use the ViSudo-PC dataset [1] containing 200 4x4 and 2K 9x9 filled boards for training and testing. Sudoku Solving. The goal of this task is to solve a 9x9 Sudoku puzzle, where the board is given as a sequence of MNIST images with the digit 0 representing an empty cell. We use the SatNet [33] dataset with 9K training samples and 500 test samples and follow the same experimental setup as [4]. HWF. The Hand-Written Formula (HWF) task uses a dataset from [20] of 10K formulas of length 1 to 7 containing handwritten images of digits and operators. Leaf Identification and Scene Classification. We include two tasks from [31] which use GPT-4 to perform reasoning in the symbolic component for leaf identification and scene recognition, using datasets from [3] and [27] respectively.
Dataset Splits	Yes	Each task uses a training set of 5K samples and a testing set of 1K samples, except sum_1024 where we use 4K training samples due to resource constraints. Multi-digit Addition. We use n ∈ {1, 2, 4, 15, 100} with a training set of 60,000/(2n) samples and a test set of 10,000/(2n) samples. Visual Sudoku. We use the ViSudo-PC dataset [1] containing 200 4x4 and 2K 9x9 filled boards for training and testing. Sudoku Solving. We use the SatNet [33] dataset with 9K training samples and 500 test samples and follow the same experimental setup as [4].
Hardware Specification	Yes	We run all experiments on a machine with one 14-core Intel i9-10940X CPU, one NVIDIA RTX 3090 GPU, and 66 GB of RAM.
Software Dependencies	No	We used code from the official repositories of Scallop [15] (MIT), DeepSoftLog [23] (MIT), IndeCateR [9] (Apache 2.0), ISED [31] (MIT), and A-NeSI [32] (MIT). Additionally, for CTSketch, we used the implementation of TT-SVD from the Python package tt-sketch [18] (CC BY-NC-ND 4.0) and the CP, Tucker, and tensor ring decompositions from tensorly [17] (BSD 3-Clause).
Experiment Setup	Yes	Unless stated otherwise, we keep the optimizer, training epochs, and batch size consistent across methods, and use the best learning rate among {1e-3, 5e-4, 2e-4, 1e-4, 5e-5}. For neural-GPT experiments, leaf classification and scene recognition, we copy the model, prompt and configuration from the original paper [31] where the tasks were introduced. The hyperparameters used for CTSketch are summarized in Table 2.
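
The Multi-digit Addition split sizes quoted under Dataset Splits can be worked out directly. The following is a minimal sketch, assuming that each sample consumes 2n MNIST images (two n-digit operands) and that fractional counts are rounded down. The rounding rule and the helper name `addn_split_sizes` are illustrative assumptions, not taken from the paper or its repository.

```python
# Image counts taken from the quoted split formula (60,000/2n and 10,000/2n).
MNIST_TRAIN_IMAGES = 60_000
MNIST_TEST_IMAGES = 10_000


def addn_split_sizes(n: int) -> tuple[int, int]:
    """Return (train_samples, test_samples) for n-digit addition.

    Hypothetical helper: assumes each sample uses 2 * n images and that
    any remainder is discarded (floor division).
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    images_per_sample = 2 * n
    return (
        MNIST_TRAIN_IMAGES // images_per_sample,
        MNIST_TEST_IMAGES // images_per_sample,
    )


if __name__ == "__main__":
    # The values of n listed in the Dataset Splits row.
    for n in (1, 2, 4, 15, 100):
        train, test = addn_split_sizes(n)
        print(f"n={n:>3}: {train:>6} train samples, {test:>5} test samples")
```

Under these assumptions, n = 100 leaves only 300 training samples and 50 test samples, which shows how quickly the available supervision shrinks as the number of digits grows.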