SNIP: Bridging Mathematical Symbolic and Numeric Realms with Unified Pre-training
Authors: Kazem Meidani, Parshin Shojaee, Chandan K. Reddy, Amir Barati Farimani
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate SNIP across diverse tasks, including symbolic-to-numeric mathematical property prediction and numeric-to-symbolic equation discovery, commonly known as symbolic regression. Results show that SNIP effectively transfers to various tasks, consistently outperforming fully supervised baselines and competing strongly with established task-specific methods, especially in low-data regimes where available data is limited. |
| Researcher Affiliation | Academia | (1) Department of Mechanical Engineering, Carnegie Mellon University; (2) Department of Computer Science, Virginia Tech; (3) Machine Learning Department, Carnegie Mellon University |
| Pseudocode | No | The paper describes algorithms (e.g., 'More details on the LSO algorithm and implementation are in App. E.') but does not provide structured pseudocode or algorithm blocks within the main text. |
| Open Source Code | Yes | Code and model are available at: https://github.com/deep-symbolic-mathematics/Multimodal-Math-Pretraining |
| Open Datasets | Yes | In our SNIP approach, pre-training relies on a vast synthetic dataset comprising paired numeric and symbolic data. We follow the data generation mechanism in (Kamienny et al., 2022), where each example consists of N data points (x, y) ∈ ℝ^(D+1) and a corresponding mathematical function f, where y = f(x). ... SNIP was assessed on PMLB datasets (Olson et al., 2017) outlined in SRBench (La Cava et al., 2021), including: 119 Feynman equations (Udrescu & Tegmark, 2020), 14 ODE-Strogatz challenges (La Cava et al., 2016), and 57 Black-box regression tasks without known underlying functions. (A sketch of this style of paired data generation appears below the table.) |
| Dataset Splits | No | The paper states: 'For a fair comparison, all model variants are trained on identical datasets comprising 10K equations and subsequently tested on a distinct 1K-equation evaluation dataset.' This specifies training and test sets but does not explicitly mention a validation set split. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper references various models and algorithms, but does not provide specific version numbers for any software dependencies, libraries, or frameworks used in their implementation. |
| Experiment Setup | Yes | To assess property prediction on top of SNIP's embeddings, we employ a predictor head that passes these embeddings through a single-hidden-layer MLP to yield the predicted values. We adopt a Mean Squared Error (MSE) loss function for training on continuous properties. ... The training objective is to minimize the token-matching cross-entropy loss L... More details on the model design and training implementation can be found in App. E. (A sketch of such a predictor head appears below the table.) |
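
To make the paired-data description in the Open Datasets row concrete, here is a minimal sketch of generating synthetic symbolic-numeric pairs of the form y = f(x). It follows the recipe of Kamienny et al. (2022) only in spirit: the operator pool, tree depth, constant distribution, and sampling ranges below are illustrative assumptions, not the paper's exact generator.

```python
# Hedged sketch: sample a random expression tree, then tabulate it on random
# inputs to produce one (symbolic function, numeric data) pre-training pair.
import numpy as np

RNG = np.random.default_rng(0)
UNARY = {"sin": np.sin, "cos": np.cos, "abs": np.abs}       # assumed operator pool
BINARY = {"add": np.add, "mul": np.multiply, "sub": np.subtract}

def random_tree(depth, dim):
    """Recursively sample a random expression tree over variables x0..x{dim-1}."""
    if depth == 0 or RNG.random() < 0.3:
        # Leaf: either an input variable or a random constant.
        if RNG.random() < 0.5:
            i = int(RNG.integers(dim))
            return f"x{i}", lambda x, i=i: x[:, i]
        c = float(RNG.normal())
        return f"{c:.2f}", lambda x, c=c: np.full(x.shape[0], c)
    if RNG.random() < 0.5:
        name = str(RNG.choice(list(UNARY)))
        sub, f = random_tree(depth - 1, dim)
        return f"{name}({sub})", lambda x, f=f, op=UNARY[name]: op(f(x))
    name = str(RNG.choice(list(BINARY)))
    s1, f1 = random_tree(depth - 1, dim)
    s2, f2 = random_tree(depth - 1, dim)
    return (f"{name}({s1}, {s2})",
            lambda x, f1=f1, f2=f2, op=BINARY[name]: op(f1(x), f2(x)))

def sample_pair(n_points=200, dim=2, depth=3):
    """Draw one (symbolic expression, numeric table) pair with y = f(x)."""
    expr, f = random_tree(depth, dim)
    x = RNG.uniform(-2.0, 2.0, size=(n_points, dim))
    y = f(x)
    return expr, np.column_stack([x, y])  # N points in R^(D+1)

expr, data = sample_pair()
print(expr, data.shape)
```

Repeating `sample_pair` yields the kind of large corpus of aligned symbolic and numeric views that the contrastive pre-training described in the paper consumes.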
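
The Experiment Setup row describes a single-hidden-layer MLP predictor head trained with MSE on SNIP embeddings. Below is a minimal PyTorch sketch of that setup; the embedding dimension, hidden width, learning rate, and the choice of Adam are assumptions, and the random tensors stand in for precomputed SNIP embeddings and continuous property labels.

```python
# Hedged sketch: single-hidden-layer MLP head over (frozen) embeddings,
# trained with MSE on a continuous property, per the setup quoted above.
import torch
import torch.nn as nn

class PropertyHead(nn.Module):
    def __init__(self, emb_dim=512, hidden=256):  # dimensions are assumptions
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(emb_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # one continuous property value per input
        )

    def forward(self, z):
        return self.mlp(z).squeeze(-1)

head = PropertyHead()
opt = torch.optim.Adam(head.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

z = torch.randn(64, 512)  # stand-in for precomputed SNIP embeddings
y = torch.randn(64)       # stand-in for continuous property labels
for step in range(100):
    opt.zero_grad()
    loss = loss_fn(head(z), y)
    loss.backward()
    opt.step()
```

Keeping the encoder frozen and training only this small head is consistent with the paper's emphasis on transfer in low-data regimes, where a lightweight task head limits overfitting.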