reproducibilityindex.ai

Data-Efficient Learning with Neural Programs

Authors: Alaia Solko-Breslin, Seewon Choi, Ziyang Li, Neelay Velingker, Rajeev Alur, Mayur Naik, Eric Wong

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our evaluation shows that for the latter benchmarks, ISED has comparable performance to state-of-the-art neurosymbolic frameworks. For the former, we use adaptations of prior work on gradient approximations of black-box components as a baseline, and show that ISED achieves comparable accuracy but in a more data- and sample-efficient manner.
Researcher Affiliation	Academia	Alaia Solko-Breslin, Seewon Choi, Ziyang Li, Neelay Velingker, Rajeev Alur, Mayur Naik, Eric Wong University of Pennsylvania {alaia,seewon,liby99,neelay,alur,mhnaik,exwong}@seas.upenn.edu
Pseudocode	Yes	We present the pseudocode of the algorithm in Algorithm 1 and describe its steps with the hand-written formula task. (Algorithm 1: ISED training pipeline)
Open Source Code	Yes	Code is available at https://github.com/alaiasolkobreslin/ISED. We release code for all baselines and ISED to reproduce the results reported in the paper.
Open Datasets	Yes	Leaf Classification. In this task, we use a dataset, which we call LEAF-ID, containing leaf images of 11 different plant species [4], with 330 training samples and 110 testing samples. Scene Recognition. We use a dataset containing scene images from 9 different room types [16], consisting of 830 training examples and 92 testing examples. MNIST-R [13, 14] contains 11 tasks operating on inputs of images of handwritten digits from the MNIST dataset [11]. We use the SatNet dataset consisting of 9K training samples and 500 test samples [25].
Dataset Splits	No	The paper provides specific training and testing splits for datasets like LEAF-ID (330 training, 110 testing), Scene Recognition (830 training, 92 testing), and MNIST-R tasks (5K training, 500 testing), but does not explicitly mention a distinct validation set or its size for most experiments. It only details 'training samples' and 'testing samples'.
Hardware Specification	Yes	All of our experiments were conducted on a machine with two 20-core Intel Xeon CPUs, one NVIDIA RTX 2080 Ti GPU, and 755 GB RAM.
Software Dependencies	No	The paper mentions 'PyTorch [17]', 'YOLOv8 [20]', and 'CLIP [19]' but does not provide their specific version numbers. It specifies the GPT-4 versions as 'gpt-4-1106-preview' and 'gpt-4o'. However, for other key software and baselines such as DeepProbLog, Scallop, A-NeSI, NASR, and IndeCateR, no specific version numbers are provided.
Experiment Setup	Yes	Unless otherwise noted, the sample count, i.e., the number of calls to the program P per training example, is fixed at 100 for all relevant methods. We use the Adam optimizer with the best learning rate among {1e-3, 5e-4, 1e-4}. We train for a maximum of 100 epochs but stop early if training saturates. For MNIST-R tasks, we used learning rate 1e-4 and trained ISED for 10 epochs, REINFORCE and IndeCateR for 50 epochs, and A-NeSI and NASR for 100 epochs.
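The Experiment Setup row fixes the sample count, the number of calls to the program P per training example, at 100. The sketch below illustrates only what that budget means. It computes a plain Monte Carlo estimate of P's output distribution from per-input symbol distributions, using a digit-sum program as a stand-in for P. It is not ISED's Algorithm 1, which also aggregates probabilities and trains the network. The helper `estimate_output_distribution` and the digit-sum example are hypothetical.

```python
import random
from collections import Counter
from typing import Callable, Dict, Sequence


def estimate_output_distribution(
    program: Callable[..., int],
    input_dists: Sequence[Sequence[float]],
    sample_count: int = 100,  # calls to the program per training example
    seed: int = 0,
) -> Dict[int, float]:
    """Monte Carlo estimate of a black-box program's output distribution.

    Hypothetical helper. Each entry of input_dists is a probability vector
    over symbols 0..len-1 (e.g. a softmax over digit classes). The program
    is called exactly sample_count times.
    """
    rng = random.Random(seed)
    counts: Counter = Counter()
    for _ in range(sample_count):
        # Draw one symbol per input from its predicted distribution.
        symbols = [rng.choices(range(len(d)), weights=d, k=1)[0] for d in input_dists]
        counts[program(*symbols)] += 1
    # Normalize counts into empirical output probabilities.
    return {y: c / sample_count for y, c in counts.items()}


if __name__ == "__main__":
    uniform = [0.1] * 10  # hypothetical digit-classifier output
    dist = estimate_output_distribution(lambda a, b: a + b, [uniform, uniform])
    print(sorted(dist.items()))
```

With uniform inputs, the estimate approximates the triangular distribution of two-digit sums. A larger `sample_count` tightens the estimate at the cost of more calls to P.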
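The same row also describes the optimization protocol: Adam, the best learning rate from {1e-3, 5e-4, 1e-4}, at most 100 epochs, and early stopping once training saturates. The sketch below mimics that protocol on a toy one-parameter objective, with each "epoch" reduced to a single Adam step. Two parts of it are assumptions, because the paper states neither criterion: the saturation test (a `patience`/`min_delta` rule on training loss) and choosing the learning rate with the lowest final training loss. `adam_minimize` and `sweep_learning_rates` are hypothetical names.

```python
import math
from typing import Callable, Dict, Tuple

LEARNING_RATES = (1e-3, 5e-4, 1e-4)  # grid reported in the Experiment Setup row
MAX_EPOCHS = 100                     # upper bound reported in the same row


def adam_minimize(
    lr: float,
    grad_fn: Callable[[float], float],
    loss_fn: Callable[[float], float],
    w0: float = 0.0,
    max_epochs: int = MAX_EPOCHS,
    patience: int = 3,        # assumed saturation rule: stop after 3 epochs...
    min_delta: float = 1e-4,  # ...without a loss improvement larger than this
    beta1: float = 0.9,
    beta2: float = 0.999,
    eps: float = 1e-8,
) -> Tuple[float, int]:
    """Run Adam on a scalar parameter; return (best loss, epochs run)."""
    w, m, v = w0, 0.0, 0.0
    best = loss_fn(w)
    stale = 0
    epochs_run = 0
    for t in range(1, max_epochs + 1):
        g = grad_fn(w)
        # Standard Adam moment estimates with bias correction.
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
        epochs_run = t
        loss = loss_fn(w)
        if best - loss > min_delta:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:  # training has saturated: stop early
                break
    return best, epochs_run


def sweep_learning_rates(grad_fn, loss_fn) -> Tuple[float, Dict[float, Tuple[float, int]]]:
    """Train once per learning rate and keep the one with the lowest loss."""
    results = {lr: adam_minimize(lr, grad_fn, loss_fn) for lr in LEARNING_RATES}
    best_lr = min(results, key=lambda lr: results[lr][0])
    return best_lr, results


if __name__ == "__main__":
    # Toy objective (w - 3)^2 standing in for a real training loss.
    best_lr, results = sweep_learning_rates(lambda w: 2 * (w - 3), lambda w: (w - 3) ** 2)
    print(best_lr, results)
```

In a real run, `grad_fn` and `loss_fn` would be replaced by full training epochs, and the selection metric would be whatever the authors used. Their scripts in the linked repository are the authoritative reference.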