Conditional Tree Matching for Inference-Time Adaptation of Tree Prediction Models

Authors: Harshit Varma, Abhijeet Awasthi, Sunita Sarawagi

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We now present an empirical evaluation of CTreeOT both in terms of the quality of our proposed conditional tree matching score CTS and running time. We evaluate the quality of CTS by deploying it for inference-time adaptation of a real-life task of converting text utterances to SQL represented as an abstract relational tree." (A hedged sketch of score-based reranking for this setup follows the table.)
Researcher Affiliation | Academia | "Department of Computer Science and Engineering, Indian Institute of Technology (IIT) Bombay. Correspondence to: Harshit Varma, Sunita Sarawagi <{harshitvarma, sunita}@cse.iitb.ac.in>."
Pseudocode | Yes | "Algorithm 1: Tensorized CTreeOT"
Open Source Code | Yes | "The code for CTreeOT has been open-sourced." https://github.com/hrshtv/CTreeOT
Open Datasets | Yes | "Datasets: We adapt a Text-to-SQL model to five different target schemas from the SPIDER dataset (Yu et al., 2018) without finetuning."
Dataset Splits | Yes | "For training, we use SPIDER's train split containing 7000 Text-to-SQL examples from 140 schemas. For evaluation, we follow Awasthi et al. (2023) and use examples from the following five schemas from SPIDER's development set: {world_1, car_1, cre_Doc_Template_Mgt, dog_kennels, flight_2}. Examples from these schemas are excluded from the training and validation splits. The remaining 576 examples from SPIDER's development set are used for validation." (A hedged split-construction sketch follows the table.)
Hardware Specification | Yes | "These experiments were performed on a single NVIDIA RTX A6000 GPU and the algorithms were implemented in PyTorch."
Software Dependencies | No | "These experiments were performed on a single NVIDIA RTX A6000 GPU and the algorithms were implemented in PyTorch."
Experiment Setup | Yes | "We use ϵ = 10⁻³ and λ = 1. The beam is of size 30 and serves as our set of candidate trees Y_x. Our relevance transformer consists of four transformer blocks with a fully-connected layer at the end to predict the scores. A single block is a stack of self-attention (8 heads), feedforward, and layer normalization layers. We keep the batch size as a multiple of the number of cases, and design the batches such that for a candidate tree, the remaining |C| − 1 examples are from the same schema and act as cases. Our relevance transformer achieves an average F1 score of 77.1 on the validation split after being trained for 75 epochs." (A hedged PyTorch sketch of such a relevance transformer follows the table.)
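
The Research Type row above quotes the paper's use of CTS for inference-time adaptation, where candidate trees from a beam are compared against case trees from the target schema. The following is a minimal sketch of score-based reranking under that setup; `cts_score` and the candidate/case containers are hypothetical placeholders, not the authors' actual CTS implementation.

```python
from typing import Any, Callable, List, Tuple

def rerank_candidates(
    candidates: List[Any],                    # beam of candidate trees Y_x (beam size 30 in the paper)
    cases: List[Any],                         # case trees drawn from the same target schema
    cts_score: Callable[[Any, Any], float],   # placeholder for a conditional tree matching score
) -> List[Tuple[float, Any]]:
    """Rank beam candidates by their best match against the case trees.

    Illustrative only: `cts_score` stands in for CTS/CTreeOT, and the exact
    scoring rule used by the authors may differ.
    """
    scored = []
    for candidate in candidates:
        # Compare the candidate tree against every case tree; keep the best match.
        best_match = max(cts_score(candidate, case) for case in cases)
        scored.append((best_match, candidate))
    # Highest-scoring candidates first.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Because the paper adapts the Text-to-SQL model without finetuning, only this kind of reranking over the existing beam would change at inference time.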
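The Dataset Splits row describes holding out five schemas from SPIDER's development set. Below is a minimal split-construction sketch, assuming the examples come from the public SPIDER JSON files with a `db_id` field identifying the schema; the file paths and field names are assumptions, not details from the authors' code.

```python
import json

# The five held-out evaluation schemas named in the paper.
EVAL_SCHEMAS = {"world_1", "car_1", "cre_Doc_Template_Mgt", "dog_kennels", "flight_2"}

def load_examples(path: str):
    """Load one SPIDER split from its JSON file."""
    with open(path) as f:
        return json.load(f)

def split_by_schema(dev_examples):
    """Separate development examples into evaluation and validation pools.

    Examples from the five held-out schemas are used for evaluation; the
    remaining dev examples (576 in the paper) are used for validation.
    """
    eval_split = [ex for ex in dev_examples if ex["db_id"] in EVAL_SCHEMAS]
    valid_split = [ex for ex in dev_examples if ex["db_id"] not in EVAL_SCHEMAS]
    return eval_split, valid_split

# Example usage (paths are illustrative):
# train = load_examples("spider/train_spider.json")   # 7000 examples, 140 schemas
# dev = load_examples("spider/dev.json")
# eval_split, valid_split = split_by_schema(dev)
```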
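The Experiment Setup row specifies the relevance transformer as four blocks of 8-head self-attention, feedforward, and layer normalization, topped by a fully-connected scoring layer. The PyTorch sketch below mirrors that description; the hidden size, feedforward width, dropout, and the per-node output convention are assumptions, since the quoted text does not specify them.

```python
from typing import Optional

import torch
import torch.nn as nn

class RelevanceTransformer(nn.Module):
    """Sketch of a 4-block, 8-head transformer encoder with a scoring head.

    The hidden size, feedforward width, and dropout are assumed values,
    not numbers reported in the paper.
    """

    def __init__(self, d_model: int = 256, n_heads: int = 8,
                 n_blocks: int = 4, dim_feedforward: int = 1024,
                 dropout: float = 0.1):
        super().__init__()
        block = nn.TransformerEncoderLayer(
            d_model=d_model,
            nhead=n_heads,                    # 8 self-attention heads per block
            dim_feedforward=dim_feedforward,  # feedforward sub-layer
            dropout=dropout,
            batch_first=True,                 # inputs shaped (batch, nodes, d_model)
        )
        # Four blocks, each stacking self-attention, feedforward, and layer norm.
        self.encoder = nn.TransformerEncoder(block, num_layers=n_blocks)
        # Fully-connected layer at the end to predict the scores.
        self.score_head = nn.Linear(d_model, 1)

    def forward(self, node_embeddings: torch.Tensor,
                padding_mask: Optional[torch.Tensor] = None) -> torch.Tensor:
        # node_embeddings: (batch, num_nodes, d_model)
        hidden = self.encoder(node_embeddings, src_key_padding_mask=padding_mask)
        return self.score_head(hidden).squeeze(-1)  # one score per node


# Example usage with random inputs (shapes are illustrative):
# model = RelevanceTransformer()
# scores = model(torch.randn(2, 50, 256))  # -> tensor of shape (2, 50)
```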