A Pseudo-Semantic Loss for Autoregressive Models with Logical Constraints
Authors: Kareem Ahmed, Kai-Wei Chang, Guy Van den Broeck
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach on Sudoku and shortest-path prediction cast as autoregressive generation, and observe that we greatly improve upon the base model's ability to predict logically-consistent outputs. We also evaluate on the task of detoxifying large language models. |
| Researcher Affiliation | Academia | Kareem Ahmed, Department of Computer Science, University of California, Los Angeles (ahmedk@cs.ucla.edu); Kai-Wei Chang, Department of Computer Science, University of California, Los Angeles (kwchang@cs.ucla.edu); Guy Van den Broeck, Department of Computer Science, University of California, Los Angeles (guyvdb@cs.ucla.edu) |
| Pseudocode | Yes | Algorithm 1 LSLpseudo(α; pθ) — 1: Input: logical constraint α and model pθ. 2: Output: pseudo-semantic loss of α w.r.t. θ. 3: // Obtain sample y from pθ 4: y ∼ pθ 5: // Get sequence length and num. of categories 6: seq, cats = y.shape() 7: // Expand the batch to contain all perturbations of y that are a Hamming distance of 1 away 8: y = y.expand(seq, cats) 9: y[:, range(seq), :, range(seq)] = range(cats) 10: // Evaluate expanded samples through model 11: log pθ = pθ(y).log_softmax(dim=-1) 12: // Compute the conditional probabilities: log pθ[i][j] = log pθ(y_j \| y_{−j}) 13: log pθ = log pθ − log pθ.logsumexp(dim=-1) 14: // Compute the probability of α under p̃_y by propagating the conditionals through the circuit cα 15: return log p̃_y(α) |
| Open Source Code | Yes | Our code is available at github.com/UCLA-StarAI/PseudoSL. |
| Open Datasets | Yes | We use the dataset provided by Wang et al. [43], consisting of 10K Sudoku puzzles, split into 9K training examples and 1K test samples, all puzzles having 10 missing entries. For this task, we follow the experimental setting set forth by [33], where our training set consists of 10,000 terrain maps curated using the Warcraft II tileset. Following previous work [15, 42], we evaluate on REALTOXICITYPROMPTS, a dataset of almost 100k prompts ranging from nontoxic, assigned a toxicity score of 0, to very toxic, assigned a toxicity score of 1. |
| Dataset Splits | Yes | A randomized 10k portion of the REALTOXICITYPROMPTS dataset was used to determine early stopping. |
| Hardware Specification | Yes | The experiments were run on a server with an AMD EPYC 7313P 16-Core Processor @ 3.7GHz, 2 NVIDIA RTX A6000, and 252 GB RAM. |
| Software Dependencies | No | The paper mentions software like PyTorch, Huggingface Accelerate, and the PySDD compiler, but does not provide specific version numbers for these components. For example, it states 'uses PyTorch [31]' without specifying the PyTorch version number. |
| Experiment Setup | Yes | We use a batch size of 16 and a learning rate of 1e-5 with the AdamW optimizer [23] with otherwise default parameters. We did a grid search over the pseudo-semantic loss weight in the values {0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 2, 4, 8}. We used Adam with default PyTorch parameters and a learning rate of 3e-4. We did a grid search over the pseudo-semantic loss weight in the values {0.01, 0.05}. We used Adam with the default PyTorch parameters and a learning rate of 5e-4. We did a grid search over the pseudo-semantic loss weight in the values {0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1}. |
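The expansion and normalization steps of Algorithm 1 above can be sketched in plain NumPy. This is an illustrative stand-in, not the authors' implementation: the real code operates on PyTorch tensors via `expand`/`log_softmax`, and the final step (propagating the conditionals through the compiled circuit cα) is paper-specific and omitted here. All function names below are hypothetical.

```python
import numpy as np

def hamming1_expand(y, cats):
    """All single-position perturbations of sequence y.

    Returns an array of shape (seq, cats, seq): entry [j, c] is y with
    position j replaced by category c (mirrors the expand/assign trick
    on lines 8-9 of Algorithm 1).
    """
    seq = len(y)
    expanded = np.tile(y, (seq, cats, 1))  # (seq, cats, seq) copies of y
    for j in range(seq):
        expanded[j, :, j] = np.arange(cats)  # vary only position j
    return expanded

def conditional_log_probs(logits):
    """Per-position conditionals log p(y_j = c | y_{-j}) from raw logits.

    `logits` has shape (seq, cats); normalizing over the category axis
    (a log-softmax, as on lines 11-13 of Algorithm 1) yields a proper
    conditional log-distribution at each position.
    """
    logits = logits - logits.max(axis=-1, keepdims=True)  # stability
    return logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))

# Toy illustration: a stand-in "model" producing fixed random logits.
rng = np.random.default_rng(0)
seq, cats = 4, 3
y = rng.integers(0, cats, size=seq)          # a sampled sequence y ~ p
perturbations = hamming1_expand(y, cats)     # Hamming-distance-1 batch
log_cond = conditional_log_probs(rng.normal(size=(seq, cats)))

# Every perturbation differs from y in at most one position, and each
# row of log_cond is a normalized log-distribution over categories.
assert perturbations.shape == (seq, cats, seq)
assert np.allclose(np.exp(log_cond).sum(axis=-1), 1.0)
```

In the paper's setting, `log_cond` would then be propagated through the logical circuit cα to obtain log p̃_y(α); here the sketch stops at the conditional matrix.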