Controlling Neural Networks with Rule Representations
Authors: Sungyong Seo, Sercan Arik, Jinsung Yoon, Xiang Zhang, Kihyuk Sohn, Tomas Pfister
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate DEEPCTRL on machine learning use cases from Physics, Retail, and Healthcare, where utilization of rules is particularly important. For the rule encoder (φr), data encoder (φd), and decision block (φ), we use MLPs with ReLU activation at intermediate layers, similarly to [5, 16]. We compare DEEPCTRL to the TASKONLY baseline, which is trained with fixed α = 0, i.e. it only uses a data encoder (φd) and a decision block (φ) to predict a next state. In addition, we include TASKONLY with rule regularization, TASK&RULE, based on Eq. 1 with a fixed λ. We make a comparison to Lagrangian Dual Framework (LDF) [12] that enforces rules by solving a constraint optimization problem. |
| Researcher Affiliation | Industry | Sungyong Seo, Sercan Ö. Arık, Jinsung Yoon, Xiang Zhang, Kihyuk Sohn, Tomas Pfister Google Cloud AI Sunnyvale, CA, USA {sungyongs,soarik,jinsungyoon,fancyzhx,kihyuks,tpfister}@google.com |
| Pseudocode | Yes | Algorithm 1 Training process for DEEPCTRL. Input: Training data D = {(xi, yi) : i = 1, ..., N}. Output: Optimized parameters. Require: Rule encoder φr, data encoder φd, decision block φ, and distribution P(α). 1: Initialize φr, φd, and φ 2: while not converged do 3: Get mini-batch Db from D and αb ∈ ℝ from P(α) 4: Get z = αb·zr ⊕ (1 − αb)·zd where zr = φr(Db) and zd = φd(Db). 5: Get ŷ = φ(z) and compute L = Eα∼P(α)[α·Lrule + ρ(1 − α)·Ltask] where ρ = Lrule,0/Ltask,0 6: Update φr, φd, and φ from gradients ∇φr L, ∇φd L, and ∇φ L 7: end while |
| Open Source Code | No | The paper does not provide an explicit statement or link to its own open-source code for the methodology described. |
| Open Datasets | Yes | For the rule encoder (φr), data encoder (φd), and decision block (φ), we use MLPs with ReLU activation at intermediate layers, similarly to [5, 16]. We compare DEEPCTRL to the TASKONLY baseline, which is trained with fixed α = 0, i.e. it only uses a data encoder (φd) and a decision block (φ) to predict a next state. In addition, we include TASKONLY with rule regularization, TASK&RULE, based on Eq. 1 with a fixed λ. We make a comparison to Lagrangian Dual Framework (LDF) [12] that enforces rules by solving a constraint optimization problem. ... Dataset, task and the rule: We focus on the task of sales forecasting of retail goods on the M5 dataset (https://www.kaggle.com/c/m5-forecasting-accuracy/). While the original task is forecasting daily sales across different goods at every store, we change the task to be forecasting weekly sales since the prices of items are updated weekly. For this task, we consider the economics principle as the rule [6]: price-difference and sales-difference should have a negative correlation coefficient: r = ΔSALES / ΔPRICES < 0.0. |
| Dataset Splits | No | The paper mentions using a validation set for optimizing rule strength and comparing verification ratios (e.g., "Optimizing rule strength on validation set"), but it does not specify the exact split percentages or sample counts for training/validation/test sets needed for reproduction. |
| Hardware Specification | No | The paper does not specify the hardware used for the experiments (e.g., GPU/CPU models, memory details). |
| Software Dependencies | No | The paper mentions using MLPs with ReLU activation, but it does not provide specific software names with version numbers (e.g., TensorFlow, PyTorch, scikit-learn versions). |
| Experiment Setup | Yes | For the rule encoder (φr), data encoder (φd), and decision block (φ), we use MLPs with ReLU activation at intermediate layers, similarly to [5, 16]. ... We propose to sample from a Beta distribution (Beta(β, β)). We observe strong results with β = 0.1 in most cases ... before starting a learning process, we compute the initial loss values Lrule,0 and Ltask,0 on a training set and introduce a scale parameter ρ = Lrule,0/Ltask,0. |
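
The "Pseudocode" and "Experiment Setup" rows together describe the full training loop: MLP encoders with ReLU, α sampled per mini-batch from Beta(0.1, 0.1), the weighted-and-concatenated latent z, and the scale ρ = Lrule,0/Ltask,0 computed once before training. Since the paper releases no code, the PyTorch sketch below is only one plausible reading of Algorithm 1; the layer sizes, optimizer, synthetic data, and the toy `rule_loss`/`task_loss` definitions are placeholders, not the authors' implementation.

```python
# Illustrative sketch of the Algorithm 1 training loop (not the authors' code).
import torch
import torch.nn as nn

def mlp(in_dim, hidden_dim, out_dim):
    # "MLPs with ReLU activation at intermediate layers", per the quoted setup.
    return nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU(),
                         nn.Linear(hidden_dim, out_dim))

rule_encoder = mlp(8, 64, 32)     # phi_r (dimensions are placeholders)
data_encoder = mlp(8, 64, 32)     # phi_d
decision_block = mlp(64, 64, 1)   # phi, consumes the concatenated [z_r ; z_d]

optimizer = torch.optim.Adam(
    list(rule_encoder.parameters()) + list(data_encoder.parameters())
    + list(decision_block.parameters()), lr=1e-3)
alpha_dist = torch.distributions.Beta(0.1, 0.1)  # Beta(beta, beta), beta = 0.1

task_loss = nn.MSELoss()
def rule_loss(y_hat):
    # Toy stand-in for L_rule (penalize negative predictions); the real rule
    # loss is task-specific.
    return torch.relu(-y_hat).mean()

# Placeholder training data standing in for D = {(x_i, y_i)}.
x_train, y_train = torch.randn(256, 8), torch.randn(256, 1)
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(x_train, y_train), batch_size=32)

# rho = L_rule,0 / L_task,0, computed on the training set before learning.
with torch.no_grad():
    z0 = torch.cat([rule_encoder(x_train), data_encoder(x_train)], dim=-1)
    y0 = decision_block(z0)
    rho = rule_loss(y0) / task_loss(y0, y_train)

for epoch in range(10):                      # "while not converged"
    for x_b, y_b in loader:                  # mini-batch D_b
        alpha = alpha_dist.sample()          # alpha_b sampled from P(alpha)
        z_r, z_d = rule_encoder(x_b), data_encoder(x_b)
        z = torch.cat([alpha * z_r, (1 - alpha) * z_d], dim=-1)
        y_hat = decision_block(z)
        loss = alpha * rule_loss(y_hat) + rho * (1 - alpha) * task_loss(y_hat, y_b)
        optimizer.zero_grad()
        loss.backward()                      # gradients for phi_r, phi_d, phi
        optimizer.step()
```

At inference time the same mixing step can be reused with a user-chosen α instead of a sampled one, which is what lets the rule strength be controlled after training.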
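
The retail rule quoted in the "Open Datasets" row requires r = ΔSALES / ΔPRICES < 0.0, i.e. the weekly sales change and price change should move in opposite directions. The paper does not spell out its exact differentiable form of this rule, so the helper below is only one hedged encoding: the ratio is negative exactly when the two differences have opposite signs, so it penalizes the positive part of their product. The function name and arguments are hypothetical.

```python
import torch

def retail_rule_loss(pred_sales, prev_sales, price, prev_price):
    # Hedged encoding of the quoted rule r = ΔSALES / ΔPRICES < 0.0.
    # This is an illustrative penalty, not the paper's exact L_rule.
    d_sales = pred_sales - prev_sales      # ΔSales, using the model's prediction
    d_price = price - prev_price           # ΔPrices, observed weekly prices
    # Zero whenever the rule holds (opposite signs), grows with the violation.
    return torch.relu(d_sales * d_price).mean()
```

A penalty of this shape is zero whenever the rule is satisfied and increases with the size of the violation, which is the general form a differentiable L_rule needs in the training loop sketched above.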