Controlling Neural Networks with Rule Representations

Authors: Sungyong Seo, Sercan Ö. Arık, Jinsung Yoon, Xiang Zhang, Kihyuk Sohn, Tomas Pfister

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate DEEPCTRL on machine learning use cases from Physics, Retail, and Healthcare, where utilization of rules is particularly important. For the rule encoder (φr), data encoder (φd), and decision block (φ), we use MLPs with ReLU activation at intermediate layers, similarly to [5, 16]. We compare DEEPCTRL to the TASKONLY baseline, which is trained with fixed α = 0, i.e. it only uses a data encoder (φd) and a decision block (φ) to predict a next state. In addition, we include TASKONLY with rule regularization, TASK&RULE, based on Eq. 1 with a fixed λ. We make a comparison to Lagrangian Dual Framework (LDF) [12] that enforces rules by solving a constraint optimization problem.
Researcher Affiliation | Industry | Sungyong Seo, Sercan Ö. Arık, Jinsung Yoon, Xiang Zhang, Kihyuk Sohn, Tomas Pfister, Google Cloud AI, Sunnyvale, CA, USA {sungyongs,soarik,jinsungyoon,fancyzhx,kihyuks,tpfister}@google.com
Pseudocode | Yes | Algorithm 1 Training process for DEEPCTRL.
Input: Training data D = {(xi, yi) : i = 1, …, N}. Output: Optimized parameters.
Require: Rule encoder φr, data encoder φd, decision block φ, and distribution P(α).
1: Initialize φr, φd, and φ
2: while not converged do
3:   Get mini-batch Db from D and αb ∈ ℝ from P(α)
4:   Get z = αb zr ⊕ (1 − αb) zd, where zr = φr(Db) and zd = φd(Db)
5:   Get ŷ = φ(z) and compute L = E_{α∼P(α)}[α Lrule + ρ (1 − α) Ltask], where ρ = Lrule,0 / Ltask,0
6:   Update φr, φd, and φ from the gradients ∇φr L, ∇φd L, and ∇φ L
7: end while
(See the runnable sketch of this training loop after the table.)
Open Source Code | No | The paper does not provide an explicit statement or link to its own open-source code for the methodology described.
Open Datasets | Yes | Dataset, task and the rule: We focus on the task of sales forecasting of retail goods on the M5 dataset¹. While the original task is forecasting daily sales across different goods at every store, we change the task to be forecasting weekly sales since the prices of items are updated weekly. For this task, we consider the economics principle as the rule [6]: price-difference and sales-difference should have a negative correlation coefficient: r = ΔSALES / ΔPRICES < 0.0. ... ¹ https://www.kaggle.com/c/m5-forecasting-accuracy/ (See the rule-loss sketch after the table.)
Dataset Splits | No | The paper mentions using a validation set for optimizing rule strength and comparing verification ratios (e.g., "Optimizing rule strength on validation set"), but it does not specify the exact split percentages or sample counts for training/validation/test sets needed for reproduction. (See the α-scan sketch after the table.)
Hardware Specification | No | The paper does not specify the hardware used for the experiments (e.g., GPU/CPU models, memory details).
Software Dependencies | No | The paper mentions using MLPs with ReLU activation, but it does not provide specific software names with version numbers (e.g., TensorFlow, PyTorch, scikit-learn versions).
Experiment Setup | Yes | For the rule encoder (φr), data encoder (φd), and decision block (φ), we use MLPs with ReLU activation at intermediate layers, similarly to [5, 16]. ... We propose to sample from a Beta distribution (Beta(β, β)). We observe strong results with β = 0.1 in most cases ... before starting a learning process, we compute the initial loss values Lrule,0 and Ltask,0 on a training set and introduce a scale parameter ρ = Lrule,0/Ltask,0.
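
To make Algorithm 1 concrete, below is a minimal PyTorch sketch of the DEEPCTRL training loop. The paper does not name its framework (see Software Dependencies above), so PyTorch, the layer widths, the reading of the coupling ⊕ as concatenation of the scaled embeddings, and the rule_loss/task_loss callables are all illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn


def mlp(dims):
    # Stack of Linear layers with ReLU at intermediate layers, as described in the paper.
    layers = []
    for i in range(len(dims) - 1):
        layers.append(nn.Linear(dims[i], dims[i + 1]))
        if i < len(dims) - 2:
            layers.append(nn.ReLU())
    return nn.Sequential(*layers)


class DeepCtrl(nn.Module):
    # Rule encoder phi_r, data encoder phi_d, decision block phi (hypothetical sizes).
    def __init__(self, in_dim, emb_dim, out_dim):
        super().__init__()
        self.rule_enc = mlp([in_dim, 64, emb_dim])       # phi_r
        self.data_enc = mlp([in_dim, 64, emb_dim])       # phi_d
        # Assumption: the coupling "⊕" concatenates the scaled embeddings.
        self.decision = mlp([2 * emb_dim, 64, out_dim])  # phi

    def forward(self, x, alpha):
        z_r = self.rule_enc(x)
        z_d = self.data_enc(x)
        z = torch.cat([alpha * z_r, (1.0 - alpha) * z_d], dim=-1)
        return self.decision(z)


def train(model, loader, rule_loss, task_loss, beta=0.1, epochs=10):
    opt = torch.optim.Adam(model.parameters())
    alpha_dist = torch.distributions.Beta(beta, beta)  # P(alpha) = Beta(0.1, 0.1)

    # Scale parameter rho = L_rule,0 / L_task,0 from initial losses on the training set.
    with torch.no_grad():
        x0, y0 = next(iter(loader))
        y0_hat = model(x0, alpha_dist.sample())
        rho = rule_loss(x0, y0_hat) / task_loss(y0_hat, y0)

    for _ in range(epochs):
        for x, y in loader:
            alpha = alpha_dist.sample()  # alpha_b sampled per mini-batch
            y_hat = model(x, alpha)      # y_hat = phi(z)
            loss = alpha * rule_loss(x, y_hat) + rho * (1.0 - alpha) * task_loss(y_hat, y)
            opt.zero_grad()
            loss.backward()
            opt.step()

Because α is sampled per mini-batch rather than fixed, a single trained model spans the whole range from TASKONLY-like behavior (α = 0) to rule-dominated behavior (α = 1).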
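The retail rule r = ΔSALES / ΔPRICES < 0.0 must be expressed as a differentiable loss before it can serve as Lrule. The paper does not spell out the exact surrogate, so the sketch below is one plausible choice: hinge-penalize weekly price and sales changes that move in the same direction.

import torch


def retail_rule_loss(delta_price, delta_sales):
    # Hypothetical differentiable surrogate for r = dSALES / dPRICES < 0:
    # price and sales differences should move in opposite directions, so
    # their product should be negative; penalize positive products.
    return torch.relu(delta_price * delta_sales).mean()


# Example: three weeks of price/sales changes; only the last pair violates the rule.
dp = torch.tensor([0.5, -0.2, 0.3])
ds = torch.tensor([-1.0, 0.4, 0.2])
print(retail_rule_loss(dp, ds))  # only 0.3 * 0.2 contributes to the loss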
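Finally, "Optimizing rule strength on validation set" refers to the fact that α is a model input, so rule strength can be tuned after training by scanning α on held-out data. A minimal sketch, assuming hypothetical verification_ratio and task_metric helpers:

import torch


def pick_alpha(model, x_val, y_val, verification_ratio, task_metric, grid=21):
    # Scan alpha in [0, 1] and report rule verification vs. task error,
    # so a practitioner can pick the desired trade-off.
    results = []
    with torch.no_grad():
        for alpha in torch.linspace(0.0, 1.0, grid):
            y_hat = model(x_val, alpha)
            results.append((alpha.item(),
                            verification_ratio(x_val, y_hat),
                            task_metric(y_hat, y_val)))
    return results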