PIDformer: Transformer Meets Control Theory

Authors: Tam Minh Nguyen, Cesar A. Uribe, Tan Minh Nguyen, Richard Baraniuk

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically evaluate the model for advantages and robustness against baseline transformers across various practical tasks, including object classification, image segmentation, and language modeling.
Researcher Affiliation | Academia | 1 Department of Electrical & Computer Engineering, Rice University, Houston, USA; 2 Department of Mathematics, National University of Singapore, Singapore.
Pseudocode | No | The paper presents mathematical formulations and a model architecture diagram (Figure 1), but no explicit pseudocode or algorithm blocks.
Open Source Code | Yes | For our language modeling implementation, we rely on the publicly available code https://github.com/IDSIA/lmtool-fwp developed by (Schlag et al., 2021).
Open Datasets | Yes | The ImageNet dataset, as described in (Deng et al., 2009; Russakovsky et al., 2015)...; The ADE20K dataset...; The WikiText-103 dataset...
Dataset Splits | Yes | The ImageNet dataset... consists of 1.28 million images for training and 50,000 images for validation...; The ADE20K dataset... comprises a training set of 20,210 images... Furthermore, the dataset includes 2,000 images in the validation set...; The validation and test sets contain 218,000 and 246,000 words, respectively, divided into 60 articles per set and approximately 268,000 words each.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory specifications) used for running the experiments.
Software Dependencies | No | The paper mentions using publicly available code from a GitHub repository but does not specify the versions of software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | Our baseline model is the DeiT-tiny model (Touvron et al., 2021), which consists of 12 transformer layers, 3 attention heads per layer, and a model dimension of 192. For model settings and configuration, we follow (Touvron et al., 2021); their implementation is available at https://github.com/facebookresearch/deit. The λP, λI, λD, and β used for our PID DeiT method are 0.8, 0.5, 0.05, and 0.1, respectively.
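
For readability, the hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. The snippet below is only an illustration: the dict name and key names are hypothetical and not taken from the paper's code; only the numerical values (the DeiT-tiny depth, heads, and model dimension, and the PID coefficients λP, λI, λD, β) come from the reported setup.

# Hedged sketch: dict name and keys are hypothetical; only the values reflect
# the setup quoted above (DeiT-tiny backbone with PID coefficients).
pid_deit_config = {
    "backbone": "deit_tiny",
    "num_layers": 12,     # transformer layers
    "num_heads": 3,       # attention heads per layer
    "embed_dim": 192,     # model dimension
    "lambda_P": 0.8,      # proportional coefficient
    "lambda_I": 0.5,      # integral coefficient
    "lambda_D": 0.05,     # derivative coefficient
    "beta": 0.1,
}

if __name__ == "__main__":
    print(pid_deit_config)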