Universal Approximation Under Constraints is Possible with Transformers

Authors: Anastasis Kratsios, Behnoosh Zamanlooy, Tianlin Liu, Ivan Dokmanić

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'In Appendix 5, we show that the answer is indeed: Yes!, by proposing a training algorithm in that direction and showing that we outperform an MLP model and a classical transformer network in terms of a joint MSE and distance to the constraint set. The evaluation is performed on a large number of randomly generated experiments, whose objective is to reduce the MSE to a randomly generated function mapping a high-dimensional Euclidean space to the sphere in R3, with outputs constrained to the sphere.' (A hedged sketch of this joint objective appears below the table.)
Researcher Affiliation | Academia | Anastasis Kratsios, Tianlin Liu & Ivan Dokmanić, Universität Basel, Departement Mathematik und Informatik, {firstname.lastname}@unibas.ch; Behnoosh Zamanlooy, Universität Zürich, Department of Informatics, bzamanlooy@ifi.uzh.ch
Pseudocode | No | The paper provides detailed mathematical formulations of its components and theorems but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Anonymized. 'Pytorch implementation of attend-to-constraints, 2021.' URL: https://drive.google.com/file/d/1vryYsUmHt0fok3Mrje6oN9Tjs2UmpgkA/view
Open Datasets | No | The evaluation is performed on a large number of randomly generated experiments whose objective is to reduce the MSE to a randomly generated function mapping a high-dimensional Euclidean space to the sphere in R3, with outputs constrained to the sphere. The paper uses randomly generated data for which no public access information (link, citation, etc.) is provided.
Dataset Splits | No | The paper mentions 'training data' and a 'training algorithm' but does not specify how the data was split into training, validation, and test sets, nor give percentages or counts.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, or memory) used to run its experiments.
Software Dependencies | No | The paper mentions a 'Pytorch implementation' in its bibliography but does not specify version numbers for PyTorch or any other software dependencies needed for reproduction.
Experiment Setup | No | The paper refers to a 'training algorithm' and optimizing the MSE in Appendix 5, but it does not provide concrete experimental setup details such as hyperparameter values (e.g., learning rate, batch size) or specific optimizer settings.
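
For context, the joint objective the evaluation refers to (MSE to the target function plus a penalty on the distance of the model's outputs to the constraint set, here the unit sphere in R3) can be sketched in PyTorch as follows. This is a minimal illustrative reconstruction under those assumptions, not the authors' released attend-to-constraints code: the names `joint_loss` and `lambda_c`, the penalty weight, and the linear stand-in model are hypothetical.

```python
# Minimal sketch of a joint objective combining MSE with distance to a
# constraint set, here the unit sphere S^2 in R^3. Illustrative only:
# `joint_loss` and `lambda_c` are hypothetical names, and this is not the
# authors' released attend-to-constraints implementation.
import torch
import torch.nn.functional as F

def joint_loss(pred: torch.Tensor, target: torch.Tensor,
               lambda_c: float = 1.0) -> torch.Tensor:
    """pred, target: (batch, 3) tensors; targets lie on the unit sphere."""
    mse = F.mse_loss(pred, target)
    # For the unit sphere, the Euclidean distance from a point x to the
    # constraint set is | ||x|| - 1 |.
    dist_to_sphere = (pred.norm(dim=-1) - 1.0).abs().mean()
    return mse + lambda_c * dist_to_sphere

# Usage on random data mirroring the randomly generated experiments above:
x = torch.randn(32, 10)                           # high-dimensional inputs
target = F.normalize(torch.randn(32, 3), dim=-1)  # targets on the unit sphere
model = torch.nn.Linear(10, 3)                    # stand-in for the network
loss = joint_loss(model(x), target)
loss.backward()
```

Per the quoted Appendix 5, the paper compares its constrained transformer against an MLP and a classical transformer on exactly this kind of joint metric.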