Text2CAD: Generating Sequential CAD Designs from Beginner-to-Expert Level Text Prompts

Authors: Mohammad Sadil Khan, Sankalp Sinha, Talha Uddin, Didier Stricker, Sk Aziz Ali, Muhammad Zeshan Afzal

NeurIPS 2024

Reproducibility checklist — each entry lists the variable, the assessed result, and the supporting LLM response:
Research Type (Experimental): "We evaluate the performance of our model through a mixture of metrics, including visual quality, parametric precision, and geometrical accuracy. Our proposed framework shows great potential in AI-aided design applications. Our experimental analysis demonstrates superior performance over the two-stage baseline method adapted for the task at hand."
Researcher Affiliation (Collaboration): DFKI; RPTU Kaiserslautern-Landau; Mind Garage; BITS Pilani, Hyderabad
Pseudocode (No): "The paper describes the architecture and processes involved but does not include an explicit 'Pseudocode' or 'Algorithm' block."
Open Source Code (No): "Project page is available at https://sadilkhan.github.io/text2cad-project/. Currently we have not published our code and dataset. As mentioned in the abstract, we will publish both of them soon."
Open Datasets (Yes): "We use the DeepCAD [56] dataset, which contains approximately 150k training CAD sequences and 8k test and validation sequences in sketch-and-extrude format. The DeepCAD [56] dataset, a subset of ABC, and Fusion360 [55] provide CAD construction sequences in the form of sketch and extrusion to deduce design history."
Dataset Splits (Yes): "We use the DeepCAD [56] dataset, which contains approximately 150k training CAD sequences and 8k test and validation sequences in sketch-and-extrude format. For each sample in the dataset, four design prompts ranging from abstract to expert levels (L0, L1, L2, L3) are generated using our data annotation pipeline, resulting in 600k training samples and 32k test and validation samples."
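The split arithmetic in this entry can be sanity-checked in a few lines (a minimal sketch; the variable names are illustrative, and it assumes the reported 8k figure covers test and validation combined):

```python
# Each CAD sequence is annotated with four prompt levels (L0-L3),
# so prompt-level sample counts are 4x the sequence counts.
PROMPT_LEVELS = 4                  # L0, L1, L2, L3
train_sequences = 150_000          # reported DeepCAD training sequences
test_val_sequences = 8_000         # reported test + validation sequences

train_samples = train_sequences * PROMPT_LEVELS        # 600_000
test_val_samples = test_val_sequences * PROMPT_LEVELS  # 32_000
```

Both products match the 600k training and 32k test/validation sample counts quoted above.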
Hardware Specification (Yes): "The Text2CAD transformer has been trained with a teacher-forcing [53] strategy for 160 epochs using 1 Nvidia A100-80GB GPU for 2 days. We use 1 Nvidia A100-40GB GPU to run LLaVA-NeXT [26] and 4 Nvidia A100-80GB GPUs to run Mistral-50B [16]."
Software Dependencies (No): The paper mentions models like BERT, Mistral-50B, and LLaVA-NeXT, but does not provide specific version numbers for ancillary software components, programming languages, or libraries used in the implementation.
Experiment Setup (Yes): "The Text2CAD transformer consists of L = 8 decoder blocks with 8 self-attention heads. The learning rate is 0.001 with the AdamW [28] optimizer. Dropout is 0.1. The maximum number of word tokens Np is fixed at 512 and the maximum number of CAD tokens Nc at 272. The dimension dp of the pre-trained BERT encoder [8] embedding T, as well as of Tadapt, is 1024. The CAD sequence embedding dimension d is 256. Following [19], the first two decoder blocks do not use any cross-attention operation between the text embedding and the CAD sequence embedding. The Text2CAD transformer has been trained with a teacher-forcing [53] strategy for 160 epochs..."
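The hyperparameters reported in this entry can be collected into a single configuration sketch (the class and field names are illustrative, not the authors' code; only the numeric values come from the quoted setup):

```python
from dataclasses import dataclass


@dataclass
class Text2CADConfig:
    """Hedged sketch of the reported Text2CAD transformer setup."""
    num_decoder_blocks: int = 8    # L = 8
    num_heads: int = 8             # self-attention heads per block
    lr: float = 1e-3               # AdamW learning rate
    dropout: float = 0.1
    max_word_tokens: int = 512     # Np
    max_cad_tokens: int = 272      # Nc
    text_dim: int = 1024           # dp: pre-trained BERT embedding size
    cad_dim: int = 256             # d: CAD sequence embedding size
    # Per the paper, the first two decoder blocks use no cross-attention.
    cross_attention_start: int = 2

    def uses_cross_attention(self, block_idx: int) -> bool:
        """Whether decoder block `block_idx` (0-based) cross-attends to text."""
        return block_idx >= self.cross_attention_start


cfg = Text2CADConfig()
# Blocks 0-1: self-attention only; blocks 2-7: self- plus cross-attention.
layout = [cfg.uses_cross_attention(i) for i in range(cfg.num_decoder_blocks)]
```

Keeping the "no cross-attention in the first two blocks" rule as a config field makes the asymmetric decoder layout explicit rather than buried in layer-construction code.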