A Variant of Concurrent Constraint Programming on GPU

Authors: Pierre Talbot, Frédéric G. Pinel, Pascal Bouvry (pp. 3830-3839)

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that TURBO obtains slightly better performance when compared to GECODE (Schulte, Tack, and Lagerkvist 2020), a parallel CPU-based solver. TURBO is only a proof of concept and several key components of modern constraint solvers are missing, in particular global constraints and constraint learning. Nevertheless, constraint decomposition of global constraints is an efficient approach that can be immediately used in TURBO (Narodytska 2011; Bessiere et al. 2009; Schutt et al. 2009). Still, it is clear that many improvements will be required to make TURBO a competitive solver. Evaluation: We evaluate TURBO on the RCPSP problem introduced above.
Researcher Affiliation | Academia | Pierre Talbot, Frédéric Pinel, Pascal Bouvry, Interdisciplinary Centre for Security, Reliability and Trust (SnT), University of Luxembourg, Esch-sur-Alzette L-4243, Luxembourg. pierre.talbot@uni.lu, frederic.pinel@uni.lu, pascal.bouvry@uni.lu
Pseudocode | Yes | We define the propagation loop as a function void propagation(int tid, int stride, VStore& s, Vector<Propagator*>& props) where tid is a unique thread identifier, stride is the number of threads in the block, s is a store of interval variables and props is the array of propagators (which are PCCP processes):

  __shared__ bool has_changed[3] = {true};
  for(int i = 1; !s.is_top() && has_changed[(i-1)%3]; ++i) {
    for(int t = tid; t < props.size(); t += stride) {
      if(props[t]->propagate(s))
        has_changed[i%3] = true;
    }
    has_changed[(i+1)%3] = false;
    __syncthreads();
  }
Open Source Code | Yes | Replicate: the version of TURBO used in this paper is available at https://github.com/ptal/turbo/tree/aaai2022.
Open Datasets | Yes | We experiment on 2 data sets: Patterson (Patterson 1984) (timeout of 5 minutes) which has instances with various numbers of tasks and resources, and j30 from PSPSLIB (Kolisch and Sprecher 1997) (timeout of 30 seconds) with 30 tasks and 4 resources.
Dataset Splits | No | The paper mentions using two datasets for experiments, Patterson and j30, but does not provide specific details on how these datasets were split into training, validation, and test sets. It does not specify percentages, sample counts, or refer to standard splits.
Hardware Specification | Yes | To illustrate our explanations, we take the example of the laptop GPU NVIDIA QUADRO RTX 5000 MAX-Q, which we later use to perform our experiments. It has 48 streaming multiprocessors (SMs), each with 64KB of L1 cache and 64 cores, for a total of 48 * 64 = 3072 cores. We compare TURBO with the well-known constraint solver GECODE 6.2.0 (Schulte, Tack, and Lagerkvist 2020) in parallel mode on a processor i7-10750@2.60GHz with 6 cores and 12 threads.
Software Dependencies | Yes | We compare TURBO with the well-known constraint solver GECODE 6.2.0 (Schulte, Tack, and Lagerkvist 2020) in parallel mode on a processor i7-10750@2.60GHz with 6 cores and 12 threads.
Experiment Setup | Yes | To illustrate our explanations, we take the example of the laptop GPU NVIDIA QUADRO RTX 5000 MAX-Q... When programming in CUDA, we group parallel processes in blocks that are executed on a single SM, and have their own intra-block synchronization primitives... we have identified that multiplying the number of SMs by 4 (to obtain 192 blocks) and the number of cores per SM by 4 (to obtain 256 threads per block) yields the best efficiency on this particular GPU model. The constraint model and search strategy are the same for both solvers.