The critical section makes sure that only ever a single thread can execute the section at a time. So when a thread what's to execute the section, it first needs to make sure no other thread is executing it and potentially wait for the other threads to finish executing the section.
Reductions however don't induce this synchronization overhead, instead each thread executes with an independent parent
value, and after the loop is done, the reduction is applied to merge all parent
values. The following, is essentially what the #pragma omp parallel for reduction(min : parent)
is equivalent to:
unsigned int parents[8] = {v, v, v, v, v, v, v, v};
#pragma omp parallel for num_threads(8)
for(unsigned int j = 0; j < G.Out[v]._map.bucket_count(); j++) {
for(auto ite = G.Out[v]._map.begin(j); ite != G.Out[v]._map.end(j); ite++) {
unsigned int w = ite->first;
if(v > w)
parents[omp_get_thread_num()] = min(parents[omp_get_thread_num()],w);
}
}
unsigned int parent = v;
for (unsigned int i = 0; i < 8; ++i) {
parent = min(parent, parents[i]);
}