C++ - Confusing Memory Reordering Behavior
I am trying to run a simple task (obtaining the x2APIC ID of the current processor) on every available hardware thread. I wrote the following code to do this, which works on the machines I tested it on (see here for the complete MWE, compilable on Linux as C++11 code).
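    #include <cstdint>
    #include <iostream>
    #include <tuple>
    #include <sched.h>
    #include <unistd.h>

    // cpuid() and check() are helpers defined in the linked MWE.

    void print_x2apic_id()
    {
        uint32_t r1, r2, r3, r4;
        std::tie(r1, r2, r3, r4) = cpuid(11, 0);
        std::cout << r4 << std::endl;
    }

    int main()
    {
        const auto _ = std::ignore;
        auto nprocs = ::sysconf(_SC_NPROCESSORS_ONLN);
        auto set = ::cpu_set_t{};
        std::cout << "processors online: " << nprocs << std::endl;

        for (auto i = 0; i != nprocs; ++i) {
            // Pin the calling thread to hardware thread i, then query it.
            CPU_SET(i, &set);
            check(::sched_setaffinity(0, sizeof(::cpu_set_t), &set));
            CPU_CLR(i, &set);
            print_x2apic_id();
        }
    }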
Output on one machine (when compiled with g++, version 4.9.0):
0 2 4 6 32 34 36 38
Each iteration printed a different x2APIC ID, so things are working as expected. Here is where the problems start. I replaced the call to print_x2apic_id with the following code:
    uint32_t r4;
    std::tie(_, _, _, r4) = cpuid(11, 0);
    std::cout << r4 << std::endl;
This causes the same ID to be printed on each iteration of the loop:
36 36 36 36 36 36 36 36
My guess as to what happened is that the compiler noticed that the call to cpuid does not depend on the loop iteration (even though it does). The compiler then "optimized" the code by hoisting the call to cpuid outside the loop. To try to fix this, I converted r4 into an atomic:
    std::atomic<uint32_t> r4;
    std::tie(_, _, _, r4) = cpuid(11, 0);
    std::cout << r4 << std::endl;
This failed to fix the problem. Surprisingly, this does fix the problem:
    std::atomic<uint32_t> r1;
    uint32_t r2, r3, r4;
    std::tie(r1, r2, r3, r4) = cpuid(11, 0);
    std::cout << r4 << std::endl;
... OK, I'm confused.
Edit: Replacing the asm statement in the cpuid function with asm volatile also fixes the issue, but I don't see how this should be necessary.
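The cpuid function itself appears only in the linked MWE, so the following is a minimal sketch of what the two variants might look like, assuming GCC or Clang extended inline assembly on x86. The names cpuid_plain and cpuid_volatile are hypothetical and chosen purely for illustration; on other architectures the sketch simply returns zeros.

```cpp
#include <cstdint>
#include <tuple>

using regs = std::tuple<std::uint32_t, std::uint32_t, std::uint32_t, std::uint32_t>;

// Hypothetical variant 1: plain asm. The compiler may assume the outputs
// depend only on the inputs (leaf, subleaf), so it is allowed to merge
// repeated calls or move the statement out of a loop.
inline regs cpuid_plain(std::uint32_t leaf, std::uint32_t subleaf) {
    std::uint32_t a = 0, b = 0, c = 0, d = 0;
#if defined(__x86_64__) || defined(__i386__)
    // "0" and "2" tie the inputs to the registers of outputs 0 (EAX) and 2 (ECX).
    asm("cpuid"
        : "=a"(a), "=b"(b), "=c"(c), "=d"(d)
        : "0"(leaf), "2"(subleaf));
#else
    (void)leaf; (void)subleaf;  // not x86: nothing to query
#endif
    return regs{a, b, c, d};
}

// Hypothetical variant 2: asm volatile. The qualifier tells the compiler the
// statement has side effects, so it is not deleted, merged with an identical
// statement, or hoisted out of a loop.
inline regs cpuid_volatile(std::uint32_t leaf, std::uint32_t subleaf) {
    std::uint32_t a = 0, b = 0, c = 0, d = 0;
#if defined(__x86_64__) || defined(__i386__)
    asm volatile("cpuid"
                 : "=a"(a), "=b"(b), "=c"(c), "=d"(d)
                 : "0"(leaf), "2"(subleaf));
#else
    (void)leaf; (void)subleaf;  // not x86: nothing to query
#endif
    return regs{a, b, c, d};
}
```

For a leaf whose result does not depend on the executing core (such as leaf 0), both variants return the same values; the difference only shows up when the result varies between calls with identical arguments, as with leaf 11 after a change of CPU affinity.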
My Questions
- Shouldn't inserting an acquire fence before the call to cpuid, and a release fence after the call to cpuid, be sufficient to prevent the compiler from performing the memory reordering?
- Why didn't converting r4 to std::atomic<uint32_t> work? And why did storing the first three outputs into r1, r2, and r3 instead of ignoring them cause the program to work?
- How can I correctly write the loop, using the least amount of synchronization necessary?
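I've reproduced the problem with optimization enabled (-O). You are right in suspecting a compiler optimization. CPUID itself serves as a full memory barrier (for the processor); the problem is that the compiler generates code without calling the cpuid function inside the loop, since it treats it as a constant function. A plain asm statement with output operands is assumed to produce results that depend only on its inputs, so repeated executions with the same arguments may be collapsed into one. asm volatile prevents the compiler from making such an optimization by telling it that the statement has side effects.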
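As a rough illustration of how the loop could be written once the asm statement is volatile, here is a minimal sketch assuming Linux with glibc, an x86 CPU that supports CPUID leaf 11, and GCC or Clang. The helpers read_x2apic_id and collect_x2apic_ids are hypothetical names, not part of the original MWE, and no atomics or fences are used.

```cpp
#include <cstdint>
#include <vector>
#include <sched.h>
#include <unistd.h>

// Hypothetical helper: executes CPUID leaf 11, subleaf 0 and returns EDX,
// which holds the x2APIC ID on CPUs that support this leaf.
inline std::uint32_t read_x2apic_id() {
    std::uint32_t a = 11, b = 0, c = 0, d = 0;
#if defined(__x86_64__) || defined(__i386__)
    // volatile: the statement must be executed every time it is reached.
    asm volatile("cpuid" : "+a"(a), "=b"(b), "+c"(c), "=d"(d));
#endif
    return d;
}

// Hypothetical helper: pins the calling thread to each CPU index in turn and
// records the ID read there. Indices the thread may not run on are skipped.
inline std::vector<std::uint32_t> collect_x2apic_ids() {
    std::vector<std::uint32_t> ids;

    // Remember the current affinity mask so it can be restored afterwards.
    cpu_set_t original;
    CPU_ZERO(&original);
    if (::sched_getaffinity(0, sizeof(original), &original) != 0) {
        return ids;
    }

    const long nprocs = ::sysconf(_SC_NPROCESSORS_ONLN);
    for (long i = 0; i < nprocs; ++i) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(i, &set);
        // On success the thread has already been migrated to CPU i.
        if (::sched_setaffinity(0, sizeof(set), &set) != 0) {
            continue;  // CPU i is offline or not permitted for this process
        }
        ids.push_back(read_x2apic_id());
    }

    // Restore the original affinity mask (best effort).
    ::sched_setaffinity(0, sizeof(original), &original);
    return ids;
}
```

Printing each element of the vector returned by collect_x2apic_ids() would correspond to the output of the original loop. The sketch relies only on the volatile qualifier to keep the CPUID instruction inside the loop body.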
See this answer for details: https://stackoverflow.com/a/14449998/2527797