C++ - An optimal select function for vector extensions?


OpenCL has a select function that is usable with all-vector arguments. Both Clang and GCC support vector types well, and GCC supports a ternary operator that works on vectors, but neither of them has an OpenCL-like select function yet. I've tried to implement a replacement, but the results are non-optimal, with both GCC and Clang producing conditional jumps. However, GCC does a pretty good job with the ternary operator, which serves as a drop-in replacement. Does an optimal solution for a select-like function exist (particularly under Clang), and what is it? Here are some non-optimal ideas:

    template <typename u, typename v>
    inline v select(u const s, v const a, v const b) noexcept
    {
      // a loop gives horrible results
    /*
      constexpr u zero{};
      // clang
      return (-(s != zero) * a) - ((s == zero) * b);
    */
      return v{
        s[0] ? a[0] : b[0],
        s[1] ? a[1] : b[1],
        s[2] ? a[2] : b[2],
        s[3] ? a[3] : b[3]
      }; // 4-component vectors only, one can generalize with the indices trick
    /*
      return s ? a : b; // gcc
    */
    }
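The "indices trick" mentioned in the comment could look roughly like the sketch below, which expands the per-lane ternary over a `std::index_sequence` instead of writing out four lanes by hand. The names `select_each`, `select_lanes`, `int4` and `float4` are made up for this illustration, the lane count is passed explicitly as a template argument, and C++14 plus the GCC/Clang `vector_size` extension are assumed. It only removes the four-lane restriction; it is still the same per-lane selection, so it should not be expected to generate better code.

```cpp
#include <cstddef>
#include <utility>

// Hypothetical vector typedefs for this sketch (GCC/Clang extension, not standard C++).
typedef int   int4   __attribute__((vector_size(16)));
typedef float float4 __attribute__((vector_size(16)));

// Helper: expands s[I] ? a[I] : b[I] once per lane index I.
template <typename U, typename V, std::size_t... I>
inline V select_lanes(U const s, V const a, V const b,
                      std::index_sequence<I...>) noexcept
{
    return V{ (s[I] ? a[I] : b[I])... };
}

// Per-lane select for an N-lane vector; N is given by the caller.
template <std::size_t N, typename U, typename V>
inline V select_each(U const s, V const a, V const b) noexcept
{
    return select_lanes(s, a, b, std::make_index_sequence<N>{});
}
```

A call such as `select_each<4>(mask, a, b)` takes `a[i]` wherever `mask[i]` is non-zero and `b[i]` elsewhere.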

This compiles to semi-good code under both GCC and Clang:

    template <typename u, typename v>
    constexpr inline v select(u const s, v const a, v const b) noexcept
    {
      return v((s & u(a)) | (~s & u(b)));
    }

Asm listing:

    Dump of assembler code for function select<int __vector(4), float __vector(4)>(int __vector(4), float __vector(4), float __vector(4)):
       0x00000000004009c0 <+0>:     andps  %xmm0,%xmm1
       0x00000000004009c3 <+3>:     andnps %xmm2,%xmm0
       0x00000000004009c6 <+6>:     orps   %xmm1,%xmm0
       0x00000000004009c9 <+9>:     retq
    End of assembler dump.

This is still not as good as what an optimizer could produce. It works by abusing vector casts and the fact that comparison operations produce -1 (all ones) integers in the lanes where the comparison matches. At least no arithmetic operations are necessary.
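To make the mask convention concrete, here is a small sketch that feeds the result of a vector comparison straight into the bitwise select. It assumes GCC or Clang with the `vector_size` extension. `mask_select` is just the function above under a different name (chosen here to avoid confusion with other functions called `select`), and `int4`, `float4` and `lane_min` are names invented for this example.

```cpp
// Hypothetical vector typedefs for this sketch (GCC/Clang extension, not standard C++).
typedef int   int4   __attribute__((vector_size(16)));
typedef float float4 __attribute__((vector_size(16)));

// Bitwise select: every lane of s is expected to be all ones (-1) or all zeros.
template <typename U, typename V>
inline V mask_select(U const s, V const a, V const b) noexcept
{
    // U(x) and V(x) reinterpret the bits of a same-sized vector;
    // they do not convert the element values.
    return V((s & U(a)) | (~s & U(b)));
}

// Example use: per-lane minimum of two float vectors.
inline float4 lane_min(float4 const a, float4 const b) noexcept
{
    int4 const mask = a < b; // -1 where a[i] < b[i], 0 elsewhere
    return mask_select(mask, a, b);
}
```

Note that a mask lane holding any value other than 0 or -1 would mix bits from both inputs, so this is only a stand-in for a select when the mask comes from a comparison or is built the same way.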

