Introduce count{l,r}_{zero,one} for batch_bool #1269
serge-sans-paille merged 1 commit into xtensor-stack:master
I'm fine with the overall approach, but I think it means those operations should live in the Please ping me once you reach a green CI, and thanks for working on this 🙇
Right, the public
I would check for `__cpp_lib_bitops` and, if the check fails, provide a custom `popcount`.
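The suggestion amounts to a feature-test guard. Below is a minimal sketch of that pattern; the helper name `detail::popcount_u64` is hypothetical rather than xsimd's actual code, and the fallback uses the classic SWAR bit-counting trick, which may differ from what the PR eventually adopted.

```cpp
// Pull in the library feature-test macros when the header exists (C++20).
#if defined(__has_include)
#  if __has_include(<version>)
#    include <version>
#  endif
#endif

#include <cstdint>

#if defined(__cpp_lib_bitops) && __cpp_lib_bitops >= 201907L
#  include <bit>
#endif

namespace detail {

// Hypothetical helper: counts the set bits of a 64-bit value.
constexpr int popcount_u64(std::uint64_t x) noexcept {
#if defined(__cpp_lib_bitops) && __cpp_lib_bitops >= 201907L
    return std::popcount(x);  // standard library implementation (C++20)
#else
    // Fallback: SWAR bit counting (2-bit pairs, then nibbles, then bytes).
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    // Multiplying by 0x01...01 sums all byte counts into the top byte.
    return static_cast<int>((x * 0x0101010101010101ULL) >> 56);
#endif
}

}  // namespace detail
```

With a C++20 standard library the guard selects `std::popcount`; otherwise the SWAR path is compiled. Both paths are `constexpr`, so `detail::popcount_u64(0xFF)` evaluates to 8 at compile time.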
Done. I was concerned that just trusting
@serge-sans-paille CI is passing.
In xtensor-stack#1236, it was mentioned that variable-sized bit groups for certain `batch_bool` reductions would be slightly more efficient than extracting a proper bitmask. To achieve this, the xsimd API is extended with the functions `xsimd::count{l,r}_{zero,one}`, and `count` is revised to allow per-platform kernels. The default implementation of each function simply applies the corresponding scalar operation (for which `__cpp_lib_bitops == 201907L` is partially backported) to `batch_bool::mask`. This is specialized for NEON(64) by instead applying the scalar operation to the narrowed batch, then scaling the result down by the bit group size, i.e. the number of bits each lane occupies in the narrowed value.
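To make the default path concrete, here is a sketch of the four reductions written as scalar bit operations on a lane mask. It assumes a hypothetical layout in which bit i of a `std::uint64_t` mask is set exactly when lane i is true and all bits at positions `>= n` (the lane count) are zero. The `sketch` namespace and every name in it are made up for illustration, and the loop-based helpers stand in for C++20 `std::countr_zero`/`std::countl_zero` so the block builds before C++20. This is not xsimd's implementation.

```cpp
#include <cstdint>

// Hypothetical sketch, not xsimd's code. Assumed layout: bit i of `mask` is
// set iff lane i is true, and bits at positions >= n are zero (1 <= n <= 64).
namespace sketch {

// Portable stand-ins for C++20 std::countr_zero / std::countl_zero.
constexpr int countr_zero64(std::uint64_t x) noexcept {
    if (x == 0) return 64;
    int c = 0;
    while ((x & 1u) == 0) { x >>= 1; ++c; }
    return c;
}

constexpr int countl_zero64(std::uint64_t x) noexcept {
    if (x == 0) return 64;
    int c = 0;
    while ((x & (std::uint64_t{1} << 63)) == 0) { x <<= 1; ++c; }
    return c;
}

// False lanes before the first true lane, counting up from lane 0.
constexpr int countr_zero(std::uint64_t mask, int n) noexcept {
    const int c = countr_zero64(mask);
    return c < n ? c : n;  // an all-false mask reports 64; clamp to n
}

// False lanes before the last true lane, counting down from lane n - 1.
constexpr int countl_zero(std::uint64_t mask, int n) noexcept {
    return countl_zero64(mask) - (64 - n);  // discount the unused high bits
}

// True lanes before the first false lane, counting up from lane 0.
constexpr int countr_one(std::uint64_t mask, int n) noexcept {
    return countr_zero(~mask, n);  // trailing ones of x == trailing zeros of ~x
}

// True lanes before the last false lane, counting down from lane n - 1.
constexpr int countl_one(std::uint64_t mask, int n) noexcept {
    // Move lane n - 1 to the MSB; after inversion the vacated low bits are
    // ones, which caps the result at n.
    return countl_zero64(~(mask << (64 - n)));
}

}  // namespace sketch
```

The clamping and offset steps are the part that is easy to get wrong: a raw 64-bit count also sees the unused high bits, so `countl_zero` has to discount them and `countl_one` first shifts lane `n - 1` up to the most significant bit.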
LGTM! Please just squash the history and we're good. Thanks a lot for your effort and... a question, if you don't mind: in which context are you using xsimd, and what for?
Squashed. Sorry, I am not at liberty to discuss the context at this time.
That's totally fine!
In #1236, it was mentioned that variable-sized bit groups for certain
`batch_bool` reductions would be slightly more efficient than extracting
a proper bitmask. To achieve this, the xsimd API is extended with the
functions `xsimd::count{l,r}_{zero,one}`, and `count` is revised to
allow per-platform kernels. The default implementations for each
function simply apply the corresponding scalar operation (for which
`__cpp_lib_bitops == 201907L` is partially backported) to
`batch_bool::mask`. This is specialized for NEON(64) by instead
applying the scalar operation to the narrowed batch, then scaling the
result by the bit group size of each lane.
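The NEON(64) specialization can be emulated in portable C++ to see why the scaling works. In this sketch, each of the `N` lanes owns a group of `g = 64 / N` bits in a 64-bit "narrowed" value, with all of its bits set when the lane is true. On real hardware such a value would come from a narrowing instruction (for instance a NEON intrinsic such as `vshrn_n_u16`); the `sketch_neon` namespace and all names in it are hypothetical, and the code does not claim to mirror xsimd's actual kernels.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical portable emulation of the bit-group idea; not xsimd's NEON code.
namespace sketch_neon {

constexpr int countr_zero64(std::uint64_t x) noexcept {
    if (x == 0) return 64;
    int c = 0;
    while ((x & 1u) == 0) { x >>= 1; ++c; }
    return c;
}

constexpr int countl_zero64(std::uint64_t x) noexcept {
    if (x == 0) return 64;
    int c = 0;
    while ((x & (std::uint64_t{1} << 63)) == 0) { x <<= 1; ++c; }
    return c;
}

constexpr int popcount64(std::uint64_t x) noexcept {
    int c = 0;
    while (x != 0) { x &= x - 1; ++c; }  // clear the lowest set bit each step
    return c;
}

// Lane i owns bits [i*g, (i+1)*g), all ones when the lane is true; g = 64 / N.
template <std::size_t N>
constexpr std::uint64_t narrowed(const std::array<bool, N> lanes) noexcept {
    static_assert(N >= 2 && 64 % N == 0, "N must divide 64");
    const std::uint64_t group = (std::uint64_t{1} << (64 / N)) - 1;
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < N; ++i)
        if (lanes[i]) v |= group << (i * (64 / N));
    return v;
}

// Bits per lane in the narrowed value (the "bit group size").
template <std::size_t N>
constexpr int group_bits() noexcept { return static_cast<int>(64 / N); }

// Every scalar count on the narrowed value is a multiple of g; divide it out.
template <std::size_t N>
constexpr int countr_zero(const std::array<bool, N> lanes) noexcept {
    return countr_zero64(narrowed(lanes)) / group_bits<N>();
}

template <std::size_t N>
constexpr int countl_zero(const std::array<bool, N> lanes) noexcept {
    return countl_zero64(narrowed(lanes)) / group_bits<N>();
}

template <std::size_t N>
constexpr int countr_one(const std::array<bool, N> lanes) noexcept {
    return countr_zero64(~narrowed(lanes)) / group_bits<N>();
}

template <std::size_t N>
constexpr int countl_one(const std::array<bool, N> lanes) noexcept {
    return countl_zero64(~narrowed(lanes)) / group_bits<N>();
}

template <std::size_t N>
constexpr int count(const std::array<bool, N> lanes) noexcept {
    return popcount64(narrowed(lanes)) / group_bits<N>();
}

}  // namespace sketch_neon
```

Because the groups fill all 64 bits exactly, an all-false batch yields `64 / g == N` trailing zeros and the inverted value has no stray bits, so unlike the compact-mask version no clamping is needed. Dividing by `g` is the scaling by the bit group size mentioned in the commit message.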