XOR (Exclusive OR) for branchless coding

The following example reverses a character array in place using the XOR operator. The swap needs no temporary variable.
#include <cstring>
#include <iostream>
using namespace std;

int main()
{
    char str[] = "I AM STUDENT";
    size_t length = strlen(str);
    for (size_t i = 0; i < length / 2; i++)
    {
        // XOR swap of str[i] with its mirror element; the two indices never coincide
        str[i] ^= str[length - (1 + i)];
        str[length - (1 + i)] ^= str[i];
        str[i] ^= str[length - (1 + i)];
    }
    cout << str << endl;
    return 0;
}
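
The three XOR statements in the loop can be read as a small swap routine. The sketch below is only an illustration: `xorSwap` and `xorReverse` are hypothetical helper names, not part of the original example. It spells out what each step holds and makes one caveat explicit: if both operands refer to the same element, the first XOR zeroes it. The reversal loop avoids this because `i < n / 2` keeps the two indices distinct.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: swaps two chars using XOR only.
// Precondition: a and b must be different objects; if they alias,
// the first statement computes a ^ a == 0 and the value is lost.
inline void xorSwap(char& a, char& b)
{
    assert(&a != &b);
    a ^= b;  // a = a0 ^ b0
    b ^= a;  // b = b0 ^ (a0 ^ b0) = a0
    a ^= b;  // a = (a0 ^ b0) ^ a0 = b0
}

// Hypothetical helper: returns the reversed string, built with xorSwap.
inline std::string xorReverse(std::string s)
{
    const std::size_t n = s.size();
    for (std::size_t i = 0; i < n / 2; ++i)
    {
        // i < n / 2 guarantees i != n - 1 - i, so the operands never alias
        xorSwap(s[i], s[n - 1 - i]);
    }
    return s;
}
```

Calling `xorReverse("I AM STUDENT")` gives the same result as the loop above. For an odd-length string the middle character is simply never touched.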

The example above is one use of XOR, but XOR is especially handy for branchless coding techniques such as the butterfly switch. Removing branches can sometimes speed up execution noticeably. Let's look at one such use with a simple example, Y = |X|, that is, computing the absolute value of a supplied number. A straightforward branching version in C++ looks like this:

int absoluteBranch(int x)
{
    if (x < 0)
    {
        return -x;
    }
    else
    {
        return x;
    }
}

The C++ code clearly branches, and until runtime we can't say for certain which branch will be executed. The assembly generated from this code also shows the branching (without optimization):

absoluteBranch(int):
        push    rbp
        mov     rbp, rsp
        mov     DWORD PTR [rbp-4], edi
        cmp     DWORD PTR [rbp-4], 0
        jns     .L4
        mov     eax, DWORD PTR [rbp-4]
        neg     eax
        jmp     .L5
.L4:
        mov     eax, DWORD PTR [rbp-4]
.L5:
        pop     rbp
        ret

Here we have instructions such as cmp (compare), jns (jump if the sign flag is not set) and jmp (unconditional jump). With the help of XOR we can completely remove this branching when calculating the absolute value of a signed number.

int absoluteBranchless(int x)
{
    int y = x >> (sizeof(int) * 8 - 1);
    return (x ^ y) - y;
}

Now there is no branch in the C++ code, and the assembly generated from it is also branchless. Here is the assembly (without optimization):

absoluteBranchless(int):
        push    rbp
        mov     rbp, rsp
        mov     DWORD PTR [rbp-20], edi
        mov     eax, DWORD PTR [rbp-20]
        sar     eax, 31
        mov     DWORD PTR [rbp-4], eax
        mov     eax, DWORD PTR [rbp-20]
        xor     eax, DWORD PTR [rbp-4]
        sub     eax, DWORD PTR [rbp-4]
        pop     rbp
        ret


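The SAR (shift arithmetic right) instruction shifts the value right by 31 bits while replicating the sign bit (MSB), so a negative number becomes all FFs and a non-negative number becomes all 00s. In other words, the mask is -1 (in two's complement) for a negative number and 0 otherwise. We then XOR the number with the mask and subtract the mask. For a negative number, the XOR with -1 flips every bit and subtracting -1 adds 1, which is exactly two's complement negation. For a non-negative number, both steps leave the value unchanged. In the example above, the variable y is the mask.

Two caveats: right-shifting a negative int is implementation-defined before C++20 (mainstream compilers use an arithmetic shift, as the assembly shows), and INT_MIN has no positive counterpart, so the result overflows for that input, just as -x does in the branching version.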
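
To make the mask arithmetic concrete, here is a minimal sketch that splits the computation into two steps and checks a few values at compile time. The names `signMask` and `absBranchless` are hypothetical, chosen for illustration. The sketch assumes a two's complement `int`, a compiler that performs an arithmetic right shift on negative values, and C++14 or later. `INT_MIN` is deliberately not checked because the subtraction overflows for it.

```cpp
#include <climits>

// Hypothetical helper: -1 (all bits set) for negative x, 0 otherwise.
// Assumes >> on a negative int is an arithmetic shift.
constexpr int signMask(int x)
{
    return x >> (sizeof(int) * CHAR_BIT - 1);
}

// Hypothetical helper: same formula as absoluteBranchless above.
constexpr int absBranchless(int x)
{
    const int y = signMask(x);
    return (x ^ y) - y;  // negative x: flip all bits, then add 1
}

// Worked example for x = -5:
//   y      = -1            (all bits set)
//   x ^ y  = ~(-5) = 4     (XOR with all ones flips every bit)
//   4 - y  = 4 - (-1) = 5
static_assert(signMask(-5) == -1, "mask is -1 for a negative input");
static_assert((-5 ^ -1) == 4, "XOR with -1 flips every bit");
static_assert(absBranchless(-5) == 5, "negative input is negated");

// For x = 7 the mask is 0, so both steps are no-ops.
static_assert(signMask(7) == 0, "mask is 0 for a non-negative input");
static_assert(absBranchless(7) == 7, "non-negative input is unchanged");
static_assert(absBranchless(0) == 0, "zero stays zero");
```

Because the functions are `constexpr`, the checks run during compilation and cost nothing at runtime.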

It's nice, we have no branching!

Comments

Remember you have shown this in office and we had some discussions on this as well.

If you cannot afford any extra memory for a new array, then this XOR-ing is a good solution. I believe it's used in hardware that cannot afford much memory.
