Smart LLVM #1: Optimizing range checks | Egor Bogatov

Sometimes I explore LLVM sources and play with godbolt.org in order to find some interesting optimizations (not only the peephole ones) so I think I’ll post some here in my blog from time to time. Also, if an optimization is simple enough I try to implement it in RuyJIT.
And today I am going to share a nice LLVM trick to optimize some common range checks.
So, let’s say we have a function that checks if a char belongs to a list of reserved chars:
(I actually copy-pasted it from CoreFX)

1
2
3
4
5
6
7
8
9
10
11
12
bool IsReservedCharacter(char character) // uint16_t
{
    return character == ';'
        || character == '/'
        || character == ':'
        || character == '@'
        || character == '&'
        || character == '='
        || character == '+'
        || character == '$'
        || character == ',';
}

Now let’s compare outputs for RuyJIT and LLVM:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
; C# RuyJIT
  movzx    rax, cx
  cmp      eax, 59
  je       SHORT G_IG04
  cmp      eax, 47
  je       SHORT G_IG04
  cmp      eax, 58
  je       SHORT G_IG04
  cmp      eax, 64
  je       SHORT G_IG04
  cmp      eax, 38
  je       SHORT G_IG04
  cmp      eax, 61
  je       SHORT G_IG04
  cmp      eax, 43
  je       SHORT G_IG04
  cmp      eax, 36
  je       SHORT G_IG04 
  cmp      eax, 44
  sete     al
  movzx    rax, al
G_IG03:
  ret      
G_IG04:
  mov      eax, 1
G_IG05:
  ret

1
2
3
4
5
6
7
8
9
10
11
12
13
; LLVM
  add edi, -36
  cmp di, 28
  ja .LBB0_2
  mov al, 1
  movzx ecx, di
  mov edx, 314575237    
  bt rdx, rcx
  jae .LBB0_2
  ret
.LBB0_2:
  xor eax, eax
  ret

As you can see C# generated a pretty simple set of 9 cmp + jumps for each logical OR. LLVM, at the same time, generated something strange with magic numbers and just two branches. Let’s try to convert (disassemble) LLVM’s output to C#:

1
2
3
4
5
6
7
bool IsReservedCharacter(char c)
{
    c = (char)(c - 36);
    if (c > 28)
        return false;
    return ((314575237 >> c) & 1) == 1;
}

so insted of 9 cmp we have add, cmp, shr, and Let me explain the magic constants.
First, we need to convert chars to their ASCII numbers:

1
2
';' '/' ':' '@' '&' '=' '+' '$' ','
59  47  58  64  38  61  43  36  44

The biggest is @ (64) and the smallest is $ (36). So, the range starts from 36 and the length is 64 - 36 = 28. Thus the first if simply ignores all values outside of [36..64] range. Here is how I explained the first two magic numbers. Now it’s 314575237s turn:

Since the range is known and the length is 28 which easily fits into a 32/64bit CPU register we can encode it to a special bit-map (a set of 0 and 1) - a 32/64 bit integer (depending on a platform). Here is how it’s done:

1
2
3
long bitmap = 0;
foreach (char c in new [] { ';', '/', ':', '@', '&', '=', '+', '$', ',' })
    bitmap |= 1L << c - 36;

So, for each char we push (shift) 1 to the left according to c - 36 value (as you remember 36 stands for $ so its index will be zero - on the right)
and our bitmap becomes:

1
2
3
4
5
  
00010010110000000000100110000101 = 314575237
   |  | ||          |  ||    | |
   @  = ;:          /  ,+    & $

Now when we do 314575237 >> (c - 36) we either get 1 (symbol is one of the reserved) or 0 (doesn’t belong to the set)

Let’s benchmark it! I have a random string here and I need to calculate how many symbols are reserved:

1
2
3
4
5
string str = "Some link https://github.com/dotnet/coreclr/issues/12477, some@mail.com.";
int count = 0;
foreach (char c in str)
    if (IsReservedCharacter(c))
        count++;

The results are:

1
2
3
4
|                      Method |      Mean | Ratio |
|---------------------------- |----------:|------:|
| CountReserverCharacters_old |  197.6 ns |  1.43 |
| CountReserverCharacters_new |  138.4 ns |  1.00 |

The improved version is 43% faster! (Core i7 8700K)

Feature request for RuyJIT dotnet/coreclr#12477

LLVM opt: godbolt.org (convert to switch)
LLVM llc: godbolt.org (DAG*)