How RyuJIT inlines a function (heuristics)

Inlining is one of the most important optimizations. It eliminates the call overhead, exposes more opportunities for other optimizations (e.g. constant folding) and sometimes even makes callers smaller. Most people I’ve asked think that the JIT inlines only small methods under a certain IL size threshold (e.g. < 32 bytes of IL) and simply gives up on bigger ones. So I decided to write this blog post, and came up with a perfect example that covers several heuristics at once:

Take a guess: is this Volume constructor inlineable? Obviously not, it’s just too big, especially because throw new is quite expensive and emits a lot of native code we don’t want to see in our callers. Let’s check the codegen via Disasmo:

It’s inlined! And all the exceptions are completely eliminated! At this point you might think “Ah, OK, the JIT is smart enough to do a full analysis of all basic blocks/branches/locals and compute the exact control flow for constant arguments” or “the JIT inlines EVERYTHING, runs the full cycle of optimizations and then decides whether it was profitable”.

Well… no, that’s not possible in a reasonable amount of time. The JIT only makes a few guesses (observations) and uses them to estimate the final native code size and the performance impact. There are positive and negative observations. Positive ones increase a special benefit multiplier: the bigger the multiplier, the more code we can inline. Negative observations might cap the multiplier or abort the inlining attempt altogether. So what observations did the JIT make for Volume..ctor inside Test?

We can see them in the JitDump log via Disasmo:

All these simple observations set our multiplier to 11.5, which was enough to satisfy the inliner. For example, the fact that the constant argument 'B' (promotable struct) ends up being compared (==) with another constant (e.g. 'C') gives the JIT confidence that one of the branches will be optimized out, so the native code will be smaller than estimated. Also, the fact that the method (constructor) is called inside a loop tells the JIT that it should try harder, etc.

The JIT also uses these and other observations to estimate the final code size and its performance impact via magic coefficients (ML?); see EstimateCodeSize() and EstimatePerformanceImpact().
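To make the multiplier idea more concrete, here is a toy model of such a decision. It’s only a sketch under loud assumptions: the record and method names are made up, the weights (1.5, 3.0, 3.0) are invented rather than taken from RyuJIT, and the rule “inline if the callee’s estimated size doesn’t exceed the call site’s estimated size times the multiplier” is a simplification. The only number borrowed from this post is the limit of 5 basic blocks discussed below.

```csharp
// TOY MODEL, not RyuJIT's code: names and weights are illustrative assumptions.
public sealed record Observations(
    bool IsInstanceCtor,
    bool ConstantArgFeedsConstantTest,
    bool CallSiteIsInLoop,
    int BasicBlockCount,
    int CalleeSizeEstimate,    // estimated native size of the inlined body
    int CallSiteSizeEstimate); // estimated native size of just keeping the call

public static class ToyInliner
{
    // From the post: more than 5 basic blocks aborts inlining outright.
    public const int MaxBasicBlocks = 5;

    // Positive observations bump the multiplier. The weights are made up.
    public static double DetermineMultiplier(Observations o)
    {
        double multiplier = 0.0;
        if (o.IsInstanceCtor) multiplier += 1.5;
        if (o.ConstantArgFeedsConstantTest) multiplier += 3.0;
        if (o.CallSiteIsInLoop) multiplier += 3.0;
        return multiplier;
    }

    public static bool ShouldInline(Observations o, out string reason)
    {
        // Negative observation: aborts no matter how big the multiplier is.
        if (o.BasicBlockCount > MaxBasicBlocks)
        {
            reason = "too many basic blocks";
            return false;
        }

        // Simplified rule (assumption): a bigger multiplier lets a bigger callee through.
        double threshold = o.CallSiteSizeEstimate * DetermineMultiplier(o);
        if (o.CalleeSizeEstimate > threshold)
        {
            reason = $"size estimate {o.CalleeSizeEstimate} exceeds threshold {threshold}";
            return false;
        }

        reason = "profitable";
        return true;
    }
}
```

With these made-up weights, a callee estimated at 60 bytes behind a 10-byte call site gets through only when all three observations fire (threshold 75); without the loop observation the threshold drops to 45 and the inliner says no, and a sixth basic block rejects it regardless.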

Btw, did you notice this trick?

if ((uint)(value - 'A') > ('Z' - 'A'))

it’s an optimized version of:

if (value < 'A' || value > 'Z')
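Since a char has only 65,536 values, the equivalence is easy to check exhaustively. A small sketch (the class and method names are illustrative); note that in C# the single-comparison form relies on the (uint) cast: value - 'A' is an int, and only after the cast, in the default unchecked context, do values below 'A' wrap around to huge numbers.

```csharp
public static class RangeCheck
{
    // Single comparison: for value < 'A', (value - 'A') is negative and the
    // (uint) cast (unchecked by default) turns it into a huge number, so one
    // unsigned compare catches both "below 'A'" and "above 'Z'".
    public static bool IsOutsideAToZ_OneCompare(char value) =>
        (uint)(value - 'A') > ('Z' - 'A');

    // Two comparisons joined by ||, i.e. two conditional branches.
    public static bool IsOutsideAToZ_TwoCompares(char value) =>
        value < 'A' || value > 'Z';

    // Exhaustive check over all 65,536 UTF-16 code units.
    public static bool AgreeForAllChars()
    {
        for (int i = char.MinValue; i <= char.MaxValue; i++)
        {
            char c = (char)i;
            if (IsOutsideAToZ_OneCompare(c) != IsOutsideAToZ_TwoCompares(c))
                return false;
        }
        return true;
    }
}
```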

Both are semantically the same, but the latter consists of two comparisons (two conditional branches, i.e. an extra basic block), while the former is a single comparison. It turns out the JIT also limits the number of basic blocks in a method it can inline: if there are more than 5 (see here), then no matter how big the multiplier is, it prints “too many basic blocks” and aborts. That’s why I had to apply this 'Z' - 'A' trick. And I guess it’d be nice if both Roslyn and RyuJIT could do it for me automatically:

Roslyn issue:
RyuJIT PR (my poor attempt):

And that’s why I think it makes sense to do the optimization in Roslyn:

Inlining and virtual methods

Obviously, the JIT can’t inline a virtual call unless it can first figure out the exact target method, which is why RyuJIT needs more “devirtualization” optimizations (it already has some).
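For example, a call through a reference whose static type is a sealed class has only one possible target, which is one of the cases RyuJIT’s devirtualization can handle. A minimal sketch with hypothetical Shape/Circle types; it only shows the shape of the code, not the resulting codegen, which depends on the runtime version:

```csharp
using System;

public abstract class Shape
{
    public abstract double Area();
}

// 'sealed': no further overrides can exist, so a call through a
// Circle-typed reference has exactly one possible target.
public sealed class Circle : Shape
{
    private readonly double _radius;
    public Circle(double radius) => _radius = radius;
    public override double Area() => Math.PI * _radius * _radius;
}

public static class DevirtDemo
{
    // Only the static type Shape is known here: a genuine virtual call,
    // unless the JIT can learn the exact type some other way.
    public static double AreaViaBase(Shape s) => s.Area();

    // The static type is sealed, so the JIT can resolve Circle.Area
    // directly and then consider it for inlining like any other call.
    public static double AreaViaSealed(Circle c) => c.Area();
}
```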

Inlining and “throw new”

If a method never returns, it’s most likely just a throw helper, so it should not be inlined (and the call should be marked as ‘rarely executed’). You can find a lot of ThrowHelpers in the BCL: moving throws into them is one of the first things done to optimize hot methods.
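In C#, the pattern usually looks something like this. ThrowHelper.ThrowArgumentOutOfRange and HotPath.GetItem are illustrative names, not the BCL’s (internal) ThrowHelper API; DoesNotReturnAttribute requires .NET Core 3.0 or later.

```csharp
using System;
using System.Diagnostics.CodeAnalysis;
using System.Runtime.CompilerServices;

// Illustrative helper, not the BCL's internal ThrowHelper.
public static class ThrowHelper
{
    // Contains nothing but the throw, so the expensive "throw new ..." code
    // lives here once instead of in every caller. [DoesNotReturn] informs the
    // C# compiler's nullable flow analysis; NoInlining spells out that the
    // JIT should keep this as a separate (cold) call.
    [DoesNotReturn]
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void ThrowArgumentOutOfRange(string paramName) =>
        throw new ArgumentOutOfRangeException(paramName);
}

public static class HotPath
{
    // The fast path stays small: a compare, a rarely taken call, a load.
    public static int GetItem(int[] items, int index)
    {
        // Same unsigned-compare trick as the 'A'..'Z' check:
        // negative indices become huge values and fail the test too.
        if ((uint)index >= (uint)items.Length)
            ThrowHelper.ThrowArgumentOutOfRange(nameof(index));
        return items[index];
    }
}
```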

Inlining and [AggressiveInlining]

You basically strongly advise the JIT to inline a method, but it should be used carefully: most of the “I’ve added an AggressiveInlining here” PRs in the BCL are simply rejected for two reasons: 1) inlining can negatively affect native code size (e.g. it helps for constant inputs but regresses other cases); 2) inlining introduces a lot of temp variables, and their number can easily hit the hard-coded limit of variables the JIT can track (512), after which you’ll see a lot of very slow spills. A perfect example is this tweet or this issue.
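For reference, this is how the hint is applied. CharUtils.IsAsciiUpper is a hypothetical helper, picked because a tiny, hot method is the kind of case where the attribute is usually harmless; MethodImplOptions.AggressiveInlining itself is the real attribute value.

```csharp
using System.Runtime.CompilerServices;

public static class CharUtils // hypothetical helper class
{
    // A tiny, frequently called method: inlining removes the call overhead
    // without adding much code to each caller. The attribute is a request,
    // not a guarantee; the JIT can still decline (e.g. for a virtual call
    // it cannot devirtualize).
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static bool IsAsciiUpper(char c) => (uint)(c - 'A') <= ('Z' - 'A');
}
```

Putting the same attribute on a large method, or on one with many call sites and many locals, is exactly the situation the two reasons above warn about.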

Inlining and DynamicMethod

It’s not currently supported, see this issue:
But if you think it could significantly speed up your code, leave a comment there.

My attempt to make a heuristic

I tried to extend the existing heuristics in order to help the following case:
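A minimal sketch of that case, reconstructed from the description that follows (the exact original code isn’t reproduced here, so the body of Validate, the exception type and the class name are assumptions):

```csharp
using System;

public static class LengthCheckExample // hypothetical class name
{
    // Hypothetical body: the post only says Validate compares the length
    // with 10 and throws; the exception type and message are assumptions.
    public static void Validate(string s)
    {
        if (s.Length > 10)
            throw new ArgumentException("string is too long", nameof(s));
    }

    // If Validate were inlined here, the check would become
    // "hello".Length > 10, which the JIT can fold to 5 > 10 (false),
    // removing the branch and the throw.
    public static void Test() => Validate("hello");
}
```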

A few months ago I added an optimization to RyuJIT that replaces "const str".Length with a constant. So if we inline Validate into Test, we’ll get if ("hello".Length > 10), which is folded to just if (5 > 10), and the whole branch, including the throw new, is eliminated. Unfortunately, in this case the JIT refuses to inline:

The main problem here is that the JIT doesn’t know we are going to optimize get_Length, so the inliner should also have some sort of “constant string feeds get_Length, multiplier increased to ..” observation. Here is my attempt to add it. The only problem is that we don’t have time to resolve every callvirt CIL instruction just to find out whether it’s System.String.get_Length or not (see Andy’s comment).

There are a lot of other limitations; you can find some of them here. I also recommend reading Andy Ayers’s thoughts on the inliner’s design in general and his “Some Notes on Using Machine Learning to Develop Inlining Heuristics” article.