Compilers, especially method just-in-time compilers, operate on one function at a time. It is a natural code unit size, especially for a dynamic language JIT: at a given point in time, what more information can you gather about other parts of a running, changing system?
I don’t have any data to back this up—maybe I should go gather some—but on average, methods are small. Especially in languages such as Ruby that use method dispatch for everything, even instance variable (attribute, field, …) lookups, they are small. And everywhere.
This makes the compiler sad. If we are to continue to anthropomorphize them, compilers like having more context so they can optimize better. Consider the following silly-looking example that is actually representative of a surprising amount of real-world code:
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  def distance(other)
    Math.sqrt((@x - other.x)**2 + (@y - other.y)**2)
  end
end

def distance_from_origin(x, y)
  Point.new(x, y).distance(Point.new(0, 0))
end
Right now, in the distance_from_origin method, I count 8 different method calls:
Point.new
Point#initialize
Point.new
Point#initialize
Point#distance
Float#**
Float#**
Math.sqrt
(Technically more, but the ivar lookups (including attr_reader!), addition, and subtraction are generally specialized and don’t push a frame, even in the interpreter.)
Furthermore, there are at least two heap allocations: one for each Point
instance.
Last, there is a bunch of memory traffic to and from Point instances.
This all is a huge bummer! What should be a simple math operation is now
overwhelmed with a bunch of other stuff. Point is certainly not a zero-cost
abstraction.
Even if we had a bunch of other optimizations such as load-store elimination or escape analysis, they would not be able to do much: pretty much everything escapes and is effectful. That is, unless we inline. Inlining is the lever that enables a bunch of other optimization passes to kick in.
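To make that concrete, here is a hand-written sketch of what distance_from_origin could reduce to once the calls are inlined and the now-local Point allocations are removed. It illustrates the end state, not the output of any particular compiler, and the name distance_from_origin_inlined is made up for the comparison.

```ruby
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  def distance(other)
    Math.sqrt((@x - other.x)**2 + (@y - other.y)**2)
  end
end

# Original: two allocations and several method calls.
def distance_from_origin(x, y)
  Point.new(x, y).distance(Point.new(0, 0))
end

# Hypothetical result after inlining Point.new, Point#initialize, the
# attr_readers, and Point#distance, then deleting the Points that no longer
# escape. This is written by hand for illustration.
def distance_from_origin_inlined(x, y)
  Math.sqrt((x - 0)**2 + (y - 0)**2)
end
```

Folding x - 0 down to x would additionally need the compiler to know (or guard) that x is a number, which is the kind of fact that only becomes visible once everything is in one function.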
I wrote about the design and implementation of Cinder’s inliner (FB link, personal blog link) a couple of years ago. I wrote about arguably the simplest part, which is copying the callee body into the caller. It took me at least a week to get working. Probably closer to months if you consider all the plumbing through the rest of the JIT. In February during a small hackathon, I watched my colleague k0kubun prototype that bit of the inliner inside ZJIT in about 30 minutes.
There is more to do when pretty much every part of the VM is observable from the guest language: both Python and Ruby allow inspecting the state of the locals, the call stack, etc. from user code. Sampling profilers also expect some amount of breadcrumbs to work with to inspect the stack. So there’s some more machinery still required to pretend like the callee function was not inlined. I talk about this a little bit in the Cinder blog post.
Even so, all of that can probably be designed and wired together in a couple of months. Then you will find yourself tuning the inliner for the next 10 years. This is much harder.
The thing that makes inlining difficult, especially in a method JIT, is that you are trying to make an entire (dynamic!) system faster but you are only looking through a microscope and only capable of local reasoning[1]. Whereas other optimizations such as strength reduction, inline caches, and value numbering are an unalloyed good for the generated code, inlining can have negative effects. It is also perhaps the first optimization people add that has non-local impact.
If you inline wrong, your code size might blow up. This might thrash your CPU’s caches. Bummer, but happens to the best of us.
But also, if you inline wrong, you might get in the way of other helpful optimizations: if you hit some size limit after inlining method A, you might never get to inline B, which is the key to unlocking the performance of the method you are trying to optimize.
Last, inlining might hurt compile time. In situations where latency is paramount (think: interactive client JavaScript), adding tons more code into the fray might add noticeable hiccups, even if the long-term throughput improves. As always, in-band compilation is a trade-off because any time you spend compiling, you are not executing code.
You have to write your compiler to reason about all of this stuff. So you have heuristics. For example, here is Michael Pollan’s inliner heuristic:
Inline methods. Mostly small. Not too many.
I did a survey of a bunch of compilers, mostly JIT compilers, to see what their inlining heuristics look like. I also read (skimmed) some papers to see what those folks had to say. I wonder if they agree.
This post was a long time coming. I started working on it about five years ago but then when I quit working at Facebook I accidentally left behind all of the inliner research I did for Cinder’s inliner. So then I kind of just thought about it aimlessly for a while before redoing it this year. Anyway, here’s wonderwall.
Spoiler alert: all in all, people tend to look at:
And also have different interesting ways to pipe in profile information.
Last, some newer papers do some wild stuff:
Another thing to consider in inlining is how you gather and interpret profiles.
When you compile a function, you tend to specialize it based on the input it has historically been given. For a monomorphic input, maybe you guard that the type is still the same and otherwise jump into the interpreter. For a polymorphic input, maybe you check the top K (~4) common cases and otherwise jump into the interpreter. Fine.
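A toy model of that guard chain might look like the following. It is only a sketch: SpecializedSite, MAX_CASES, and the :interpreter result are hypothetical names used to model "check the top K classes, otherwise side-exit", and no real JIT is structured exactly like this.

```ruby
# Toy model of a specialized call site with type guards.
# SpecializedSite, MAX_CASES, and :interpreter are hypothetical.
class SpecializedSite
  MAX_CASES = 4 # the "top K" from the text

  # profile: Hash mapping an observed class to how often it was seen
  def initialize(profile)
    @fast_paths = profile.sort_by { |_klass, count| -count }
                         .first(MAX_CASES)
                         .map { |klass, _count| klass }
  end

  def monomorphic?
    @fast_paths.size == 1
  end

  # Returns [:fast, klass] when a guard passes, otherwise
  # [:interpreter, nil] to model jumping back into the interpreter.
  def dispatch(value)
    @fast_paths.each do |klass|
      return [:fast, klass] if value.instance_of?(klass)
    end
    [:interpreter, nil]
  end
end

site = SpecializedSite.new({ Integer => 90, String => 10 })
site.dispatch(1)    # guard on Integer passes
site.dispatch(:sym) # no guard passes, fall back
```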
But sometimes you can be compiling a polymorphic method bar that is actually
monomorphic in its caller foo. That is, foo might only ever pass one kind
of input to bar, but other callers pass all kinds of stuff. Here is a bit of
a silly example to show what I mean:
class HashWithIndifferentAccess
  def initialize
    @hash = {}
  end

  # Allow reading from the Hash with either a String or a Symbol
  def [](key) = @hash[key.to_sym]
  # ...
end
# some method...
some_hash = HashWithIndifferentAccess.new
# ...
some_hash["abc"]
# some other method...
another_hash = HashWithIndifferentAccess.new
# ...
another_hash[:xyz]
Just kidding, not so silly at all. It’s a super common pattern in
Rails. It makes key polymorphic in HashWithIndifferentAccess#[] even
though for many of its callers, it may well be monomorphic (or even a
constant).
In order to plumb this information through to the compiler, you have to figure out this call context relationship. There are a couple of common ways to do it.
YJIT, for example, though it does not inline, splits methods based on the types of the arguments going in. This means that it clones the compiled code, generating a new version for each context. This does not give call context (“A calls B”) but gives type context (“B is called with integers, B’ is called with strings”).
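As a rough mental model of type-based splitting (and not a description of how YJIT implements it), imagine a method that keeps one "compiled version" per tuple of argument classes. VersionedMethod and its version labels are hypothetical.

```ruby
# Toy model of type-based splitting: one "compiled version" per tuple of
# argument classes. VersionedMethod and its labels are hypothetical.
class VersionedMethod
  attr_reader :versions

  def initialize(name, &body)
    @name = name
    @body = body
    @versions = {} # [argument classes] => label for the specialized clone
  end

  def call(*args)
    key = args.map(&:class)
    # "Compile" a new clone the first time this type context is seen.
    @versions[key] ||= "#{@name}<#{key.join(', ')}>"
    @body.call(*args)
  end
end

add = VersionedMethod.new("add") { |a, b| a + b }
add.call(1, 2)
add.call("a", "b")
add.call(3, 4) # reuses the Integer, Integer version
```

Note that the key says nothing about who called add, only what it was called with.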
A compiler could do type-based splitting in the interpreter or a baseline tier.
If you don’t fancy duplicating the code, you can instead duplicate the profiles. You could either do this using type context (as above) or using call context. SpiderMonkey, for example, does “trial inlining” that allows callers to pass down a bit of memory for potential inline candidate callees to record their inline caches. Instead of each function holding its own ICScript, the caller allocates a unique ICScript for that potential-inline call-site. This gives each callee function (at least?) one level of call context.
Later, when inlining the callee into the caller, we don’t have other callers’ type information polluting the IR builder (or whatever reads the profiles).
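The difference between a per-callee profile and a per-call-site profile can be sketched with the HashWithIndifferentAccess example from earlier. This is a toy model, not SpiderMonkey's ICScript: Profile, record, classes_for, and the :method_a and :method_b labels are all hypothetical.

```ruby
# Toy comparison of per-callee and per-call-site profiles.
# Profile and the caller labels are hypothetical.
class Profile
  def initialize
    @seen = Hash.new { |hash, key| hash[key] = [] }
  end

  def record(key, value)
    klass = value.class
    @seen[key] << klass unless @seen[key].include?(klass)
  end

  def classes_for(key)
    @seen[key]
  end
end

CALLEE = "HashWithIndifferentAccess#[]"
per_callee = Profile.new
per_call_site = Profile.new

calls = [
  [:method_a, "abc"], # some method always passes a String
  [:method_b, :xyz],  # some other method always passes a Symbol
  [:method_a, "def"],
]

calls.each do |caller_name, key_arg|
  per_callee.record(CALLEE, key_arg)                   # shared by all callers
  per_call_site.record([caller_name, CALLEE], key_arg) # one level of context
end
```

The shared profile sees both String and Symbol, while each call-site profile sees exactly one class.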
JavaScriptCore handles this by inlining bytecode into other bytecode. This is a gnarly transformation but gives the interpreter, even (!) access to call context. On tier-up to the compiler, all the inlining decisions have been made already.
HotSpot handles this with multiple tiers. The interpreter tiers up to the client compiler, C1. C1 profiles branch and call targets in compiled code. C1 may eventually recompile based on this new information. Hot code may eventually tier up to C2, which copies C1’s inlining decisions. This way, we get call context in profiles via inlining.
One last thing you could do is just trust your type inference and branch folding in the optimizer. You could inline and do polymorphic specialization in the callee when building the IR, then hope that your branch pruning monomorphizes the inlined callee. It’s a little wasteful because the polymorphic code is built “for nothing”, but it might work fine?
Okay, onto the collected notes and half-baked commentary. Here’s a survey of a bunch of JIT compilers and how they reason about inlining heuristics.
But before we get into that, thanks to Iain Ireland, CF Bolz-Tereick, and Ian Rogers for feedback on this blog post!
What follows is mostly a “bits and bobbles” section a la Phil Zucker.
We’ll start with Cinder, because when I wrote Cinder’s inliner I added only the simplest heuristics, mostly “don’t inline” signals. Over time, after I left, people tuned it a bit more.
The inliner starts from the caller CFG, walking it to find suitable inlining candidates. Inlining candidates are only for call targets that are known—in Cinder’s case, only for monomorphic call targets—and pass some checks. The callee is only known by its function object, which includes its bytecode. There is no IR available for the callee until we decide to inline.
Most of the “can’t handle this” checks are related to argument handling. Python
has a pretty complex calling convention, so if the caller/callee have not
agreed on how the arguments should be passed through, the inliner doesn’t care
to try and figure it out on its own. That is the responsibility of other parts
of the compiler. Things in this canInline
function could be considered “TODO”.
bool canInline(Function& caller, AbstractCall* call_instr) {
  // ...
  BorrowedRef<PyFunctionObject> func = call_instr->func;

  auto fail = [&](InlineFailureType failure_type) {
    dlogAndCollectFailureStats(caller, call_instr, failure_type);
    return false;
  };

  if (func->func_kwdefaults != nullptr) {
    return fail(InlineFailureType::kHasKwdefaults);
  }

  BorrowedRef<PyCodeObject> code{func->func_code};
  JIT_CHECK(PyCode_Check(code), "Expected PyCodeObject");

  if (code->co_kwonlyargcount > 0) {
    return fail(InlineFailureType::kHasKwOnlyArgs);
  }
  // ...
}
Failures are logged so they can be analyzed. If the Cinder team determines that there is some very frequent case they should handle, they will find out from the logs.
The inliner collects all candidate call instructions in one pass over the CFG. It loads the configurable “cost limit” from the options struct. Then it does one pass over the inlining candidates vector, inlining until it (maybe) hits the cost limit.
// ...
size_t cost_limit = getConfig().inliner_cost_limit;
size_t cost = codeCost(irfunc.code);
// Inline as many calls as possible, starting from the top of the function and
// working down.
for (auto& call : to_inline) {
BorrowedRef<PyCodeObject> call_code{call.func->func_code};
size_t new_cost = cost + codeCost(call_code);
if (new_cost > cost_limit) {
LOG_INLINER(
"Inliner reached cost limit of {} when trying to inline {} into {}, "
"inlining stopping early",
new_cost,
funcFullname(call.func),
irfunc.fullname);
break;
}
cost = new_cost;
inlineFunctionCall(irfunc, &call);
// We need to reflow types after every inline to propagate new type
// information from the callee.
reflowTypes(irfunc);
}
// ...
It does some graph maintenance work after inlining these calls, but that’s it.
This approach gets a surprising amount of utility for being so simple: it
inlines constants (quite a few methods look like def foo(): return 5), small
methods, and (at least, as far as I can remember) shrinks the compiled code
size. All for very little compile time overhead.
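The shape of that greedy pass is easy to play with in isolation. Here is a toy Ruby model of it, assuming each candidate has a single precomputed cost; Candidate, greedy_inline, and the numbers are hypothetical and only mirror the structure of the loop above.

```ruby
# Toy model of a greedy, cost-limited inlining pass.
# Candidate and the costs below are hypothetical.
Candidate = Struct.new(:name, :cost)

# Walks candidates in order and returns the names that would be inlined,
# stopping at the first candidate that would exceed the limit.
def greedy_inline(caller_cost, candidates, cost_limit)
  cost = caller_cost
  inlined = []
  candidates.each do |candidate|
    new_cost = cost + candidate.cost
    break if new_cost > cost_limit
    cost = new_cost
    inlined << candidate.name
  end
  inlined
end

candidates = [
  Candidate.new(:a, 10),
  Candidate.new(:b, 50),
  Candidate.new(:c, 5),
]
greedy_inline(40, candidates, 100) # a and b fit, c would push past 100
greedy_inline(40, candidates, 99)  # stops at b, so c is never considered
```

The second call shows the "inline wrong" hazard from earlier: a small candidate later in the function never gets a chance because a bigger one hit the limit first.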
There’s one other “standalone” Python JIT out there, PyPy. So we should look at that too.
There are two inliners in PyPy. One is inside the RPython to C translation pipeline, which acts more like an ahead-of-time compiler[2]. Then there is the tracing JIT bit, which has its own optimizer and heuristics. We’re going to look at the latter.
I talked to CF Bolz-Tereick about the inliner and their comment was that PyPy’s inlining heuristic is “yes”. There are a couple of exceptions, such as not inlining recursive functions or functions with loops. But the basic idea of tracing includes tracing through call instructions, which naturally means that you are “inlining”.
PyPy also does this neat thing where they treat frame pushes like normal allocation. Frame pushes, frame reads, and frame writes get written to the trace like normal object memory traffic and can get optimized away like other field reads and writes. This means that they can “just” use DCE to eliminate frame pushes and pops, whereas Cinder has some complicated mechanism to do it (which is my fault).
TODO get more details here
V8 is a JS engine and it has over the years had many execution approaches. We’ll look at three of them since they all have or had their place in the history:
They also each inline at different times in the pipeline, which made for a fun time trying to understand the different codebases.
Inlining happens during Hydrogen graph building
Don’t store function bytecode of all functions; need to re-parse callee text source to inline
https://docs.google.com/document/d/1VoYBhpDhJC4VlqMXCKvae-8IGuheBGxy32EOgC2LnT8/edit
When optimizing, add call instructions to the inline candidates list: https://github.com/v8/v8/blob/1a391f98cc7a9196369f2d6cab7df35ffbe92c08/src/maglev/maglev-graph-optimizer.cc#L1271
ProcessResult MaglevGraphOptimizer::VisitCall(Call* node,
const ProcessingState& state) {
// ...
int bytecode_length = shared.GetBytecodeArray(broker()).length();
float score =
(call_frequency / bytecode_length) * (loop_depth_ > 0 ? 1.5 : 1.0);
bool is_small_function =
bytecode_length <
reducer_.graph()->compilation_info()->flags().max_eager_inlined_bytecode;
// ...
MaglevCallSiteInfo* call_site = reducer_.zone()->New<MaglevCallSiteInfo>(
MaglevCallerDetails{
...
is_small_function, call_frequency,
...
},
score, bytecode_length);
reducer_.PushInlineCandidate(call_site);
// ...
}
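To get a feel for the scoring expression in the snippet above, here is the same arithmetic in Ruby with made-up inputs. The function name maglev_style_score and the sample numbers are hypothetical, and treating call_frequency as a floating-point value is an assumption on my part.

```ruby
# Same arithmetic as the score expression above: call frequency per byte of
# callee bytecode, boosted by 1.5x when the call site is inside a loop.
# The name and the inputs are hypothetical.
def maglev_style_score(call_frequency:, bytecode_length:, loop_depth:)
  (call_frequency.to_f / bytecode_length) * (loop_depth > 0 ? 1.5 : 1.0)
end

maglev_style_score(call_frequency: 10, bytecode_length: 20, loop_depth: 0)
maglev_style_score(call_frequency: 10, bytecode_length: 20, loop_depth: 1)
maglev_style_score(call_frequency: 8, bytecode_length: 64, loop_depth: 2)
```

Small, frequently called callees inside loops score highest; a large callee needs a proportionally higher call frequency to score the same.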
Unlike, for example, Cinder, Maglev does not seem to have many restrictions on what can get inlined into what, so its “can inline” signal is about budget. Actually, two budgets: a small budget and a normal budget.
bool MaglevInliner::CanInlineCall() {
// We stop inlining entirely if the small budget is exhausted.
// Inlining decisions after that become bad if we stop inlining small
// functions, but keep inlining large ones.
return !graph_->inlineable_calls().empty() &&
(graph_->total_inlined_bytecode_size() <
max_inlined_bytecode_size_cumulative() ||
graph_->total_inlined_bytecode_size_small() <
max_inlined_bytecode_size_small_total());
}
Then its inlining loop is a greedy walk of the to-inline queue checking candidate sizes.
bool MaglevInliner::InlineCallSites() {
DCHECK(CanInlineCall());
while (!graph_->inlineable_calls().empty()) {
// pop from inlineable_calls
MaglevCallSiteInfo* call_site = ChooseNextCallSite();
bool is_small_with_heapnum_input_outputs =
IsSmallWithHeapNumberInputsOutputs(call_site);
if (graph_->total_inlined_bytecode_size() >
max_inlined_bytecode_size_cumulative()) {
// We ran out of budget. Checking if this is a small-ish function that we
// can still inline.
if (graph_->total_inlined_bytecode_size_small() >
max_inlined_bytecode_size_small_total()) {
graph_->compilation_info()->set_could_not_inline_all_candidates();
break;
}
if (!is_small_with_heapnum_input_outputs) {
graph_->compilation_info()->set_could_not_inline_all_candidates();
// Not that we don't break just rather just continue: next candidates
// might be inlineable.
continue;
}
}
InliningResult result =
BuildInlineFunction(call_site, is_small_with_heapnum_input_outputs);
// ...
}
return true;
}
It runs this loop (which drains the queue) interleaved with the optimizer (which populates the queue).
bool MaglevInliner::Run() {
  if (graph_->inlineable_calls().empty()) return true;

  while (CanInlineCall()) {
    if (!InlineCallSites()) return false;
    RunOptimizer();
  }
  // ...
}
Confusingly, though, the optimizer also calls another function called
CanInlineCall which checks if it legally can inline:
MaglevGraphBuilder::ShouldEagerInlineCall appears unused (a dead declaration?). Maybe src/maglev/maglev-graph-builder.cc is just not showing up in GitHub search.
MaglevGraphBuilder::TryBuildCallKnownJSFunction likewise appears unused (or is a dead declaration), possibly for the same reason.
JavaScriptCore is funky! Unlike these other compilers that do inlining in their neat little SSA IRs, JSC inlines at the bytecode level[4]. This is their way of making sure that they get at least one level of call context into their interpreter inline caches, which will eventually give better information to the compiler.
JSC only inlines based on bytecode profile information, and only inlines bytecode??
TODO find better sources for bytecode inlining
SpiderMonkey has another way of getting that call context without doing bytecode inlining: they add call context to their inline caches. Methods can pass down an ICScript to their callees where the callee writes its inline cache information. Then, when compiling, the callee is more likely to be monomorphized.
Wasm
SpiderMonkey ICScript
https://fitzgen.com/2025/11/19/inliner.html
Plan: run in interpreter; tier up to C1; profile call targets; inline in C1; profile branch counts; tier up to C2, which copies C1 inlining decisions in bytecode parser
HotSpot C2
Not too small
Walk up the call stack to figure out what to compile
Handling the right thing to inline: def foo(a) = a.each {|x| x }
want to compile foo, inline each, inline block, not compile block separately
(probably)
HotSpot C1
https://bernsteinbear.com/assets/img/design-hotspot-client-compiler.pdf
heuristics:
// Additional condition to limit stack usage for non-recursive calls.
if ((callee_recursive_level == 0) &&
    (callee->max_stack() + callee->max_locals() - callee->size_of_parameters() > C1InlineStackLimit)) {
  INLINE_BAILOUT("callee uses too much stack");
}
TruffleRuby uses weighted compile queue
Graal https://ieeexplore.ieee.org/document/8661171
https://github.com/dotnet/runtime/blob/0b3f3ab1ecf4de06459e5f0e2b7cb3baf70ef981/docs/design/coreclr/jit/inline-size-estimates.md?plain=1#L5
https://github.com/dotnet/runtime/blob/0b3f3ab1ecf4de06459e5f0e2b7cb3baf70ef981/src/coreclr/jit/fginline.cpp
https://github.com/dotnet/runtime/issues/10303
https://github.com/AndyAyersMS/PerformanceExplorer/blob/master/notes/notes-aug-2016.md
DEFINE_FLAG(int,
deoptimization_counter_inlining_threshold,
12,
"How many times we allow deoptimization before we stop inlining.");
DEFINE_FLAG(bool, trace_inlining, false, "Trace inlining");
DEFINE_FLAG(charp, inlining_filter, nullptr, "Inline only in named function");
// Flags for inlining heuristics.
DEFINE_FLAG(int,
inline_getters_setters_smaller_than,
10,
"Always inline getters and setters that have fewer instructions");
DEFINE_FLAG(int,
inlining_depth_threshold,
6,
"Inline function calls up to threshold nesting depth");
DEFINE_FLAG(
int,
inlining_size_threshold,
25,
"Always inline functions that have threshold or fewer instructions");
DEFINE_FLAG(int,
inlining_callee_call_sites_threshold,
1,
"Always inline functions containing threshold or fewer calls.");
DEFINE_FLAG(int,
inlining_callee_size_threshold,
160,
"Do not inline callees larger than threshold");
DEFINE_FLAG(int,
inlining_small_leaf_size_threshold,
50,
"Do not inline leaf callees larger than threshold");
DEFINE_FLAG(int,
inlining_caller_size_threshold,
50000,
"Stop inlining once caller reaches the threshold.");
DEFINE_FLAG(int,
inlining_hotness,
10,
"Inline only hotter calls, in percents (0 .. 100); "
"default 10%: calls above-equal 10% of max-count are inlined.");
DEFINE_FLAG(int,
inlining_recursion_depth_threshold,
1,
"Inline recursive function calls up to threshold recursion depth.");
DEFINE_FLAG(int,
max_inlined_per_depth,
500,
"Max. number of inlined calls per depth");
An adaptive strategy for inline substitution (PDF)
// Inlining heuristics based on Cooper et al. 2008.
InliningDecision ShouldWeInline(const Function& callee,
intptr_t instr_count,
intptr_t call_site_count) {
// Pragma or size heuristics.
if (inliner_->AlwaysInline(callee)) {
return InliningDecision::Yes("AlwaysInline");
} else if (inlined_size_ > FLAG_inlining_caller_size_threshold) {
// Prevent caller methods becoming humongous and thus slow to compile.
return InliningDecision::No("--inlining-caller-size-threshold");
} else if (instr_count > FLAG_inlining_callee_size_threshold) {
// Prevent inlining of callee methods that exceed certain size.
return InliningDecision::No("--inlining-callee-size-threshold");
}
// Inlining depth.
const int callee_inlining_depth = callee.inlining_depth();
if (callee_inlining_depth > 0 &&
((callee_inlining_depth + inlining_depth_) >
FLAG_inlining_depth_threshold)) {
return InliningDecision::No("--inlining-depth-threshold");
}
// Situation instr_count == 0 denotes no counts have been computed yet.
// In that case, we say ok to the early heuristic and come back with the
// late heuristic.
if (instr_count == 0) {
return InliningDecision::Yes("need to count first");
} else if (instr_count <= FLAG_inlining_size_threshold) {
return InliningDecision::Yes("--inlining-size-threshold");
} else if (call_site_count <= FLAG_inlining_callee_call_sites_threshold) {
return InliningDecision::Yes("--inlining-callee-call-sites-threshold");
}
return InliningDecision::No("default");
}
tracelet based
// Refuse if the cost exceeds our thresholds.
// We measure the cost of inlining each callstack and stop when it exceeds a
// certain threshold. (Note that we do not measure the total cost of all the
// inlined calls for a given caller---just the cost of each nested stack.)
cost = costOfInlining(callerSk, callee, regionAndUnit, annotationsPtr);
if (cost <= Cfg::HHIR::AlwaysInlineVasmCostLimit) {
return accept(folly::sformat("cost={} within always-inline limit", cost));
}
if (region.instrSize() > irgs.budgetBCInstrs) {
return refuse(folly::sformat(
"exhausted bytecode budget: budgetBCInstrs={}, regionSize={}",
irgs.budgetBCInstrs, region.instrSize()));
}
auto maxTotalCost = adjustedMaxVasmCost(irgs, region, inlineDepth(irgs));
int maxCost = maxTotalCost;
if (Cfg::HHIR::InliningUseStackedCost) {
maxCost -= irgs.inlineState.cost;
}
const auto baseProfCount = s_baseProfCount.load();
const auto callerProfCount = irgen::curProfCount(irgs);
const auto calleeProfCount = irgen::calleeProfCount(irgs, region);
if (cost > maxCost) {
auto const depth = inlineDepth(irgs);
return refuse(folly::sformat(
"too expensive: cost={} : maxCost={} : "
"baseProfCount={} : callerProfCount={} : calleeProfCount={} : depth={}",
cost, maxCost, baseProfCount, callerProfCount, calleeProfCount, depth));
}
return accept(folly::sformat("small region with return: cost={} : "
"maxTotalCost={} : maxCost={} : baseProfCount={}"
" : callerProfCount={} : calleeProfCount={}",
cost, maxTotalCost, maxCost, baseProfCount,
callerProfCount, calleeProfCount));
// Instruction limit to control memory.
static constexpr size_t kMaximumNumberOfTotalInstructions = 1024;
// Maximum number of instructions for considering a method small,
// which we will always try to inline if the other non-instruction limits
// are not reached.
static constexpr size_t kMaximumNumberOfInstructionsForSmallMethod = 3;
// Limit the number of dex registers that we accumulate while inlining
// to avoid creating large amount of nested environments.
static constexpr size_t kMaximumNumberOfCumulatedDexRegisters = 32;
// Limit recursive call inlining, which do not benefit from too
// much inlining compared to code locality.
static constexpr size_t kMaximumNumberOfRecursiveCalls = 4;
// Limit recursive polymorphic call inlining to prevent code bloat, since it can quickly get out of
// hand in the presence of multiple Wrapper classes. We set this to 0 to disallow polymorphic
// recursive calls at all.
static constexpr size_t kMaximumNumberOfPolymorphicRecursiveCalls = 0;
// Controls the use of inline caches in AOT mode.
static constexpr bool kUseAOTInlineCaches = true;
// Controls the use of inlining try catches.
static constexpr bool kInlineTryCatches = true;
void HInliner::UpdateInliningBudget() {
if (total_number_of_instructions_ >= kMaximumNumberOfTotalInstructions) {
// Always try to inline small methods.
inlining_budget_ = kMaximumNumberOfInstructionsForSmallMethod;
} else {
inlining_budget_ = std::max(
kMaximumNumberOfInstructionsForSmallMethod,
kMaximumNumberOfTotalInstructions - total_number_of_instructions_);
}
}
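The budget update above is easier to read with numbers plugged in. Here is the same arithmetic as a Ruby sketch; the two constants are copied from the snippet, while the function name inlining_budget and the sample totals are hypothetical.

```ruby
# Same arithmetic as UpdateInliningBudget above, with the two constants
# copied from the snippet. The function name and inputs are hypothetical.
MAX_TOTAL_INSTRUCTIONS = 1024   # kMaximumNumberOfTotalInstructions
SMALL_METHOD_INSTRUCTIONS = 3   # kMaximumNumberOfInstructionsForSmallMethod

def inlining_budget(total_number_of_instructions)
  if total_number_of_instructions >= MAX_TOTAL_INSTRUCTIONS
    # Out of budget: only small methods can still be inlined.
    SMALL_METHOD_INSTRUCTIONS
  else
    [SMALL_METHOD_INSTRUCTIONS,
     MAX_TOTAL_INSTRUCTIONS - total_number_of_instructions].max
  end
end

inlining_budget(0)    # the whole budget is available
inlining_budget(1000) # 24 instructions left
inlining_budget(1022) # only 2 left, so the small-method floor of 3 applies
inlining_budget(5000) # over budget, still the small-method floor
```

So the budget shrinks as the graph grows, but it never drops below the "small method" size.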
bool HInliner::IsInliningEncouraged(const HInvoke* invoke_instruction,
ArtMethod* method,
const CodeItemDataAccessor& accessor) const {
if (CountRecursiveCallsOf(method) > kMaximumNumberOfRecursiveCalls) {
LOG_FAIL(stats_, MethodCompilationStat::kNotInlinedRecursiveBudget)
<< "Method "
<< method->PrettyMethod()
<< " is not inlined because it has reached its recursive call budget.";
return false;
}
size_t inline_max_code_units = codegen_->GetCompilerOptions().GetInlineMaxCodeUnits();
if (accessor.InsnsSizeInCodeUnits() > inline_max_code_units) {
LOG_FAIL(stats_, MethodCompilationStat::kNotInlinedCodeItem)
<< "Method " << method->PrettyMethod()
<< " is not inlined because its code item is too big: "
<< accessor.InsnsSizeInCodeUnits()
<< " > "
<< inline_max_code_units;
return false;
}
if (graph_->IsCompilingBaseline() &&
accessor.InsnsSizeInCodeUnits() > CompilerOptions::kBaselineInlineMaxCodeUnits) {
LOG_FAIL_NO_STAT() << "Reached baseline maximum code unit for inlining "
<< method->PrettyMethod();
outermost_graph_->SetUsefulOptimizing();
return false;
}
if (invoke_instruction->GetBlock()->GetLastInstruction()->IsThrow()) {
LOG_FAIL(stats_, MethodCompilationStat::kNotInlinedEndsWithThrow)
<< "Method " << method->PrettyMethod()
<< " is not inlined because its block ends with a throw";
return false;
}
return true;
}
if (outermost_graph_->IsCompilingBaseline() &&
(current->IsInvokeVirtual() || current->IsInvokeInterface()) &&
ProfilingInfoBuilder::IsInlineCacheUseful(current->AsInvoke(), codegen_)) {
uint32_t maximum_inlining_depth_for_baseline =
InlineCache::MaxDexPcEncodingDepth(
outermost_graph_->GetArtMethod(),
codegen_->GetCompilerOptions().GetInlineMaxCodeUnits());
if (depth_ + 1 > maximum_inlining_depth_for_baseline) {
LOG_FAIL_NO_STAT() << "Reached maximum depth for inlining in baseline compilation: "
<< depth_ << " for " << callee_graph->GetArtMethod()->PrettyMethod();
outermost_graph_->SetUsefulOptimizing();
return false;
}
}
Partial inlining
Understanding and Exploiting Optimal Function Inlining (PDF)
machine learning
Automatic construction of inlining heuristics using machine learning
Machine-Learning-Based Optimization Heuristics in Dynamic Compilers (PDF)
Guiding Inlining Decisions Using Post-Inlining Transformations (PDF)
U Can’t Inline This! (PDF)
Towards better inlining decisions using inlining trials
An Optimization-Driven Incremental Inline Substitution Algorithm for Just-in-Time Compilers (PDF)
Automatic Tuning of Inlining Heuristics (PDF)
Inlining-Benefit Prediction with Interprocedural Partial Escape Analysis (PDF)
Inlining of Virtual Methods (PDF)
A Study of Type Analysis for Speculative Method Inlining in a JIT Environment (PDF)
A Comparative Study of Static and Profile-Based Heuristics for Inlining (PDF)
clusters from Custom benefit-driven inliner in Falcon JIT (PDF)