The ITTAGE indirect branch predictor

blog.nelhage.com

44 points by Bogdanp 11 hours ago

saagarjha 9 hours ago

If the author is around, the final link points to http://localhost:1313/post/cpython-tail-call/#further-weirdn....

Some architectures have/had branch hint instructions.

https://arcb.csc.ncsu.edu/~mueller/cluster/ps3/SDK3.0/docs/a...

The cost of a branch miss is that the pipeline stalls while the incorrect prediction is flushed. If there were resources available to speculatively execute the other path concurrently, in parallel, it might take less wall time.

pbsd 8 hours ago

The Pentium 4 had branch hints in the form of taken/not-taken prefixes. They were not found to be useful and were basically ignored in every subsequent Intel microarchitecture, until Redwood Cove brought back the branch-taken prefix in 2023.

Taniwha 4 hours ago

Branch hint instructions essentially give you the initial value for your BTC entry; after that, you want it to learn. In general, though, if you initially predict backward branches as taken and forward ones as not taken, it's almost as good.
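A minimal sketch of the idea, assuming a textbook 2-bit saturating counter per branch (real cores differ): the hint, or the backward-taken/forward-not-taken rule when there is no hint, only seeds the counter the first time the branch is seen, and the counter then learns from actual outcomes. `TwoBitPredictor` and `count_mispredicts` are hypothetical names.

```python
# Toy model (hypothetical, not any real core): one 2-bit saturating counter per
# conditional branch. Counter >= 2 means "predict taken". A static hint, if any,
# only seeds the counter the first time the branch is seen; after that it learns.
# Without a hint, the seed follows backward-taken / forward-not-taken (BTFNT).

class TwoBitPredictor:
    def __init__(self):
        self.counters = {}  # branch PC -> counter in 0..3

    def _seed(self, pc, target, hint):
        if hint is not None:
            return 2 if hint else 1      # weakly taken / weakly not-taken
        return 2 if target < pc else 1   # BTFNT: backward (loop) branches start taken

    def predict(self, pc, target, hint=None):
        if pc not in self.counters:
            self.counters[pc] = self._seed(pc, target, hint)
        return self.counters[pc] >= 2

    def update(self, pc, taken):
        c = self.counters[pc]
        self.counters[pc] = min(c + 1, 3) if taken else max(c - 1, 0)


def count_mispredicts(pred, pc, target, outcomes, hint=None):
    misses = 0
    for taken in outcomes:
        if pred.predict(pc, target, hint) != taken:
            misses += 1
        pred.update(pc, taken)
    return misses


# Loop back-edge (target below the branch): taken 9 times, then falls through.
print(count_mispredicts(TwoBitPredictor(), 0x40, 0x10, [True] * 9 + [False]))  # 1
# Forward branch that is almost always taken: BTFNT misses once, a hint avoids that.
print(count_mispredicts(TwoBitPredictor(), 0x40, 0x80, [True] * 10))             # 1
print(count_mispredicts(TwoBitPredictor(), 0x40, 0x80, [True] * 10, hint=True))  # 0
```

Once the counter exists, later hints are ignored, which matches the "initial value, then learn" description.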
Very few architectures have conditional indirect branches, and they don't get used all that much:
- subroutine return: better predicted with a stack
- virtual method dispatch: needs a predictor for the destination rather than for taken/not-taken. This is a different thing from a normal branch: the history chooses among multiple destinations, whereas a normal branch typically has a single destination and a history that chooses whether it is taken.
- dense case statements: similar to virtual method dispatch, but possibly needing far more destinations
All these cases often involve a memory load before the branch. In essence, what you are predicting is the value being loaded, and you want to keep feeding the pipe while you wait for the load to complete.
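To illustrate the subroutine-return case, here is a toy comparison, assuming a single `ret` instruction reached from two call sites: a return-address stack (push on call, pop on return) versus a BTB entry that only remembers the last target. The class and function names and the addresses are hypothetical.

```python
# Toy model: predicting the target of one `ret` instruction.
class ReturnStack:
    def __init__(self, depth=16):
        self.depth = depth
        self.entries = []

    def on_call(self, return_addr):
        if len(self.entries) == self.depth:
            self.entries.pop(0)          # overflow: the oldest entry is lost
        self.entries.append(return_addr)

    def predict_return(self):
        return self.entries.pop() if self.entries else None


def simulate(events):
    """events: ("call", return_addr) or ("ret", actual_target) tuples.
    Returns (return-stack misses, last-target BTB misses)."""
    ras = ReturnStack()
    last_target = None                   # BTB entry for the single `ret`
    ras_misses = btb_misses = 0
    for kind, addr in events:
        if kind == "call":
            ras.on_call(addr)
            continue
        if ras.predict_return() != addr:
            ras_misses += 1
        if last_target != addr:
            btb_misses += 1
        last_target = addr
    return ras_misses, btb_misses


# The same function called alternately from two call sites.
events = []
for i in range(10):
    ra = 0x100 if i % 2 == 0 else 0x200
    events += [("call", ra), ("ret", ra)]
print(simulate(events))  # (0, 10)
```

The stack pairs each return with its own call, so alternating callers cost nothing, while a last-target entry is wrong every time the caller changes.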

nynx 10 hours ago

I must be missing something here. How would this help predict interpreter dispatch? Those targets won’t be a function of previous branch history or PC, which may very well be independent of the next opcode. They’d be a function of state in memory or registers.

saagarjha 9 hours ago

Interpreters are just like normal programs, but splatted out a bit. In particular, they have branches and loops just like normal programs. The challenge for processors is that these high-level constructs are far apart and dispatched through an interpreter loop, which obfuscates them. Being able to reach further back in history lets you recover this kind of information "through" the intervening bits.

dzaima 9 hours ago

If your interpreter is interpreting a program with unpredictable branches, of course no predictor will magically make your interpreter predict those branches better than an equivalent compiled program would.
The question here is about all the other branching the interpreter does. That is, even if you have an unpredictable `if (a+b < 0)`, there's still the dispatching to the "load-variable", "add", "load-constant", "less-than" and "do-branch" opcodes, which will still benefit from being predicted. They could very well be predicted if the code is repeated in a loop (despite the single unpredictable branch), or potentially even if there's just a common pattern in the language (e.g. comparison opcodes being followed by a branch opcode).

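A toy simulation of this argument, assuming a dispatch predictor that keys on the last `k` opcodes (a stand-in for branch history) and remembers the last target seen for each such context. The opcode names follow the comment; the loop body, `guest_trace` and `dispatch_misses` are hypothetical, and the guest's `if` is modeled as a coin flip.

```python
import random
from collections import deque

# Hypothetical guest loop body; the opcode names follow the comment above.
BODY = ["LOAD_VAR", "LOAD_VAR", "ADD", "LOAD_CONST", "LESS_THAN", "BRANCH_IF"]


def guest_trace(iterations, seed=0):
    """Opcode stream of a loop whose body contains one data-dependent branch."""
    rng = random.Random(seed)
    trace = []
    for _ in range(iterations):
        trace += BODY
        # The guest's `if (a+b < 0)`: modeled as a coin flip, so unpredictable.
        trace.append("STORE_A" if rng.random() < 0.5 else "STORE_B")
        trace.append("JUMP")
    return trace


def dispatch_misses(trace, k=4):
    """Predict each dispatch target as the last target seen after the same k
    previous opcodes. Returns (misses right after BRANCH_IF, all other misses)."""
    table = {}
    history = deque(maxlen=k)
    after_branch = other = 0
    for op in trace:
        ctx = tuple(history)
        if table.get(ctx) != op:
            if history and history[-1] == "BRANCH_IF":
                after_branch += 1
            else:
                other += 1
        table[ctx] = op
        history.append(op)
    return after_branch, other


after_branch, other = dispatch_misses(guest_trace(1000))
print(after_branch, other)
```

Out of 8000 dispatches, `other` stays at a fixed handful of first-sight misses no matter how long the loop runs, while `after_branch` tracks how often the coin flip changes outcome (roughly half the iterations). Only the dispatch that directly depends on the guest's unpredictable branch is hard to predict.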
brigade 10 hours ago

In a hot loop, the next opcode can be predicted quite well from the history of previously executed opcodes, especially once you have a couple of iterations available in your history. And the opcodes executed in an interpreter are generally equivalent to the dispatch branch targets.

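A toy illustration of why longer history helps, and of the ITTAGE-style idea of predicting from the longest history that has a match, assuming exact-match tables keyed by the last L opcodes (opcode history standing in for branch history, as the comment suggests). Real ITTAGE uses hashed, tagged tables of bounded size with usefulness counters and allocates entries on mispredictions; none of that is modeled. `period_trace`, `context` and `misses` are hypothetical names, and the guest is an outer loop around a 3-iteration inner loop.

```python
from collections import deque


def period_trace(outer):
    """Hypothetical guest: an outer loop whose body runs a 3-iteration inner loop.
    A, B, C, D stand for opcodes (i.e. dispatch targets)."""
    body = ["A"] + ["B", "C"] * 3 + ["D"]
    return body * outer


def context(hist, length):
    """The last `length` opcodes (fewer during warm-up; empty for length 0)."""
    h = tuple(hist)
    return h[max(0, len(h) - length):]


def misses(trace, lengths):
    """Count dispatch mispredictions with one exact-match table per history
    length. The prediction comes from the longest table that has already seen
    the current context; every table is then updated with the real target."""
    tables = {n: {} for n in lengths}
    hist = deque(maxlen=max(lengths))
    count = 0
    for op in trace:
        prediction = None
        for n in sorted(lengths, reverse=True):
            key = context(hist, n)
            if key in tables[n]:
                prediction = tables[n][key]
                break
        if prediction != op:
            count += 1
        for n in lengths:
            tables[n][context(hist, n)] = op
        hist.append(op)
    return count


trace = period_trace(100)
for lengths in [(0,), (2,), (8,), (0, 2, 8)]:
    print(lengths, misses(trace, lengths))
# (0,) 800
# (2,) 206
# (8,) 16
# (0, 2, 8) 10
```

With two opcodes of history, the inner loop's back-edge stays ambiguous (the context `B C` is followed by `B` twice and then `D`); eight opcodes cover a whole outer iteration. Combining lengths keeps the long-history accuracy while falling back on shorter tables during warm-up.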
achierius 10 hours ago

"very well may be" but oftentimes isn't. Branch history does in practice do a very good job of predicting what target you're going to take for an indirect branch.
- nynx 10 hours ago
  
  Sure. I can easily see that often being the case for arbitrary code but not interpreter dispatch loops.

jonstewart 9 hours ago

I learned about computed goto a dozen years ago, tried it out in my interpreter, and got worse performance in the Haswell era than with a trusty switch statement. Branch predictors have made computed goto less compelling for a good long time.
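A toy model of one plausible reason, assuming a predictor that stores only the last target per key, where the key is the branch PC, optionally extended with the last few targets as a crude stand-in for global history. With a plain `switch`, every dispatch goes through one shared indirect branch; with computed goto, each handler ends in its own indirect jump, so the branch PC identifies the previous opcode. The opcode names, loop body and `btb_misses` are hypothetical, and nothing here models Haswell specifically.

```python
from collections import deque


def btb_misses(trace, threaded, history=0):
    """Toy indirect-branch predictor: remembers the last target per key.
    threaded=False: every dispatch goes through one shared `switch` branch.
    threaded=True: each handler ends in its own indirect jump (computed goto),
    so the branch PC identifies the handler that just ran.
    history: how many previous targets are added to the key, a crude stand-in
    for global branch history."""
    table = {}
    recent = deque(maxlen=history)
    prev = None  # handler that just ran; None before the first dispatch
    count = 0
    for op in trace:
        pc = prev if threaded else "switch"
        key = (pc, tuple(recent))
        if table.get(key) != op:
            count += 1
        table[key] = op
        recent.append(op)
        prev = op
    return count


# Hypothetical bytecode loop body.
trace = ["LOAD", "LOAD", "ADD", "STORE", "JUMP"] * 100
print(btb_misses(trace, threaded=False))             # 400
print(btb_misses(trace, threaded=True))              # 204
print(btb_misses(trace, threaded=False, history=2))  # 7
```

Under a last-target-per-PC predictor, replicating the dispatch roughly halves the misses in this loop. Once the predictor also uses recent history, the single shared `switch` branch is predicted almost perfectly after warm-up, which is consistent with computed goto losing its edge.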

Tail call is a different matter…