2 unstable releases

0.2.0 Oct 23, 2024
0.1.0 Oct 20, 2024

#65 in Profiling

Download history 167/week @ 2024-10-14 539/week @ 2024-10-21 152/week @ 2024-10-28 174/week @ 2024-11-04 44/week @ 2024-11-11

1,076 downloads per month
Used in binggan

MIT license

1MB
57K SLoC

BPU Trasher

BPU (Branch Predictor Unit) Trasher tries to trash the BPU of a processor by trying different strategies for the different components of the BPU.

This tool is useful to avoid overtraining the branch predictor unit of a processor an getting skewed results in benchmarks.

BPU Concepts

The exact components of the BPU can vary from processor to processor and are not public information.

Types of Branches

Although intuitively, we think of branches as conditional (e.g. if), for the CPU anything that changes the flow of the program is a branch. Therefore, there are different types of branches:

  1. Unconditional branches Instructions: JMP, CALL

  2. Conditional branches Instructions:JNE, JZ, JE

  3. Indirect Branch Instructions: JMP [RAX] Like Conditional branches, but the target address is not known at compile time, e.g. a virtual function call.

  4. Function Call Instructions: CALL

  5. Return from Function This is an unconditional branch. It's its own type, since the BPU has a special handling for it. Instructions: RET

2-Level Adaptive Predictor

The 2-level adaptive predictor is a common branch predictor used in modern processors.

The 2-level adaptive predictor consists of two tables:

  1. The Branch History Table (BHT)
  2. The Pattern History Table (PHT)

The BHT is a table that stores the history of the last N branches. The PHT is a table that stores the prediction of the last N branches. The BHT is used to index the PHT. The PHT is used to predict the outcome of the branch.

Branch target buffer (BTB)

The BTB is a cache that stores the target address of the most the frequently used conditional and unconditional branches.

Depending on the CPU, there may be different branch target buffers for different types of branches. For example, there may be a separate BTB for indirect branches.

Loop/Switch Counter

The loop counter is a special case of a branch predictor that is used to predict the number of iterations of a loop. An entry in the BTB will contain information if the branch has loop behaviour and if yes, the number of iterations of the loop.

Indirect Jump Prediction

Indirect jumps are jumps that are not directly to a specific address but to an address that is calculated at runtime. Due to polymorphism, the BTB may need to keep multiple targets.

I think in Rust this would be as simple as:

fn call_me(obj: &dyn MyTrait) {
    obj.call_me(); // could have multiple targets
}

Hashing

In order to reliably trash the BPU, we need to understand how the hashing function works.

" The literature recommends that the index into the pattern history table is generated by an XOR combination of the history bits and the branch address. However, my experimental results do not confirm such a design. The indexing function in figure 3.3 may be a more complex hashing function of the history and the branch address, or it may involve the branch target address, BTB entry address or trace cache address.

Since the indexing function is not known, it is impossible to predict whether two branches will use the same entry in the pattern history table. For the same reason, I have not been able to measure the size of the pattern history table, but must rely on rumors in the literature. "

Perceptron

A perceptron is a type of neural network that is used in the BPU to predict the outcome of a branch. It is used in Zen 4 processors.

Return address stack (RAS)

Sources

This is an magnificient ressource: https://www.agner.org/optimize/microarchitecture.pdf

Deep dive into an older architecture: http://www.ece.uah.edu/~milenka/docs/VladimirUzelac.thesis.pdf

https://xania.org/201602/bpu-part-one

https://www.cs.utexas.edu/~lin/papers/hpca01.pdf

Dependencies

~315KB