Branch instructions in the PowerPC Architecture
May 31st, 2005 admin
1 Must-Have Branches and Good-To-Have Branches
Branch instructions show up in a program when sequential execution of instructions in memory is not feasible or possible.
When the program’s execution direction can go multiple directions depending on a condition, it becomes imperative to have some kind of branching. This type of branch is called a conditional branch. A condition must be checked to decide which direction the branch would take.
1.2 Not Feasible category (Good-to-have branches):
When a certain sequence of instructions repeats multiple times, it might be put into a loop. That way the instruction sequence is just stored once in memory and executed by jumping back to the start of the loop upon hitting the end of the loop. It is sometimes possible to avoid looping (and compilers often use a technique called loop unrolling) by just laying the instruction sequence out repeatedly in memory and running through it without any branches. It is not possible to unroll the loop effectively if the number of times the loop is executed is unknown up until the point the execution reaches the loop. In that case the example falls under the not possible category, since it requires a condition evaluation to determine how many times to execute the sequence of instructions. Often a sequence of instructions that is used several times and/or at several places in execution is put into a “subroutine” and “called” when required. This kind of subroutine/function calls and returns from those functions to mainline execution, require branch instructions, since these instructions are stored only one place in memory and not necessarily after the calling instruction. These branches could also be avoided by not having function calls and instead inlining code, which is another technique compilers sometimes use to reduce branches. Unconditional branches are branches where the program does not have to check for any condition and can branch to the target address unequivocally. Unconditional branches are branches that are taken no matter what. The instructions executed upon branching could have been simply moved into the main line execution to avoid the branch. That is not feasible because the memory holding the instructions could have a large number of copies of the same sequence in a lot of places, thereby utilizing a lot more memory than necessary.
2 Architectural resources required to handle branches
Hardware that supports PowerPC architecture provides three registers that are used by the various flavors of branch instructions
2.1 Condition Register
This 32-bit register has 8 fields (or 4 bits each) and basically stores bits that describe the state the system is in. There are instructions provided by the PowerPC architecture that can set/clear/perform operations on bits in this register. The bits then are used by conditional branch instructions to determine whether to branch or not (non-branching path is known as the flow through path). Such usage of the condition register is required only by the conditional branches.
2.2 Link Register
When branching to a subroutine it is important to know how to join back with the main line execution when the subroutine is executed. The Link Register is the 64-bit register provided by the hardware to store the address of the instruction that follows the branch instruction. This is the address where execution joins the main line upon completing the subroutine. Since not all branches intend to join back the main line (or, in other words, they are not subroutine calls) this storing of the next instruction’s address into the Link Register has to be requested by the branch instruction (or, in other words, the software).
2.3 Count Register
This 64-bit register is used to hold a loop count, which can be checked as a condition for deciding the branch direction, and also be decremented by a branch instruction. This is clearly useful when a branch is used to determine if the requisite number of loops of a sequence of instructions have been executed.
The Count Register can also be used for an entirely different purpose – to store a target address, although such usage might be a redundancy of the PowerPC architecture.
3 The Branch Instructions – basic (orthogonal) and extended (redundant)
Sections 3.1, 3.2, 3.3 and 3.4 describe the 4 sets of basic branch instructions. These are all the instructions that are required for branching in PowerPC software. Section 3.5 discusses branch instructions that are special cases of the basic instructions, and were used often enough to be rightfully given their own opcodes. These are extended mnemonics, which are good to have but not imperative, since one of the basic instructions could be used for an identical operation.
3.1 b, ba, bl, bla : Unconditional branches to relatively nearby addresses
The target address has to be relatively nearby, to permit use of this instruction because the address is specified in an immediate field in the instruction opcode itself.
The instructions ending with the ‘a’ indicate that the target address is absolute as specified in the instruction opcode itself.
The instructions which do not end in ‘a’ indicate that the target address is relative to the address of the current instruction. So the target address is the sum of the current instructions address and the immediate value in the opcode.
How near is “nearby”?
24 bits in the opcode specify the address. This is extended by a 0b00 (two 0 bits) to make it an address aligned at 4 bytes, which is how every instruction in the machine must be placed in memory. The 26 bits specify 64MB of space. Therefore “nearby” refers to 32MB on either side of the current instruction (relative addressing) or 0-64MB of the programs address range (absolute addressing).
The instructions ending in ‘l’ indicate that the next sequential instruction’s address must be saved into the link register. This is useful when a branch is made to a subroutine and at the end of the subroutine the program need only indicate its wish to return to the main flow of execution. It does not have to know the return address for it can use the address stored in the link register as the target.
Target address is known for these instructions, and can be calculated way in advance in the pipeline, since it is part of the instruction opcode, and in the extreme case might need to be added to the current instruction address.
3.2 bc, bca, bcl, bcla : Conditional branches to relatively nearby addresses
Just as is the case with unconditional branches, for conditional branches too the instructions ending in a and l, mean absolute addressing and branch with link, respectively.
The difference, compared to the unconditional branches, is that two more fields are required BO and BI. Each is 5 bits. BI indicates which bit in the CR (Condition Register) must be looked at to make the branching decision. BO indicates a couple of things.
Firstly it specifies what rule should be used when looking at the BI bit in CR to make the branching decision. So say the BI indicates bit 2. BO might indicate a rule that says branch if the CR bit 2 is 0. BO bits can be used to indicate a slightly more complicated rule such as decrement the Count Register by 1 and then branch if the Count Register holds a non-zero value and the CR bit 2 is 0. There are a few other combinations of conditions that can be checked using the bits in the BO field. These conditions are various combinations of
- Decrementing or not decrementing the CTR (Count Register)
- Branching if the CTR is zero or non-zero and a CR bit is 0 or 1
- Branching if the CTR is zero or non-zero
- Branching if a CR bit is 0 or 1
Secondly, the BO field also provides bits that can be used by the compiler to place hints about the direction the branch is likely going to take. This can help speculatively fetching instructions if the branch is predicted taken by the BO field and the target address is known.
Target address is known for these instructions since it is part of the instruction opcode, and in the extreme case might need to be added to the current instruction address.
How near is “nearby” in this case?
14 bits in the opcode specify the address. This is extended by a 0b00 (two 0 bits) to make it an address aligned at 4 bytes, which is how every instruction in the machine must be placed in memory. The 16 bits specify 64KB of space. Therefore “nearby” refers to 32KB on either side of the current instruction (relative addressing) or 0-64KB of the programs address range (absolute addressing).
If the BO field has 1’s in certain positions (1z1zz – where bit denoted by z is ignored), it indicates that the branch is always taken. So that means it is unconditional! So the four instructions described in section 3.1 on unconditional branch instructions, are actually redundant to a limited extent. The reason they are not completely redundant is because those instructions have 10 more bits for the target address field than the conditional branch instructions described in this section.
3.3 bclr, bclrl
These instructions represent conditional branches to the address specified in the link register. The bclrl, as indicated by the l at the end of the opcode, puts in the address of the next sequential instruction to the current instruction in the link register or on the link stack, after figuring out the target address form the link register.
The reason unconditional branches to the link register do not require additional PowerPC instructions (like blr or blrl) is because there is a way to specify a condition in the BO field of the instruction, that makes it “always taken”, or in other words, unconditional.
The target address for this instruction depends on the value residing in the Link Register (or Link Stack) at the time this instruction is being executed (and all previous instructions, that write the Link Register have done so). Therefore, due to the dependency on the Link Register, the target address may not be calculated very early in the pipeline, unless there are no older instructions in the pipeline that might write to the Link Register. Even in that case, an access to the Link Register is required to read its value. The instruction does not contain the target address as opposed to the cases in sections 3.1 and 3.2.
In addition to the BO and BI fields described in section 3.2, there is a 2-bit field in the opcode, BH. The BH field provides additional hints the hardware to assist it in fetching the correct target address ahead of time. The hints are of the type “this is a subroutine return”, “this is not a subroutine return so likely the same target address would be used as was used the last time this branch was taken” or “the target address is not predictable”. This is especially useful, when the compiler uses bclr but it is not for a subroutine call. In that case it might be useful to let the hardware know that there is a good chance that the target used the last time this branch was executed would likely be the target used this time too. The hardware can fetch the instructions from that path without having to wait for the Link Register to be ready and read to find the target address, provided the hardware maintains some kind of a cache to remember the previous targets.
3.4 bcctr, bcctrl
These instructions represent conditional branches to the address specified in the count register. The count register is used here to hold a target address, rather than a count. The name “count register”, is inaccurate in this case. The bcctrl, as indicated by the l at the end of the opcode, puts in the address of the next sequential instruction to the current instruction in the link register or on the link stack.
The reason unconditional branches to the count register do not require additional PowerPC instructions (like bctr or bctrl) is because there is a way to specify a condition in the BO field of the instruction, that makes it “always taken”, or in other words, unconditional.
The target address for this instruction depends on the value residing in the Count Register at the time this instruction is being executed (and all previous instructions, that write the Count Register have done so). Therefore, due to this dependency on the Count Register, the target address may not be calculated very early in the pipeline, unless there are no older instructions in the pipeline that might write to the Count Register. Even in that case, an access to the Count Register is required to read its value. The instruction does not contain the target address as opposed to the cases in sections 3.1 and 3.2.
In addition to the BO and BI fields described in section 3.2, there is a 2-bit field in the opcode, BH. The BH field provides additional hints the hardware to assist it in fetching the correct target address ahead of time. The hints are of the type “this is not a subroutine return so likely the same target address would be used as was used the last time this branch was taken” or “the target address is not predictable”. As seen in section 3.3, the BH field has more use as a hint for the branch to link register instructions.
3.5 Extended Mnemonics
Special cases of conditional instructions (sections 3.2, 3.3 and 3.4) have been given their own mnemonics for ease of readability of assembly listings.
3.5.1 blt
blt target is the same as bc 12, 0, target
12 is the BO field, which, looking at the detailed meaning of the bits indicates “branch if CRBI is 1”, which in this case means “branch if CR0 is 1”.
Bit 0 of the CR is set to 1 when the result of a fixed-point computation is negative.
So effectively “bc 12,0,target” means “branch to the target if a fixed point computation yielded a negative result”. This happens often enough that “branch to target if less than zero” was given a new mnemonic “blt target”.
3.5.2 bne
bne cr2, target is the same as bc 4, 10, target
4 is the BO field, which, looking at the detailed meaning of the bits indicates “branch if CRBI is 0″, which in this case means “branch if CR10 is 0″.
Bit 2 of a specific field in the CR is set to 1 when the result of a compare is equal to 0. The “specific field” has to be specified by the compare instruction. Result of a compare is 0 when the values being compared are equal. So if the bit in the CR is set to 0, that means the values being compared are not equal.
So effectively “bc 4,10,target” means “branch to target register when bit 10 of the CR is 0”, which shows up under the special circumstance meaning “branch to the target if a compare operation that writes to CR field 2 evaluated the operands to be not equal”. This happens often enough that “branch to target if less than zero” was given a new mnemonic “blt cri, target”. i is whatever CR field the compare used.
3.5.3 bdnz
bdnz target is the same as bc 16, 0, target
BO is 16, which indicates, decrement CTR and branch is CTR is non-zero. There is no dependence on the CR!
“bc 16,0,target” therefore is “branch after decrement if non zero”, which has been given the special mnemonic bdnz.
3.5.4 And a few more extended mnemonics …
There are a few other extended mnemonics that extend from the base instruction in the same way as the instructions in sections 3.5.1, 3.5.2 and 3.5.3
Note – PowerPC Architecture, POWER Architecture and PowerPC are registered trademarks of International Business Machines Corporation. Other trademarks used are owned by the respective owners.
Posted in Tutorials | No Comments »
I read this play in a single sitting. The play is written in a style that helps you draw the imagery, the sounds, sights and smells intended by the author. Arthur Miller, who passed away this year, wrote this Pulitzer Prize winning play in 1949. It was converted to a movie in 1952. It helped my understanding of the “American Dream”, a phrase I have heard mentioned many times in the USA. It added a new perspective to my understanding of this phrase. “American Dream” to me brought visions of owning a house, a car, raising a family and making money. Death of a Salesman added to that image a new shade. The cost. At what cost do you live the American Dream? What moral and social pressures do you have to overcome to live this dream? What is this dream worth? Is this the dream one should go after? What if your understanding of the dream is wrong? What if this realization comes too late in life?Death of a Salesman is a poignant story about growing turmoil in an aging man’s soul. His confidence in his values and benevolent trust in the society’s morality are eroded gradually by reality. He never is able to find a clear direction or a satisfying reason for the way his and his family’s life turned out. Searching forever to find the point at which his “dream” betrayed him, the Salesman, tries to find his peace by making the biggest deal of his life. He kills himself in the hope that what he could not achieve in life, could be achieved by his death, in the form of insurance money that his family could use and remember him by.