In simple terms, pipelining enables instruction-level parallelism. The process of executing an instruction, from fetch to completion, can be divided into well-defined stages. In a five-stage pipeline, up to five different instructions can be in flight at once, one in each stage, so in the ideal case the processor completes one instruction every clock cycle.
- However, a stage can stall for several reasons: complex operations such as division or branching take longer than simple addition, the data an instruction needs may be unavailable, or successive instructions may depend on each other.
- Implementing pipeline stages adds cost and delay to the architecture, but the overall CPI (clocks per instruction) decreases, which makes pipelining worthwhile for computer architecture.
- Because a pipeline is only as strong as its weakest link, each stage should be load balanced when designing the pipeline, i.e., the time required by each stage should be nearly equal.
- Reading and writing the memory and register file and using the ALU typically constitute the biggest delays in the processor. So we choose five pipeline stages such that each stage involves exactly one of these slow steps. The five stages, one step per stage, are:
Fetch
- In the Fetch stage, the processor reads the instruction from instruction memory.
Decode
- In the Decode stage, the processor reads the source operands from the register file and decodes the instruction to produce the control signals.
Execute
- In the Execute stage, the processor performs a computation with the ALU.
Memory
- In the Memory stage, the processor reads or writes data memory.
Writeback
- In the Writeback stage, the processor writes the result to the register file, when applicable.
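The way instructions overlap across these five stages can be sketched with a few lines of Python. This is an illustrative model only, assuming an ideal pipeline with no stalls or hazards:

```python
# A minimal sketch of how instructions flow through a five-stage pipeline.
# Each clock cycle, every instruction advances one stage; in steady state
# all five stages hold a different instruction.
def pipeline_diagram(n_instructions, n_stages=5):
    """Return, for each clock cycle, which instruction index occupies each
    stage (None means the stage is empty). Assumes no stalls or hazards."""
    total_cycles = n_instructions + n_stages - 1
    diagram = []
    for cycle in range(total_cycles):
        row = []
        for stage in range(n_stages):
            instr = cycle - stage  # instruction index occupying this stage
            row.append(instr if 0 <= instr < n_instructions else None)
        diagram.append(row)
    return diagram

# With five instructions, by cycle 4 every stage is busy at once:
print(pipeline_diagram(5)[4])  # [4, 3, 2, 1, 0]
```

Reading the printed row left to right: instruction 4 is in Fetch, 3 in Decode, 2 in Execute, 1 in Memory, and 0 in Writeback.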
In the pipelined processor given in the above figure, the length of a pipeline stage is set to 250 ps by the slowest stage, the memory access (in the Fetch or Memory stage). At time 0, the first instruction is fetched from memory. At 250 ps, the first instruction enters the Decode stage, and a second instruction is fetched. At 500 ps, the first instruction executes, the second instruction enters the Decode stage, and a third instruction is fetched. And so forth, until all the instructions complete. The instruction latency is 5 * 250 = 1250 ps. The throughput is 1 instruction per 250 ps (4 billion instructions per second). Because the stages are not perfectly balanced with equal amounts of logic, the latency is slightly longer for the pipelined processor than for the single-cycle processor. Similarly, the throughput is not quite five times as great for a five-stage pipeline as for the single-cycle processor. Nevertheless, the throughput advantage is substantial.
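The timing figures above can be checked with simple arithmetic (1 second = 10^12 ps):

```python
# Back-of-the-envelope check of the numbers in the text:
# 250 ps per stage, five stages.
stage_time_ps = 250
n_stages = 5

latency_ps = n_stages * stage_time_ps  # time for one instruction: 1250 ps
throughput = 1e12 / stage_time_ps      # instructions completed per second

print(latency_ps)   # 1250
print(throughput)   # 4e9, i.e. 4 billion instructions per second
```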
Advantages of Five Stage Pipeline:
- Helps execute multiple instructions in parallel: all the stages, i.e., all the units in the pipeline, are busy all the time.
- Because each stage has only one-fifth of the entire logic, the clock frequency is almost five times faster.
- Hence, the latency of each instruction is ideally unchanged, but the throughput is ideally five times better.
- Microprocessors execute millions or billions of instructions per second, so throughput is more important than latency.
Pipelining does introduce some overhead, so the throughput will not be quite as high as we might ideally desire. Nevertheless, pipelining gives such a great advantage for so little cost that all modern high-performance microprocessors are pipelined.
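One source of this overhead is the pipeline registers between stages, each of which adds a fixed delay to every stage. The sketch below uses assumed illustrative figures (a 1000 ps single-cycle time and 50 ps of register overhead, neither of which comes from the text) to show why the speedup falls short of the number of stages:

```python
# A hedged sketch of why throughput is "not quite" five times better:
# each pipeline register adds some fixed overhead to every stage.
def pipelined_throughput(single_cycle_time_ps, n_stages, reg_overhead_ps):
    """Instructions per second for an evenly divided pipeline plus
    per-stage register overhead (all timing figures are assumptions)."""
    stage_time = single_cycle_time_ps / n_stages + reg_overhead_ps
    return 1e12 / stage_time

ideal = pipelined_throughput(1000, 5, 0)   # no overhead: 5e9 per second
real  = pipelined_throughput(1000, 5, 50)  # with overhead: 4e9 per second
print(ideal, real)
```

With the assumed 50 ps of register overhead, the five-stage pipeline delivers a 4x rather than 5x throughput gain.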
How Hazards Come into picture in a Five Stage Pipeline:
A central challenge in pipelined systems is handling hazards that occur when the results of one instruction are
needed by a subsequent instruction before the former instruction has completed.
Types of Hazards:
Structural Hazards: (because of a single memory)
This type of hazard occurs when two instructions conflict over the use of a common resource. At some point in the pipeline, an instruction fetch and a data access can happen in the same cycle; with a single shared memory, the instruction fetch would have to stall for that cycle, causing a pipeline bubble. Hence pipelined datapaths require separate instruction and data memories.
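The conflict check itself is simple. The sketch below models it with an assumed instruction encoding (opcode strings such as "lw" and "sw"), not any particular processor's control logic:

```python
# A sketch of the structural-hazard check with a single shared memory:
# a conflict arises whenever an instruction fetch and a load/store in the
# Memory stage need the memory port in the same cycle.
def memory_conflict(fetching, memory_stage_opcode):
    """True if the fetch must stall because the Memory stage is using
    the shared memory (opcodes here are illustrative assumptions)."""
    return fetching and memory_stage_opcode in ("lw", "sw")

print(memory_conflict(True, "lw"))   # True: fetch stalls, bubble inserted
print(memory_conflict(True, "add"))  # False: add does not touch memory
```

With separate instruction and data memories, this check is never needed, which is exactly why pipelined datapaths split them.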
Data Hazards:
This type of hazard occurs when an instruction depends on the result of a previous instruction that is still in the pipeline.
Possible Solutions for Data Hazards:
- Solving Data Hazards with Forwarding:
Some data hazards can be solved by forwarding (also called bypassing) a result from the Memory or Writeback stage to a dependent instruction in the Execute stage.
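The forwarding decision can be sketched as a priority check: the most recent producer of a register wins. The instruction format below, a (destination, sources) pair, and the rule that register x0 is never forwarded are assumptions for illustration, not a real processor's forwarding unit:

```python
# A simplified sketch of the forwarding check. Each instruction is modeled
# as (dest_register, source_registers); x0 is assumed hardwired to zero
# and therefore never forwarded.
def forward_sources(execute_instr, memory_instr, writeback_instr):
    """For each source operand of the instruction in Execute, decide where
    its value comes from: the most recent producer takes priority."""
    _, sources = execute_instr
    origins = []
    for src in sources:
        if memory_instr and memory_instr[0] == src and src != "x0":
            origins.append("forward from Memory stage")
        elif writeback_instr and writeback_instr[0] == src and src != "x0":
            origins.append("forward from Writeback stage")
        else:
            origins.append("read from register file")
    return origins

# add x3, x1, x2 followed immediately by sub x5, x3, x4:
# x3 must be forwarded from the Memory stage, x4 comes from the register file.
print(forward_sources(("x5", ["x3", "x4"]), ("x3", ["x1", "x2"]), None))
```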
- Solving Data Hazards with Stalls:
The alternative solution is to stall the pipeline, holding up operation until the data is available.
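The classic case that forwarding alone cannot fix is the load-use hazard: a load's result is not available until the Memory stage, so an immediately dependent instruction must stall one cycle. The sketch below assumes an illustrative instruction format of (opcode, destination, sources):

```python
# A sketch of the load-use stall check: if the instruction in Execute is a
# load whose destination is a source of the instruction in Decode, the
# pipeline must stall (insert a bubble) for one cycle.
def needs_stall(decode_instr, execute_instr):
    """True if the instruction in Decode must stall because the
    instruction in Execute is a load producing one of its sources."""
    if execute_instr is None or execute_instr[0] != "lw":
        return False
    _, dest, _ = execute_instr
    _, _, sources = decode_instr
    return dest in sources and dest != "x0"

# lw x2, 0(x1) followed immediately by add x3, x2, x4: a stall is needed.
print(needs_stall(("add", "x3", ["x2", "x4"]), ("lw", "x2", ["x1"])))  # True
```

After the one-cycle stall, the loaded value can be forwarded from the Memory stage, so no further delay is needed.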
Control Hazards:
Control hazards occur when the decision of what instruction to fetch has not been made by the time the next instruction must be fetched. Control hazards are solved by predicting which instruction should be fetched and flushing the pipeline if the prediction is later determined to be wrong. Moving the decision as early as possible minimizes the number of instructions that are flushed on a mis-prediction.
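The benefit of resolving branches early can be quantified: every pipeline stage between Fetch and the stage where the branch resolves fetches one wrong-path instruction that must be flushed. The stage numbering below (Fetch = 0 through Writeback = 4) is an assumption for illustration:

```python
# A sketch of the branch-penalty arithmetic: the number of instructions
# flushed on a misprediction equals the number of stages between Fetch
# and the stage where the branch outcome is known.
def flush_penalty(resolve_stage):
    """Wrong-path instructions fetched before the branch resolves,
    assuming stage indices Fetch=0, Decode=1, Execute=2, Memory=3."""
    return resolve_stage  # one wrong-path instruction per intervening stage

print(flush_penalty(3))  # branch resolved in Memory: 3 instructions flushed
print(flush_penalty(1))  # branch resolved early, in Decode: only 1 flushed
```

This is why moving the branch decision as early as possible, as the text notes, minimizes the misprediction penalty.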