Implements carry-lookahead adders using circuits of logic gates. Written in Verilog HDL for Altera and Xilinx FPGAs.

## Carry-lookahead adders

This chapter introduces algorithms to reduce the delay when adding numbers. We will look at two carry-lookahead adders built from logic gates.

### Carry-lookahead adder

Adding a *carry-lookahead* circuit can make the carry generation much faster. The individual bit adders no longer calculate outgoing carries, but instead produce propagate and generate signals.

We define a Partial Full Adder (`pfa`) module with the usual ports \(a\) and \(b\) for the summands, carry input \(c_i\), the sum \(s\), and two new signals: propagate \(p\) and generate \(g\). The *propagate* (\(p\)) signal indicates that the bit would not generate a carry itself, but will pass through a carry from a lower bit. The *generate* (\(g\)) signal indicates that the bit generates a carry independent of the incoming carry. The functionality of the `pfa` module is expressed by the circuit and Boolean equations shown below.

For bit position \(n\), the outgoing carry \(c_n\) is a function of \(p_n\), \(g_n\) and the incoming carry \(c_{i,n}\). Except for bit position \(0\), the incoming carry equals the outgoing carry of the previous `pfa`, \(c_{i,n}=c_{o,n-1}\):
$$
\begin{align*}
c_n &= g_n + c_{i,n} \cdot p_n \\
&= g_n + c_{n-1} \cdot p_n
\end{align*}
$$
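
The recurrence above can be tried out in software before committing it to hardware. The following Python sketch is a bit-level model, not the Verilog implementation this chapter refers to. It assumes the common definitions \(p = a \oplus b\), \(g = a \cdot b\) and \(s = p \oplus c_i\) for the partial full adder; the function names `pfa` and `add_with_recurrence` are chosen here for illustration.

```python
def pfa(a, b, c_in):
    """Partial full adder for one bit: returns (sum, propagate, generate)."""
    p = a ^ b      # propagate: an incoming carry passes through this bit
    g = a & b      # generate: this bit creates a carry regardless of c_in
    s = p ^ c_in   # sum bit
    return s, p, g


def add_with_recurrence(x, y, n, c_in=0):
    """Add two n-bit integers by applying c_n = g_n + c_{n-1} * p_n per bit."""
    carry = c_in
    total = 0
    for i in range(n):
        s, p, g = pfa((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
        carry = g | (carry & p)   # outgoing carry of bit position i
    return total, carry


# 11 + 6 = 17, which is sum 1 with carry out 1 in four bits
print(add_with_recurrence(0b1011, 0b0110, 4))   # (1, 1)
```

Evaluated this way, each carry still waits for the previous one, which is the dependency the lookahead logic removes.
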
For a 4-bit `cla` this results in the following equations for the carry-out signals:
$$
\begin{align*}
c_0& = g_0 + c_i \cdot p_0 \\
c_1& = g_1 + c_0 \cdot p_1 \\
c_2& = g_2 + c_1 \cdot p_2 \\
c_3& = g_3 + c_2 \cdot p_3 \\
\end{align*}
$$

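Substituting the \(c_{n-1}\) term of each equation with the equation before it expresses every carry in terms of \(p\), \(g\) and \(c_i\) only:
$$
\begin{align*}
c_0 &= g_0 + c_i \cdot p_0 \\
c_1 &= g_1 + g_0 \cdot p_1 + c_i \cdot p_0 \cdot p_1 \\
c_2 &= g_2 + g_1 \cdot p_2 + g_0 \cdot p_1 \cdot p_2 + c_i \cdot p_0 \cdot p_1 \cdot p_2 \\
c_3 &= g_3 + g_2 \cdot p_3 + g_1 \cdot p_2 \cdot p_3 + g_0 \cdot p_1 \cdot p_2 \cdot p_3 + c_i \cdot p_0 \cdot p_1 \cdot p_2 \cdot p_3
\end{align*}
$$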

The outgoing carries \(c_0 \ldots c_3\) no longer depend on each other, which eliminates the “ripple effect”. Each carry can now be produced in only three gate delays: one for the p/g generation, one for the ANDs and one for the final OR, assuming gates with up to 5 inputs.
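
That the carries can be computed independently of each other can be checked with a small software model. The Python sketch below is not the Verilog implementation; it assumes \(p = a \oplus b\) and \(g = a \cdot b\) per bit, and the function name `cla4` is hypothetical. Every carry is written as a flat OR of AND terms over \(p\), \(g\) and \(c_i\).

```python
def cla4(x, y, c_i=0):
    """4-bit carry-lookahead adder model: returns (sum, carry_out)."""
    a = [(x >> n) & 1 for n in range(4)]
    b = [(y >> n) & 1 for n in range(4)]
    p = [a[n] ^ b[n] for n in range(4)]   # propagate per bit
    g = [a[n] & b[n] for n in range(4)]   # generate per bit

    # Each carry depends on p, g and c_i only, never on another carry.
    c0 = g[0] | (c_i & p[0])
    c1 = g[1] | (g[0] & p[1]) | (c_i & p[0] & p[1])
    c2 = (g[2] | (g[1] & p[2]) | (g[0] & p[1] & p[2])
          | (c_i & p[0] & p[1] & p[2]))
    c3 = (g[3] | (g[2] & p[3]) | (g[1] & p[2] & p[3])
          | (g[0] & p[1] & p[2] & p[3])
          | (c_i & p[0] & p[1] & p[2] & p[3]))

    carries_in = [c_i, c0, c1, c2]        # carry into each bit position
    s = sum((p[n] ^ carries_in[n]) << n for n in range(4))
    return s, c3


print(cla4(0b1111, 0b0001))   # (0, 1)
```

The widest term has five inputs, which matches the gate-size assumption in the delay estimate.
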

The circuit below gives an example of a 4-bit carry look-ahead adder.

The complexity of the carry-lookahead logic increases dramatically with the number of bits. Instead of calculating the carries for the higher bits directly, one may daisy-chain the carry logic, as shown for the 12-bit adder below.
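
As a rough illustration of daisy-chaining, the Python sketch below models a 12-bit adder as three 4-bit lookahead blocks, where the carry out of one block feeds the carry in of the next. It is a software model under the usual assumptions \(p = a \oplus b\) and \(g = a \cdot b\), not the Verilog implementation; the names `lookahead_carries`, `cla_block` and `add_chained` are hypothetical.

```python
def lookahead_carries(p, g, c_i):
    """Carry out of every bit, each computed directly from p, g and c_i."""
    carries = []
    for n in range(len(p)):
        c = c_i
        for k in range(n + 1):            # term c_i AND p_0 .. p_n
            c &= p[k]
        for j in range(n + 1):            # terms g_j AND p_{j+1} .. p_n
            term = g[j]
            for k in range(j + 1, n + 1):
                term &= p[k]
            c |= term
        carries.append(c)
    return carries


def cla_block(x, y, c_i, width=4):
    """One lookahead block: returns (sum, carry_out)."""
    p = [((x >> n) ^ (y >> n)) & 1 for n in range(width)]
    g = [((x >> n) & (y >> n)) & 1 for n in range(width)]
    c = lookahead_carries(p, g, c_i)
    carries_in = [c_i] + c[:-1]
    s = sum((p[n] ^ carries_in[n]) << n for n in range(width))
    return s, c[-1]


def add_chained(x, y, blocks=3, width=4, c_i=0):
    """Chain lookahead blocks; the carry ripples between blocks only."""
    mask = (1 << width) - 1
    total, carry = 0, c_i
    for blk in range(blocks):
        shift = blk * width
        s, carry = cla_block((x >> shift) & mask, (y >> shift) & mask,
                             carry, width)
        total |= s << shift
    return total, carry


print(add_chained(0xFFF, 0x001))   # (0, 1)
```

The trade-off is visible in the loop of `add_chained`: the logic per block stays small, but each block has to wait for the carry out of the block before it.
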

An implementation can be found at GitHub

#### Results

The propagation delay \(t_{pd}\) depends on the size \(n\) and the value of the operands. For a given size \(n\), adding the value *1* to an operand that contains all ones causes the longest propagation delay, because the carry has to travel through every bit position. The post-map Timing Analysis tool reveals the worst-case propagation delays for the Terasic Altera Cyclone IV DE0-Nano. The exact value depends on the model and speed grade of the FPGA, the silicon itself, voltage and the die temperature.

| \(N\) | Timing Analysis, slow 85°C | Timing Analysis, slow 0°C | Timing Analysis, fast 0°C | Measured, actual |
|---|---|---|---|---|
| 4 bits | 8.1 ns | 7.2 ns | 5.3 ns | |
| 8 bits | 9.8 ns | 8.7 ns | 6.2 ns | |
| 16 bits | 10.0 ns | 8.9 ns | 6.2 ns | |
| 27 bits | 15.2 ns | 13.5 ns | 9.5 ns | |
| 32 bits | 21.8 ns | 19.7 ns | 13.6 ns | |
| 42 bits | 24.4 ns | 21.7 ns | 14.9 ns | |

### Multi-level carry-lookahead adder

To improve speed for larger word sizes, we can add a second level of carry lookahead. To facilitate this, we extend the `cla` circuit by adding \(p_{i,j}\) and \(g_{i,j}\) outputs. The propagate signal \(p_{i,j}\) indicates that an incoming carry propagates from bit position \(i\) through \(j\). The generate signal \(g_{i,j}\) indicates that a carry is generated at bit position \(j\), or that a carry generated at a lower bit position within the block propagates to position \(j\).

For a 4-bit block the equations are
$$
\begin{align*}
p_{0,3} &= p_3 \cdot p_2 \cdot p_1 \cdot p_0 \\
g_{0,3} &= g_3 + p_3 \cdot g_2 + p_3 \cdot p_2 \cdot g_1 + p_3 \cdot p_2 \cdot p_1 \cdot g_0 \\
c_o &= g_{0,3} + p_{0,3} \cdot c_i
\end{align*}
$$
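
To see how the block signals feed a second lookahead level, the Python sketch below models a 16-bit adder built from four 4-bit blocks. It is a software model only, assuming \(p = a \oplus b\) and \(g = a \cdot b\) per bit; the names `block_pg`, `lookahead4` and `cla16_two_level` are hypothetical and not taken from the Verilog implementation.

```python
def block_pg(p, g):
    """Group propagate and generate of a 4-bit block: (p_{0,3}, g_{0,3})."""
    p_grp = p[3] & p[2] & p[1] & p[0]
    g_grp = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
             | (p[3] & p[2] & p[1] & g[0]))
    return p_grp, g_grp


def lookahead4(p, g, c_i):
    """Carry out of each of 4 positions, directly from p, g and c_i."""
    c0 = g[0] | (p[0] & c_i)
    c1 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c_i)
    c2 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & c_i))
    p_grp, g_grp = block_pg(p, g)
    c3 = g_grp | (p_grp & c_i)            # c_o = g_{0,3} + p_{0,3} * c_i
    return [c0, c1, c2, c3]


def cla16_two_level(x, y, c_i=0):
    """16-bit two-level lookahead model: returns (sum, carry_out)."""
    p = [((x >> n) ^ (y >> n)) & 1 for n in range(16)]
    g = [((x >> n) & (y >> n)) & 1 for n in range(16)]
    # First level: every block reports its group propagate and generate.
    grp = [block_pg(p[k:k + 4], g[k:k + 4]) for k in range(0, 16, 4)]
    grp_p = [gp for gp, _ in grp]
    grp_g = [gg for _, gg in grp]
    # Second level: the same lookahead logic applied to the block signals
    # gives the carry out of every block without chaining the blocks.
    blk_out = lookahead4(grp_p, grp_g, c_i)
    blk_in = [c_i] + blk_out[:3]
    total = 0
    for blk in range(4):
        k = 4 * blk
        c = lookahead4(p[k:k + 4], g[k:k + 4], blk_in[blk])
        bit_in = [blk_in[blk]] + c[:3]    # carry into each bit of the block
        for n in range(4):
            total |= (p[k + n] ^ bit_in[n]) << (k + n)
    return total, blk_out[3]


print(cla16_two_level(0xFFFF, 0x0001))   # (0, 1)
```

Reusing `lookahead4` for both levels reflects that the block signals obey the same carry equations as the single-bit signals.
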

The circuit for a 16-bit two-level carry-lookahead adder is shown below.

An implementation can be found at GitHub

#### Results

Once more, the propagation delay \(t_{pd}\) depends on the size \(N\) and the value of the operands. For a given size \(N\), adding the value *1* to an operand that contains all ones causes the longest propagation delay.

Once more, the post-map Timing Analysis predicts the worst-case propagation delays for the Terasic Altera Cyclone IV DE0-Nano. As usual, the exact value depends on the model and speed grade of the FPGA, the silicon itself, voltage and the die temperature.

| \(N\) | Timing Analysis, slow 85°C | Timing Analysis, slow 0°C | Timing Analysis, fast 0°C | Measured, actual |
|---|---|---|---|---|
| 4 bits | 8.1 ns | 7.2 ns | 4.3 ns | |
| 8 bits | 9.3 ns | 8.3 ns | 5.8 ns | |
| 16 bits | 11.4 ns | 10.2 ns | 7.1 ns | |
| 27 bits | 15.3 ns | 13.6 ns | 9.6 ns | |
| 32 bits | 18.6 ns | 16.7 ns | 11.6 ns | |
| 42 bits | 18.0 ns | 16.1 ns | 11.2 ns | |

### Others

Other adder designs are carry-skip, carry-select and prefix adders.