# Building Math Hardware

## Faster Parameterized Adder in Verilog

This chapter introduces algorithms to reduce the delay when adding numbers. We will look at two carry-lookahead circuits 

Adding a carry-lookahead circuit can make the carry generation much faster. The individual bit adders no longer calculate outgoing carries, but instead generate propagate and generate signals. We define a Partial Full Adder (pfa) module with the usual ports a and b for the summands, carry input ci , the sum s, and two new signals propagate p and generate g. The propagate (p) indicates that the bit would not generate the carry bit itself, but will pass through a carry from a lower bit. The generate (g) signal indicates that the bit generates a carry independent of the incoming carry. The functionality of the pfa module is expression the circuit and Boolean equations shown below.
 \begin{aligned} g &=a \cdot b\\ p &=a \oplus b\\ s &=p \oplus c_i \end{aligned}
For bit position n, the outgoing carry cn is a function of pn, gn and the incoming carry ci,n. Except for bit position 0, the incoming carry equals the outgoing carry of the previous pfa, $$c_{i,n}=c_{o,n-1}$$ \begin{aligned} c_n &=g_n + c_{in_{n}} \cdot p_n \\ &=g_n + c_{n-1} \cdot p_n \end{aligned} For a 4-bit cla this results in the following equations for the carryout signals: \begin{aligned} c_0& = g_0 + c_i \cdot p_0 \\ c_1& = g_1 + c_0 \cdot p_1 \\ c_2& = g_2 + c_1 \cdot p_2 \\ c_3& = g_3 + c_2 \cdot p_3 \\ \end{aligned} Substituting the cn-1

\begin{aligned} c_0 &= g_0 + c_{i} \cdot p_0\\ c_1 &= g_1 + (g_0 + c_{i} \cdot p_0) \cdot p_1\\ &= g_1 + g_0 \cdot p_1 + c_{i} \cdot p_0 \cdot p_1\\ c_2 &= g_2 + (g_1 + g_0 \cdot p_1 + c_{i} \cdot p_0 \cdot p_1) \cdot p_2\\ &= g_2 + g_1 \cdot p_2 + g_0 \cdot p_1 \cdot p_2 + c_{i} \cdot p_0 \cdot p_1 \cdot p_2 \\ c_3 &= g_3 + (g_2 + g_1 \cdot p_2 + g_0 \cdot p_1 \cdot p_2 + c_{i} \cdot p_0 \cdot p_1 \cdot p_2) \cdot p_3\\ &= g_3 + g_2 \cdot p_3 + g_1 \cdot p_2 \cdot p_3 + g_0 \cdot p_1 \cdot p_2 \cdot p_3 +c_{i} \cdot p_0 \cdot p_1 \cdot p_2 \cdot p_3 \end{aligned}

The outgoing carries c0…3 no longer depend on each other, thereby eliminating the “ripple effect”. The outgoing carries can now be implemented with only 3 gate delays (1 for p/g generation, 1 for the ANDs and 1 for the final OR assuming gates with 5 inputs). The circuit below gives an example of a 4-bit carry look-ahead adder.
The complexity of the carry look-ahead increases dramatically with the bit number. Instead of calculating higher bit carries, one may daisy chaining the carry logic as shown for the 12-bit adder below.
The source code is available through GitHub.

#### Results

The propagation delay tpd depends on size n and the value of operands. For a given size n, adding the value 1 to an operand that contains all zeroes causes the longest propagation delay. The post-map Timing Analysis tool reveals the worst-case propagation delays for the Terasic Altera Cyclone IV DE0-Nano. The exact value depends on the model and speed grade of the FPGA, the silicon itself, voltage and the die temperature.
$$N$$ Timing Analysis Measured
slow 85°C slow 0°C fast 0°C actual
4-bits 8.1 ns 7.2 ns 5.3 ns
8-bits 9.8 ns 8.7 ns 6.2 ns
16-bits 10.0 ns 8.9 ns 6.2 ns
27-bits 15.2 ns 13.5 ns 9.5 ns
32-bits 21.8 ns 19.7 ns 13.6 ns
42-bits 24.4 ns 21.7 ns 14.9 ns

To improve speed for larger word sizes, we can add a second level of carry look ahead. To facilitate this, we extend the cla circuit by adding $$p_{i,j}$$ and $$g_{i,j}$$ outputs. The propagate signal $$p_{i,j}$$ indicates that an incoming carry propagates from bit position $$i$$ to $$j$$. The generate signal $$g_{i,j}$$ indicates that a carry is generated at bit position $$j$$, or if a carry out is generated at a lower bit position and propagates to position $$j$$. For a 4-bit block the equations are \begin{aligned} p_{0,3} &= p_3 \cdot p_2 \cdot p_1 \cdot p_0\\ g_{0,3} &= g_3 + p_3 \cdot g_2 + p_3 \cdot p_2 \cdot g_1 + p_3 \cdot p_2 \cdot p_1 \cdot g_0\\ c_o &= g_{3,0} + p_{3,0} \cdot c_i \end{aligned} The circuit for a 16-bit two-level carry-lookahead adder is shown below
The source code is available through GitHub.

#### Results

Once more, the propagation delay $$t_{pd}$$ depends size $$N$$ and the value of operands. For a given size $$N$$, adding the value (1\) to an operand that contains all zeroes causes the longest propagation delay. Once more, the post-map Timing Analysis predicts the worst-case propagation delays for the Terasic Altera Cyclone IV DE0-Nano. As usual, the exact value depends on the model and speed grade of the FPGA, the silicon itself, voltage and the die temperature.
$$N$$ Timing Analysis Measured
slow 85°C slow 0°C fast 0°C actual
4-bits 8.1 ns 7.2 ns 4.3 ns
8-bits 9.3 ns 8.3 ns 5.8 ns
16-bits 11.4 ns 10.2 ns 7.1 ns
27-bits 15.3 ns 13.6 ns 9.6 ns
32-bits 18.6 ns 16.7 ns 11.6 ns
42-bits 18.0 ns 16.1 ns 11.2 ns

### Others

Other adder designs are carry-skip, carry-select and prefix adders. : The next chapter shows an implementation of a multiplier introduced in Chapter 7 of the inquiry “How do computers do Math?”.
Student at MVHS
I see education as the foundation upon which entrepreneurs are able to build innovative organizations and execute their vision for the future.

This site uses Akismet to reduce spam. Learn how your comment data is processed.