Implements a math multiplier using circuits of logic gates. Written in parameterized Verilog HDL for Altera and Xilinx FPGA’s.

## Multiplier using logic gates

*carry-propagate array multiplier* in the inquiry “How do Computers do Math?“.\(\)

This multiplier is build around Multiplier Adder (`ma`

) blocks. These `ma`

blocks are themselves build around the Full Adder (`fa`

) blocks introduced in the adder section. These `fa`

blocks have the usual inputs \(a\) and \(b\), \(c_i\) and outputs \(s\) and \(c_o\). The special thing is that the internal signal \(b\) is an AND function of the inputs \(x\) and \(y\) as depicted below.

**Carry-propagate Array Multiplier**

As shown in the inquiry “How do Computers do Math?“, a carry-propagate array multiplier can be built by combining many of these `ma`

blocks. The circuit diagram below shows the connections between these blocks for a 4-bit multiplier.

For an implementation in Verilog HDL, we can instantiate `ma`

blocks based on the word length of the multiplicand and multiplier (\(N\)). If you are new to Verilog HDL, remember that the `generate`

code segment expands during compilation time. In other words, it is just a short hand for writing out the long list of `ma`

block instances.

generate genvar ii, jj; for ( ii = 0; ii <; N; ii = ii + 1) begin: gen_ii for ( jj = 0; jj <; N; jj = jj + 1) begin: gen_jj math_multiplier_ma_block ma( .x(?), .y(?), .si(?), .ci(?), .so(?), .co(?) ); end end endgenerate[/code]As you might notice, the input and output ports are not described. For this, we need to derive the rules that govern these interconnects. Start by numbering the output ports based on their location in the matrix. For this circuit, we have the output signals

sum(\(s\)) andcarry-out(\(c\)). E.g. \(c_{13}\) identifies the carry-out signal for the block in row`1`

and column`3`

. Note that the circuit description depicts the matrix in a slanted fashion.Knowing this, we can enter the output signals in the Verilog HDL code math_multiplier_ma_block ma( .x(?), .y(?), .si(?), .ci(?), .so ( s[ii][jj] ), .co ( c[ii][jj] ) );

Next, we express the input signals as a function of the output signal names \(s\) and \(c\) as shown in the table below.

Based on this table, we can express the input assignments for each `ma`

using "`c ? a : b`

" expressions. Note that Verilog 2001 does not allow these programming statements for the output pins. This is why we expressed the input ports as a function of the output ports instead of visa versa.

math_multiplier_ma_block ma( .x ( a[jj] ), .y ( b[ii]), .si ( ii == 0 ? 1'b0 : jj <; N - 1 ? s[ii-1][jj+1] : c[ii-1][N-1] ), .ci ( jj > 0 ? c[ii][jj-1] : 1'b0 ), .so ( s[ii][jj] ), .co ( c[ii][jj] ) );

All that is left to do is to express the inputs of the module as a function of the output signals

Putting it all together, we get the following snippet generate genvar ii, jj; for (ii = 0; ii <; N; ii = ii + 1) begin: gen_ii for (jj = 0; jj <; N; jj = jj + 1) begin: gen_jj math_multiplier_ma_block ma( .x ( a[jj] ), .y ( b[ii]), .si ( ii == 0 ? 1'b0 : jj <; N - 1 ? s[ii-1][jj+1] : c[ii-1][N-1] ), .ci ( jj > 0 ? c[ii][jj-1] : 1'b0 ), .so ( s[ii][jj] ), .co ( c[ii][jj] ) ); end assign p[ii] = s[ii][0]; end for (jj = 1; jj <; N; jj = jj + 1) begin: gen_jj2 assign p[jj+N-1] = s[N-1][jj]; end assign p[N*2-1] = c[N-1][N-1]; endgenerate[/code]

The `ma`

block compiles into the RTL netlist shown below

As shown in the figure below, the `for`

loops unroll into 16 interconnected `ma`

blocks.

The complete Verilog HDL source code is available at:

### Results

The propagation delay \(t_{pd}\) depends size \(N\) and the value of operands. For a given size \(N\), the maximum propagation delay occurs when the low order bit because a carry/sum that propagate to the highest order bit. This worst-case propagation delay is linear with \(3N\). Note that the average propagation delay is about half of this.

The worst-case propagation delays for the Terasic Altera Cyclone IV DE0-Nano are found using the post-map Timing Analysis tool. The exact value depends on the model and speed grade of the FPGA, the silicon itself, voltage and the die temperature.

\(N\) | Timing Analysis | Measured | ||
---|---|---|---|---|

slow 85°C | slow 0°C | fast 0°C | actual | |

4-bits | 9.9 ns | 8.9 ns | 6.1 ns | |

8-bits | 20.8 ns | 18.6 ns | 12.4 ns | |

16-bits | 41.3 ns | 36.9 ns | 24.2 ns | |

27-bits | 69.6 ns | 62.1 ns | 40.9 ns | 55 ns |

32-bits | 83.4 ns | 74.5 ns | 49.0 ns |

The timing analysis for \(N=27\), reveals that the worst-case propagation delay path goes through \(c_0\) and \(s_o\) as shown below on the left. When measuring the worst-case propagation delay on the actual device, we use input values that cause the maximum number ripple carries and sums propagating. For a 27-bit multiplier that where the input also has a maximum value of 99,999,999, the propagation path is simulated in a spreadsheet as shown below on the right.

Brute force using the FPGA to find all combinations of operands that cause long propagation delays revealed 27'h2FA3A92 * 27h'55D4A77, 27'h60A308B * 27'd99999999 (50ns), 27'h775A668 * 27'd89999999 (55 ns), 27'h56F5D8F * 27'h3AAAB7B (55 ns).