Implements a math multiplier using circuits of logic gates. Written in parameterized Verilog HDL for Altera and Xilinx FPGA’s.
Multiplier using logic gates
How do Computers do Math?“.\(\)
We introduced the carry-propagate array multiplier in the inquiry “
This multiplier is build around Multiplier Adder (ma
) blocks. These ma
blocks are themselves build around the Full Adder (fa
) blocks introduced in the adder section. These fa
blocks have the usual inputs \(a\) and \(b\), \(c_i\) and outputs \(s\) and \(c_o\). The special thing is that the internal signal \(b\) is an AND function of the inputs \(x\) and \(y\) as depicted below.
Carry-propagate Array Multiplier
As shown in the inquiry “How do Computers do Math?“, a carry-propagate array multiplier can be built by combining many of these ma
blocks. The circuit diagram below shows the connections between these blocks for a 4-bit multiplier.
For an implementation in Verilog HDL, we can instantiate ma
blocks based on the word length of the multiplicand and multiplier (\(N\)). If you are new to Verilog HDL, remember that the generate
code segment expands during compilation time. In other words, it is just a short hand for writing out the long list of ma
block instances.
generate genvar ii, jj; for ( ii = 0; ii <; N; ii = ii + 1) begin: gen_ii for ( jj = 0; jj <; N; jj = jj + 1) begin: gen_jj math_multiplier_ma_block ma( .x(?), .y(?), .si(?), .ci(?), .so(?), .co(?) ); end end endgenerate[/code] </p> <p> As you might notice, the input and output ports are not described. For this, we need to derive the rules that govern these interconnects. Start by numbering the output ports based on their location in the matrix. For this circuit, we have the output signals <em>sum</em> (\(s\)) and <em>carry-out</em> (\(c\)). E.g. \(c_{13}\) identifies the carry-out signal for the block in row <code>1</code> and column <code>3</code>. Note that the circuit description depicts the matrix in a slanted fashion. <div class="flex-container"> <figure> <a class="hide-anchor fancybox" href="/wp-content/uploads/math-multiplier-ripple-carry-tbl-ma-output.png"> <img class="wp-image-16504" src="https://coertvonk.com/wp-content/uploads/math-multiplier-ripple-carry-tbl-ma-output.png" alt="own work" width="420" /> </a> <figcaption> Output signals 'so' and 'co' </figcaption> </figure> </div> </p> <p> Knowing this, we can enter the output signals in the Verilog HDL code math_multiplier_ma_block ma( .x(?), .y(?), .si(?), .ci(?), .so ( s[ii][jj] ), .co ( c[ii][jj] ) );
Next, we express the input signals as a function of the output signal names \(s\) and \(c\) as shown in the table below.
Based on this table, we can express the input assignments for each ma
using "c ? a : b
" expressions. Note that Verilog 2001 does not allow these programming statements for the output pins. This is why we expressed the input ports as a function of the output ports instead of visa versa.
math_multiplier_ma_block ma( .x ( a[jj] ), .y ( b[ii]), .si ( ii == 0 ? 1'b0 : jj <; N - 1 ? s[ii-1][jj+1] : c[ii-1][N-1] ), .ci ( jj > 0 ? c[ii][jj-1] : 1'b0 ), .so ( s[ii][jj] ), .co ( c[ii][jj] ) );
All that is left to do is to express the inputs of the module as a function of the output signals
Putting it all together, we get the following snippet generate genvar ii, jj; for (ii = 0; ii <; N; ii = ii + 1) begin: gen_ii for (jj = 0; jj <; N; jj = jj + 1) begin: gen_jj math_multiplier_ma_block ma( .x ( a[jj] ), .y ( b[ii]), .si ( ii == 0 ? 1'b0 : jj <; N - 1 ? s[ii-1][jj+1] : c[ii-1][N-1] ), .ci ( jj > 0 ? c[ii][jj-1] : 1'b0 ), .so ( s[ii][jj] ), .co ( c[ii][jj] ) ); end assign p[ii] = s[ii][0]; end for (jj = 1; jj <; N; jj = jj + 1) begin: gen_jj2 assign p[jj+N-1] = s[N-1][jj]; end assign p[N*2-1] = c[N-1][N-1]; endgenerate[/code]
The ma
block compiles into the RTL netlist shown below
As shown in the figure below, the for
loops unroll into 16 interconnected ma
blocks.
The complete Verilog HDL source code is available at:
Results
The propagation delay \(t_{pd}\) depends size \(N\) and the value of operands. For a given size \(N\), the maximum propagation delay occurs when the low order bit because a carry/sum that propagate to the highest order bit. This worst-case propagation delay is linear with \(3N\). Note that the average propagation delay is about half of this.
The worst-case propagation delays for the Terasic Altera Cyclone IV DE0-Nano are found using the post-map Timing Analysis tool. The exact value depends on the model and speed grade of the FPGA, the silicon itself, voltage and the die temperature.
\(N\) | Timing Analysis | Measured | ||
---|---|---|---|---|
slow 85°C | slow 0°C | fast 0°C | actual | |
4-bits | 9.9 ns | 8.9 ns | 6.1 ns | |
8-bits | 20.8 ns | 18.6 ns | 12.4 ns | |
16-bits | 41.3 ns | 36.9 ns | 24.2 ns | |
27-bits | 69.6 ns | 62.1 ns | 40.9 ns | 55 ns |
32-bits | 83.4 ns | 74.5 ns | 49.0 ns |
The timing analysis for \(N=27\), reveals that the worst-case propagation delay path goes through \(c_0\) and \(s_o\) as shown below on the left. When measuring the worst-case propagation delay on the actual device, we use input values that cause the maximum number ripple carries and sums propagating. For a 27-bit multiplier that where the input also has a maximum value of 99,999,999, the propagation path is simulated in a spreadsheet as shown below on the right.
Brute force using the FPGA to find all combinations of operands that cause long propagation delays revealed 27'h2FA3A92 * 27h'55D4A77, 27'h60A308B * 27'd99999999 (50ns), 27'h775A668 * 27'd89999999 (55 ns), 27'h56F5D8F * 27'h3AAAB7B (55 ns).