Multiplier circuit

Implements a math multiplier using circuits of logic gates. Written in parameterized Verilog HDL for Altera and Xilinx FPGA’s.

Multiplier using logic gates

multiplier icon We introduced the carry-propagate array multiplier in the inquiry “How do Computers do Math?“.\(\)

This multiplier is build around Multiplier Adder (ma) blocks. These ma blocks are themselves build around the Full Adder (fa) blocks introduced in the adder section. These fa blocks have the usual inputs \(a\) and \(b\), \(c_i\) and outputs \(s\) and \(c_o\). The special thing is that the internal signal \(b\) is an AND function of the inputs \(x\) and \(y\) as depicted below.

$$ \begin{align*} a &= s_i\\ b &= x \cdot y\\ s_o &= s_i\oplus b\oplus c_i\\ c_o &= s_i \cdot b + c_i \cdot(s_i \oplus b) \end{align*} $$
1-bit multiplying-adder

Carry-propagate Array Multiplier

As shown in the inquiry “How do Computers do Math?“, a carry-propagate array multiplier can be built by combining many of these ma blocks. The circuit diagram below shows the connections between these blocks for a 4-bit multiplier.

4-bit carry-propagate array multiplier

For an implementation in Verilog HDL, we can instantiate ma blocks based on the word length of the multiplicand and multiplier (\(N\)). If you are new to Verilog HDL, remember that the generate code segment expands during compilation time. In other words, it is just a short hand for writing out the long list of ma block instances.

generate genvar ii, jj;
    for ( ii = 0; ii <; N; ii = ii + 1) begin: gen_ii
        for ( jj = 0; jj <; N; jj = jj + 1) begin: gen_jj
            math_multiplier_ma_block ma( 
                .x(?), .y(?), .si(?), .ci(?),
                .so(?), .co(?) );
        end
    end
endgenerate[/code]

As you might notice, the input and output ports are not described. For this, we need to derive the rules that govern these interconnects. Start by numbering the output ports based on their location in the matrix. For this circuit, we have the output signals sum (\(s\)) and carry-out (\(c\)). E.g. \(c_{13}\) identifies the carry-out signal for the block in row 1 and column 3. Note that the circuit description depicts the matrix in a slanted fashion.

own work
Output signals 'so' and 'co'

Knowing this, we can enter the output signals in the Verilog HDL code math_multiplier_ma_block ma( .x(?), .y(?), .si(?), .ci(?), .so ( s[ii][jj] ), .co ( c[ii][jj] ) );

Next, we express the input signals as a function of the output signal names \(s\) and \(c\) as shown in the table below.

own work
Input signals 'x', 'y', 'si' and 'ci'

Based on this table, we can express the input assignments for each ma using "c ? a : b" expressions. Note that Verilog 2001 does not allow these programming statements for the output pins. This is why we expressed the input ports as a function of the output ports instead of visa versa.

math_multiplier_ma_block ma( 
    .x ( a[jj] ),
    .y ( b[ii]),
    .si ( ii == 0 ? 1'b0 : jj <; N - 1 ? s[ii-1][jj+1] : c[ii-1][N-1] ),
    .ci ( jj > 0 ? c[ii][jj-1] : 1'b0 ),
    .so ( s[ii][jj] ),
    .co ( c[ii][jj] ) );

All that is left to do is to express the inputs of the module as a function of the output signals

Output signal 'p'

Putting it all together, we get the following snippet generate genvar ii, jj; for (ii = 0; ii <; N; ii = ii + 1) begin: gen_ii for (jj = 0; jj <; N; jj = jj + 1) begin: gen_jj math_multiplier_ma_block ma( .x ( a[jj] ), .y ( b[ii]), .si ( ii == 0 ? 1'b0 : jj <; N - 1 ? s[ii-1][jj+1] : c[ii-1][N-1] ), .ci ( jj > 0 ? c[ii][jj-1] : 1'b0 ), .so ( s[ii][jj] ), .co ( c[ii][jj] ) ); end assign p[ii] = s[ii][0]; end for (jj = 1; jj <; N; jj = jj + 1) begin: gen_jj2 assign p[jj+N-1] = s[N-1][jj]; end assign p[N*2-1] = c[N-1][N-1]; endgenerate[/code]

The ma block compiles into the RTL netlist shown below

own work
1-bit multiplying-adder in RTL

As shown in the figure below, the for loops unroll into 16 interconnected ma blocks.

own work
4-bit carry-propagate array multiplier in RTL

The complete Verilog HDL source code is available at:

Results

The propagation delay \(t_{pd}\) depends size \(N\) and the value of operands. For a given size \(N\), the maximum propagation delay occurs when the low order bit because a carry/sum that propagate to the highest order bit. This worst-case propagation delay is linear with \(3N\). Note that the average propagation delay is about half of this.

The worst-case propagation delays for the Terasic Altera Cyclone IV DE0-Nano are found using the post-map Timing Analysis tool. The exact value depends on the model and speed grade of the FPGA, the silicon itself, voltage and the die temperature.

\(N\) Timing Analysis Measured
slow 85°C slow 0°C fast 0°C actual
4-bits 9.9 ns 8.9 ns 6.1 ns
8-bits 20.8 ns 18.6 ns 12.4 ns
16-bits 41.3 ns 36.9 ns 24.2 ns
27-bits 69.6 ns 62.1 ns 40.9 ns 55 ns
32-bits 83.4 ns 74.5 ns 49.0 ns
Propagation delay in carry-propagate array multiplier

The timing analysis for \(N=27\), reveals that the worst-case propagation delay path goes through \(c_0\) and \(s_o\) as shown below on the left. When measuring the worst-case propagation delay on the actual device, we use input values that cause the maximum number ripple carries and sums propagating. For a 27-bit multiplier that where the input also has a maximum value of 99,999,999, the propagation path is simulated in a spreadsheet as shown below on the right.

Worst case path
Worst case input

Brute force using the FPGA to find all combinations of operands that cause long propagation delays revealed 27'h2FA3A92 * 27h'55D4A77, 27'h60A308B * 27'd99999999 (50ns), 27'h775A668 * 27'd89999999 (55 ns), 27'h56F5D8F * 27'h3AAAB7B (55 ns).

Following this "Math multiplier using logic gates", the next chapter explores methods of making the multiplication operation faster.