# Building Math Hardware

## Parameterized Divider in Verilog The attempt-subtraction divider was introduced in the inquiry How do Computer do Math. This most basic divider consists of interconnected Controlled Subtract-Multiplex (csm) blocks. Each blocks contains a 1-bit Full Subtractor (fs) with the usual inputs a, b and bi and outputs d and bo. The output select signal, os, signal selects between input x and and the difference x-y. \begin{aligned} d^\prime&=x \oplus y \oplus b_i\\ d&=os \cdot x + \overline{os} \cdot d^\prime\\ b_o&=\overline{x} \cdot y + b_i \cdot (\overline{x \oplus y}) \end{aligned}

### Attempt-subtraction Divider

A complete 8:4-bit divider can therefore be implemented by a matrix of csm modules connected on rows and columns as shown in figure below. Each row performs one “attempt subtraction” cycle. Note that the most significant bit is used to drive the Output Select os inputs. (For more details see “Combinational arithmetic“.)
Similar to the multipliers, using Verilog HDL we can generate instances of csm blocks based on the word length of the dividend (xWIDTH) and divisor (yWIDTH). To describe the circuit in Verilog HDL, we need to derive the rules that govern the connections between the blocks. Start by numbering the output ports based on their location in the matrix. For this circuit, we have the output signals difference ($$d$$) and borrow-out ($$b$$). E.g. $$d_{13}$$ identifies the difference signal for the block in row 1 and column 3. Next, we express the input signals as a function of the output signal names ($$d$$ and $$b$$) and do the same for the quotient itself as shown in the table below.
Based on this table, we can now express the interconnects using Verilog HDL using ?: expressions.
generate genvar ii, jj;
for ( ii = 0; ii < xWIDTH; ii = ii + 1) begin: gen_ii
for ( jj = 0; jj < yWIDTH + 1; jj = jj + 1) begin: gen_jj
math_divider_csm_block csm(
.a ( jj < 1 ? x[xWIDTH-1-ii] : ii &gt; 0 ? d[ii-1][jj-1] : 1'b0 ),
.b ( jj < yWIDTH ? y[jj] : 1'b0 ),
.bi ( jj &gt; 0 ? b[ii][jj-1] : 1'b0 ),
.os ( b[ii][yWIDTH] ),
.d ( d[ii][jj] ),
.bo ( b[ii][jj] ) );
end
end
for ( ii = 0; ii < xWIDTH; ii = ii + 1) begin: gen_p
assign q[xWIDTH-1-ii] = ~b[ii][yWIDTH];
end
for ( jj = 0; jj <= yWIDTH; jj = jj + 1) begin: gen_r
assign r[jj] = d[xWIDTH-1][jj];
end
endgenerate
This code along with the constraints file and test bench are available through GitHub.

#### Results

As usual, the propagation delay $$t_{pd}$$ depends size $$N$$ and the value of operands. For a given size $$N$$, the maximum propagation delay occurs when each subtraction needs to be cancelled. The worst-case propagation delays for the Terasic Altera Cyclone IV DE0-Nano are found using the post-map Timing Analysis tool. The values in the table below, assume that the size of both operands is the same. The exact value depends on the model and speed grade of the FPGA, the silicon itself, voltage and the die temperature.
Propagation delay in attempt-subtraction divider
$$N$$ Timing Analysis Measured
slow 85°C slow 0°C fast 0°C actual
4-bits 11.2 ns 10.0 ns 6.8 ns
8-bits 37.1 ns 33.2 ns 22.0 ns
16-bits 148 ns 132 ns 84.8 ns
27-bits 408 ns 365 ns 236 ns
32-bits 485 ns 434 ns 279 ns