The attempt-subtraction divider was introduced in the inquiry How do Computer do Math.\(\) This most basic divider consists of interconnected Controlled Subtract-Multiplex (csm) blocks. Each blocks contains a 1-bit Full Subtractor (fs) with the usual inputs a, b and b_{i} and outputs d and b_{o}. The output select signal, os, signal selects between input x and and the difference x-y.
1-bit controlled subtract-multiplex
$$\begin{aligned}
d^\prime&=x \oplus y \oplus b_i\\
d&=os \cdot x + \overline{os} \cdot d^\prime\\
b_o&=\overline{x} \cdot y + b_i \cdot (\overline{x \oplus y})
\end{aligned}$$
Attempt-subtraction Divider
A complete 8:4-bit divider can therefore be implemented by a matrix of csm modules connected on rows and columns as shown in figure below. Each row performs one “attempt subtraction” cycle. Note that the most significant bit is used to drive the Output Select os inputs. (For more details see “Combinational arithmetic“.)
Similar to the multipliers, using Verilog HDL we can generate instances of csm blocks based on the word length of the dividend (xWIDTH) and divisor (yWIDTH). To describe the circuit in Verilog HDL, we need to derive the rules that govern the connections between the blocks.
Start by numbering the output ports based on their location in the matrix. For this circuit, we have the output signals difference (\(d\)) and borrow-out (\(b\)). E.g. \(d_{13}\) identifies the difference signal for the block in row 1 and column 3. Next, we express the input signals as a function of the output signal names (\(d\) and \(b\)) and do the same for the quotient itself as shown in the table below.
Based on this table, we can now express the interconnects using Verilog HDL using ?: expressions.
generate genvar ii, jj;
for ( ii = 0; ii < xWIDTH; ii = ii + 1) begin: gen_ii
for ( jj = 0; jj < yWIDTH + 1; jj = jj + 1) begin: gen_jj
math_divider_csm_block csm(
.a ( jj < 1 ? x[xWIDTH-1-ii] : ii > 0 ? d[ii-1][jj-1] : 1'b0 ),
.b ( jj < yWIDTH ? y[jj] : 1'b0 ),
.bi ( jj > 0 ? b[ii][jj-1] : 1'b0 ),
.os ( b[ii][yWIDTH] ),
.d ( d[ii][jj] ),
.bo ( b[ii][jj] ) );
end
end
for ( ii = 0; ii < xWIDTH; ii = ii + 1) begin: gen_p
assign q[xWIDTH-1-ii] = ~b[ii][yWIDTH];
end
for ( jj = 0; jj <= yWIDTH; jj = jj + 1) begin: gen_r
assign r[jj] = d[xWIDTH-1][jj];
end
endgenerate
This code along with the constraints file and test bench are available through GitHub.
Results
As usual, the propagation delay \(t_{pd}\) depends size \(N\) and the value of operands. For a given size \(N\), the maximum propagation delay occurs when each subtraction needs to be cancelled.
The worst-case propagation delays for the Terasic Altera Cyclone IV DE0-Nano are found using the post-map Timing Analysis tool. The values in the table below, assume that the size of both operands is the same. The exact value depends on the model and speed grade of the FPGA, the silicon itself, voltage and the die temperature.
Propagation delay in attempt-subtraction divider
\(N\)
Timing Analysis
Measured
slow 85°C
slow 0°C
fast 0°C
actual
4-bits
11.2 ns
10.0 ns
6.8 ns
8-bits
37.1 ns
33.2 ns
22.0 ns
16-bits
148 ns
132 ns
84.8 ns
27-bits
408 ns
365 ns
236 ns
32-bits
485 ns
434 ns
279 ns
The next chapter shows an implementation of the square root algorithm introduced in Chapter 7 of the inquiry “How do computers do Math?”.