Implements a math square root using a circuit of logic gates. Written in parameterized Verilog HDL for Altera and Xilinx FPGA’s.

## Square root using logic gates

Samavi and Sutikno improvements of the non-restoring digit recurrence square root algorithm. For details about this method, refer to Chapter 7 of the inquiry “How do Computers do Math?“. \(\)

The square root method implemented here is a simplification of### Simplified Samovi Square Root

The square root of an integer can be calculated using a circuit of Controlled Subtract-Multiplex (`csm`

) blocks. The blocks were introduced as part of the divider implementation. The square root circuit for an 8-bits value is given in shown below.

Each row performs one “attempt subtraction” cycle. For a start, the top row attempts to subtracts binary 01. If the answer would be negative, the most significant bit of the output will be ‘1’. This msb drives the drives the Output Select (\(os\)) inputs that effectively cancels the subtraction if the result would be negative.

Similar to the divider, using Verilog HDL we can generate instances of `csm`

blocks based on the word length of the radicand (`xWIDTH`

) . Once more, to describe the circuit in Verilog HDL, we need to derive the rules that govern the connections between the blocks.

Start by numbering the output ports based on their location in the matrix. For this circuit, we have the output signals *difference* \(d\) and *borrow-out* \(b\). E.g. \(d_{13}\) identifies the difference signal for the block in row `1`

and column `3`

. Next, we express the input signals as a function of the output signal names \(d\) and \(b\) and do the same for the quotient itself as shown in the table below.

2BD

Based on this table, we can now express the interconnects using Verilog HDL using `?:`

expressions.

generate genvar ii, jj; : 2BD endgenerate

The complete Verilog HDL source code along with the test bench and constraints is available at:

#### Results

As usual, the propagation delay \(t_{pd}\) depends size \(N\) and the value of operands. For a given size \(N\), the maximum propagation delay occurs when each subtraction needs to be cancelled.

Post-map Timing Analysis reveals the worst-case propagation delays. The values in the table below, assume that the size of both operands is the same. The exact value depends on the model and speed grade of the FPGA, the silicon itself, voltage and the die temperature.

\(N\) | Timing Analysis | Measured | ||
---|---|---|---|---|

slow 85°C | slow 0°C | fast 0°C | actual | |

4-bits | ||||

8-bits | ||||

16-bits | ||||

27-bits | ||||

32-bits |

## Conclusion

Judging from the number of research papers popping up each year, we can deduce that this is a still active field.

Add elementary functions such as sin, cos, tan, exponential, and logarithm.

- Digital Computer Arithmetic Datapath Design Using Verilog HDL, by J. Stine, 2003.
- Computer Arithmetic: Algorithms and Hardware Designs, by Parhami, 1999 and 2009.
- Guide to FPGA Implementation of Arithmetic Functions, by Deschamps, Sutter and Cantó, 2012.
- Arithmetic Circuits, Structures and Multipliers & Pipelining, Introductory Digital Systems laboratory, by MIT, 2004-2008.
- Modern Computer Arithmetic, by Brent and Zimmermann, 2010.
- Arithmetic Circuits and Multipliers, MIT