## Parameterized Square Root in Verilog

The square root method implemented here is a simplification of Samavi and Sutikno's improvements to the non-restoring digit-recurrence square root algorithm. For details about this method, refer to Chapter 7 of the inquiry “How do computers do Math?”.

### Simplified Samavi Square Root

The square root of an integer can be calculated using a circuit of Controlled Subtract-Multiplex (`csm`) blocks. The blocks were introduced as part of the divider implementation. The square root circuit for an 8-bit value is shown below.

Each row performs one “attempt subtraction” cycle. To start, the top row attempts to subtract binary 01. If the result would be negative, the most significant bit of the output will be ‘1’. This msb drives the Output Select (\(os\)) inputs, which effectively cancel the subtraction if the result would be negative.
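
The cell behaviour and the top row described above can be sketched as follows. This is a minimal illustration, not the code from the divider article: the module and port names (`csm_sketch`, `sqrt_top_row_sketch`, `x_hi`, `q_msb`) are hypothetical, and it assumes the top row operates on the two most significant radicand bits and that the root bit is the inverse of the row's final borrow.

```verilog
// Controlled subtract-multiplex cell: a full subtractor followed by a 2:1 mux.
// Port names are illustrative assumptions.
module csm_sketch (
    input  wire x,   // minuend bit
    input  wire y,   // subtrahend bit
    input  wire bi,  // borrow in from the less significant cell
    input  wire os,  // output select: 1 cancels the subtraction
    output wire d,   // x - y - bi, or plain x when os = 1
    output wire bo   // borrow out to the more significant cell
);
    assign bo = (~x & y) | (~x & bi) | (y & bi);
    assign d  = os ? x : (x ^ y ^ bi);
endmodule

// Top row for an 8-bit radicand: attempt x[7:6] - 2'b01.
module sqrt_top_row_sketch (
    input  wire [1:0] x_hi,  // assumed to be x[7:6] of the radicand
    output wire [1:0] d,     // partial remainder after the attempt
    output wire       q_msb  // most significant bit of the root (assumed ~borrow)
);
    wire b0, b1;
    // b1 is the row's final borrow; it feeds every os input in the row.
    csm_sketch c0 (.x(x_hi[0]), .y(1'b1), .bi(1'b0), .os(b1), .d(d[0]), .bo(b0));
    csm_sketch c1 (.x(x_hi[1]), .y(1'b0), .bi(b0),   .os(b1), .d(d[1]), .bo(b1));
    assign q_msb = ~b1;  // no borrow means the subtraction stood
endmodule
```

With `x_hi = 2'b00` the borrow ripples out of the row, so `os` is 1, `d` stays `2'b00` and `q_msb` is 0. With `x_hi = 2'b10` there is no final borrow, so `d` becomes `2'b01` and `q_msb` is 1. The borrow chain does not depend on `os`, so feeding the final borrow back into the row does not create a combinational loop.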

The circuit can be generalized to a matrix of `csm` blocks sized by the word length of the radicand (`xWIDTH`). Once more, to describe the circuit in Verilog HDL, we need to derive the rules that govern the connections between the blocks.
Start by numbering the output ports based on their location in the matrix. For this circuit, we have the output signals *difference* \(d\) and *borrow-out* \(b\). For example, \(d_{13}\) identifies the difference signal for the block in row `1` and column `3`. Next, we express the input signals as a function of the output signal names \(d\) and \(b\), and do the same for the root itself, as shown in the table below.
2BD
Based on this table, we can now express the interconnects in Verilog HDL using `?:` expressions.

    generate
        genvar ii, jj;
        : 2BD
    endgenerate

This code, along with the constraints file and test bench, is available through GitHub.
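
A test bench needs a reference result to compare the circuit against. The following is a behavioral sketch of the same “attempt subtraction” idea, one root bit per pass. It is not the `csm` matrix and not the code from the repository. The module name `sqrt_ref_sketch` and the ports `q` and `r` are hypothetical, and `xWIDTH` is assumed to be even. The trial value follows from \((2q+1)^2 - (2q)^2 = 4q + 1\), which in binary is the root found so far followed by `01`.

```verilog
// Behavioral reference model (an illustrative sketch, not the csm matrix).
module sqrt_ref_sketch #(
    parameter xWIDTH = 8                    // radicand width, assumed even
) (
    input  wire [xWIDTH-1:0]   x,           // radicand
    output reg  [xWIDTH/2-1:0] q,           // floor(sqrt(x))
    output reg  [xWIDTH/2:0]   r            // remainder x - q*q
);
    integer ii;
    reg [xWIDTH/2+1:0] rem;                 // running remainder plus two new bits
    reg [xWIDTH/2+1:0] trial;               // value to attempt: {root so far, 2'b01}

    always @* begin
        rem = 0;
        q   = 0;
        for (ii = xWIDTH/2 - 1; ii >= 0; ii = ii - 1) begin
            // bring down the next two radicand bits
            rem   = (rem << 2) | ((x >> (2*ii)) & 2'b11);
            trial = (q << 2) | 1'b1;
            if (rem >= trial) begin
                rem = rem - trial;          // subtraction stands, root bit is 1
                q   = (q << 1) | 1'b1;
            end else begin
                q   = q << 1;               // subtraction cancelled, root bit is 0
            end
        end
        r = rem[xWIDTH/2:0];
    end
endmodule
```

For `x = 8'd10` this yields `q = 3` and `r = 1`, and for `x = 8'd255` it yields `q = 15` and `r = 30`. A radicand of zero cancels the subtraction in every pass.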

#### Results

As usual, the propagation delay \(t_{pd}\) depends on the size \(N\) and the value of the operands. For a given size \(N\), the maximum propagation delay occurs when each subtraction needs to be cancelled. Post-map timing analysis reveals the worst-case propagation delays. The values in the table below assume that the size of both operands is the same. The exact value depends on the model and speed grade of the FPGA, the silicon itself, the voltage and the die temperature.

| \(N\) | Timing analysis, slow 85°C | Timing analysis, slow 0°C | Timing analysis, fast 0°C | Measured, actual |
|---|---|---|---|---|
| 4 bits | | | | |
| 8 bits | | | | |
| 16 bits | | | | |
| 27 bits | | | | |
| 32 bits | | | | |