Building Math Hardware

(c) Copyright 2016 Coert Vonk

Parameterized Square Root in Verilog

The square root method implemented here is a simplification of Samavi and Sutikno improvements of the non-restoring digit recurrence square root algorithm. For details about this method, refer to Chapter 7 of the inquiry “How do computers do Math?”. \(\)

Simplified Samovi Square Root

The square root of an integer can be calculated using a circuit of Controlled Subtract-Multiplex (csm) blocks. The blocks were introduced as part of the divider implementation. The square root circuit for an 8-bits value is given in shown below. Each row performs one “attempt subtraction” cycle. For a start, the top row attempts to subtracts binary 01. If the answer would be negative, the most significant bit of the output will be ‘1’. This msb drives the drives the Output Select (\(os\)) inputs that effectively cancels the subtraction if the result would be negative.
8-bit simplified-Samovi square-root
Similar to the divider, using Verilog HDL we can generate instances of csm blocks based on the word length of the radicand (xWIDTH) . Once more, to describe the circuit in Verilog HDL, we need to derive the rules that govern the connections between the blocks. Start by numbering the output ports based on their location in the matrix. For this circuit, we have the output signals difference \(d\) and borrow-out \(b\). E.g. \(d_{13}\) identifies the difference signal for the block in row 1 and column 3. Next, we express the input signals as a function of the output signal names \(d\) and \(b\) and do the same for the quotient itself as shown in the table below. 2BD Based on this table, we can now express the interconnects using Verilog HDL using ?: expressions.
generate genvar ii, jj;
: 2BD
This code along with the constraints file and test bench are available through GitHub.


As usual, the propagation delay \(t_{pd}\) depends size \(N\) and the value of operands. For a given size \(N\), the maximum propagation delay occurs when each subtraction needs to be cancelled. Post-map Timing Analysis reveals the worst-case propagation delays. The values in the table below, assume that the size of both operands is the same. The exact value depends on the model and speed grade of the FPGA, the silicon itself, voltage and the die temperature.
Propagation delay for simplified-Samovi square-root
\(N\) Timing Analysis Measured
slow 85°C slow 0°C fast 0°C actual
missing graph
Propagation delay for simplified-Samovi square-root
The next chapter draws some conclusions and refers to materials for further study.
Student at MVHS
I see education as the foundation upon which entrepreneurs are able to build innovative organizations and execute their vision for the future.

Leave a Reply

Your email address will not be published. Required fields are marked *


This site uses Akismet to reduce spam. Learn how your comment data is processed.