Square root circuit

Implements a math square root using a circuit of logic gates. Written in parameterized Verilog HDL for Altera and Xilinx FPGA’s.

Square root using logic gates

(c) Copyright 2016 Coert Vonk The square root method implemented here is a simplification of Samavi and Sutikno improvements of the non-restoring digit recurrence square root algorithm. For details about this method, refer to Chapter 7 of the inquiry “How do Computers do Math?“. \(\)

Simplified Samovi Square Root

The square root of an integer can be calculated using a circuit of Controlled Subtract-Multiplex (csm) blocks. The blocks were introduced as part of the divider implementation. The square root circuit for an 8-bits value is given in shown below.

Each row performs one “attempt subtraction” cycle. For a start, the top row attempts to subtracts binary 01. If the answer would be negative, the most significant bit of the output will be ‘1’. This msb drives the drives the Output Select (\(os\)) inputs that effectively cancels the subtraction if the result would be negative.

8-bit simplified-Samovi square-root

Similar to the divider, using Verilog HDL we can generate instances of csm blocks based on the word length of the radicand (xWIDTH) . Once more, to describe the circuit in Verilog HDL, we need to derive the rules that govern the connections between the blocks.

Start by numbering the output ports based on their location in the matrix. For this circuit, we have the output signals difference \(d\) and borrow-out \(b\). E.g. \(d_{13}\) identifies the difference signal for the block in row 1 and column 3. Next, we express the input signals as a function of the output signal names \(d\) and \(b\) and do the same for the quotient itself as shown in the table below.


Based on this table, we can now express the interconnects using Verilog HDL using ?: expressions.

generate genvar ii, jj;
: 2BD

The complete Verilog HDL source code along with the test bench and constraints is available at:


As usual, the propagation delay \(t_{pd}\) depends size \(N\) and the value of operands. For a given size \(N\), the maximum propagation delay occurs when each subtraction needs to be cancelled.

Post-map Timing Analysis reveals the worst-case propagation delays. The values in the table below, assume that the size of both operands is the same. The exact value depends on the model and speed grade of the FPGA, the silicon itself, voltage and the die temperature.

\(N\) Timing Analysis Measured
slow 85°C slow 0°C fast 0°C actual
Propagation delay for simplified-Samovi square-root


Judging from the number of research papers popping up each year, we can deduce that this is a still active field.

Add elementary functions such as sin, cos, tan, exponential, and logarithm.

Suggestion for further reading