Byte protocol on FPGA

Implements the SPI byte protocol on an FPGA. Enables the exchange of bytes with the Arduino. Written in Verilog HDL. This page concludes the second part of Math Talk. It shows an implementation of the SPI protocol on a Field Programmable Gate Array (FPGA).

Byte Exchange with an FPGA as Slave

Implementing the SPI Slave on an FPGA is like old school digital electronics. My key takeaway is to think hardware, not programming. Implementing the SPI protocol on an FPGA is fairly straightforward as long as we use a directly clocked sequential circuit and avoid clock domain crossings.

Sequential circuit

In real life, two signals going to a single gate will not arrive there at the same time due to wire delays. This causes the output to momentarily have an incorrect value. The problem compounds as the signal travels through more gates and wires.

In Building Math Circuits we created elementary math operations using combinatorial circuits. That was OK, because we didn’t care about the output glitches caused by the input signals propagating to the outputs. From a demonstrator’s point of view, the glitches even made it more interesting. Talking to a real device, such as an SPI master, is different, because it requires the outputs to be stable at certain times.

D flip-flop

The solution is to introduce a clock signal, and store the signals in flip-flops (registers) at the rising edge of that clock signal. We then only need to ensure that the longest delay from one flip-flop to the next is less than the clock period. This greatly simplifies the design process, at the cost of introducing some delay.

The Verilog description shown below is an example of a synchronous design. At the rising edge of the clock signal sysClk, it clocks signal in into register out.

    wire in;
    reg  out;
    always @(posedge sysClk)
        out <= in;

Clock domain

Field programmable gate arrays thrive on synchronous designs, but they don’t do well with clock signals that are asynchronous to their system clock. In particular, constructs such as @(posedge SCLK) give the synthesizer the impression that SCLK is a clock signal and cause it to reserve special low-skew clock buffers. The fitter may then run out of such buffers and resort to general routing for the real system clock signal.

Two-stage shift register

We also need to avoid transferring data from a flip-flop driven by one clock to a flip-flop driven by another clock. This is called a clock domain crossing and might manifest itself in metastability, data loss or incoherence [EE Times]. We prevent clock domain crossings by synchronizing the input signals to the FPGA clock using a traditional two-stage shift register as illustrated above.

  1. The first flip-flop creates a synchronous version of the input by clocking it with the system clock. The input signal could change within the flip-flop’s setup and hold times, and the output may then take longer than a system clock cycle to settle to a stable value (metastability). That’s why it is run through a second flip-flop.
  2. The second flip-flop makes it very unlikely that this metastability propagates to the output.

Adding a third flip-flop gives us access to the previous value. Using the current and previous values, we can generate rise and fall signals as shown below.

    reg [2:0] async_r;
    always @(posedge sysClk)
        async_r <= { async_r[1:0], async };
    wire rising  = ( async_r[2:1] == 2'b01 );
    wire falling = ( async_r[2:1] == 2'b10 );
    wire sync    = async_r[1];

In the remainder of this article we’ll refer to the synchronized versions of these SPI signals.
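As a minimal sketch of what this could look like for the SPI inputs, the module below applies the same shift-register technique to each signal. The module name, the port names and the presence of an active-low SSEL input are assumptions made for illustration; they are not taken from the project files.

```verilog
// Hypothetical module: synchronizes the SPI inputs to sysClk.
module spi_sync (
    input  wire sysClk,       // FPGA system clock
    input  wire SCLK,         // asynchronous SPI clock from the master
    input  wire SSEL,         // asynchronous slave select (assumed active low)
    input  wire MOSI,         // asynchronous data from the master
    output wire sclkRising,   // high for one sysClk cycle after a rising SCLK edge
    output wire sclkFalling,  // high for one sysClk cycle after a falling SCLK edge
    output wire sselActive,   // synchronized, active-high version of SSEL
    output wire mosiSync      // synchronized MOSI
);
    reg [2:0] sclk_r;  // three stages: two to synchronize, one for edge detection
    reg [1:0] ssel_r;  // two stages: level only
    reg [1:0] mosi_r;  // two stages: level only

    always @(posedge sysClk) begin
        sclk_r <= { sclk_r[1:0], SCLK };
        ssel_r <= { ssel_r[0],   SSEL };
        mosi_r <= { mosi_r[0],   MOSI };
    end

    assign sclkRising  = ( sclk_r[2:1] == 2'b01 );
    assign sclkFalling = ( sclk_r[2:1] == 2'b10 );
    assign sselActive  = ~ssel_r[1];
    assign mosiSync    = mosi_r[1];
endmodule
```

Note that SCLK is only ever sampled as data here. The sole clock in the sensitivity list is sysClk, so the synthesizer has no reason to treat SCLK as a clock.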


The main data object is an 8-bit register called data, similar to the one shown on the protocol page:

  • On a falling SCLK edge, the most significant bit from data is clocked into a register from where it is transmitted over its MISO output.
  • On a rising SCLK edge, the MOSI input is shifted into the least significant bit of data.
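The two rules above can be sketched as follows. This is an illustration under assumptions, not the project's code: the module name, the port names, the TX_BYTE parameter and the preloading of data while the slave is deselected are all hypothetical, and the sketch assumes one byte per slave-select frame. The inputs are taken to be signals already synchronized to sysClk.

```verilog
// Hypothetical shift data path for one byte per slave-select frame.
module spi_shift #(
    parameter [7:0] TX_BYTE = 8'h55   // byte returned to the master (assumption)
) (
    input  wire       sysClk,
    input  wire       sselActive,   // synchronized slave select, active high
    input  wire       sclkRising,   // one-cycle pulse on rising SCLK edge
    input  wire       sclkFalling,  // one-cycle pulse on falling SCLK edge
    input  wire       mosiSync,     // synchronized MOSI
    output wire       MISO,
    output reg  [7:0] data          // shift register
);
    reg misoR;                      // output register driving MISO

    always @(posedge sysClk) begin
        if (!sselActive) begin
            // Idle: preload the reply and present its first bit
            // so it is stable before the first rising SCLK edge.
            data  <= TX_BYTE;
            misoR <= TX_BYTE[7];
        end else if (sclkRising) begin
            // Rising edge: shift MOSI into the least significant bit.
            data <= { data[6:0], mosiSync };
        end else if (sclkFalling) begin
            // Falling edge: register the most significant bit for MISO.
            misoR <= data[7];
        end
    end

    assign MISO = misoR;
endmodule
```

After eight rising edges, data holds the byte sent by the master, most significant bit first.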

Once all eight bits are received, the byte is available as rx. This received byte rx should be read when rxValid is active during a rising edge of sysClk.

Finite State Machine

The Byte module implements the SPI Slave protocol and converts a bit stream into bytes and vice versa. It is implemented using a state machine with 8 states, corresponding to the 8 bits per byte. The illustration below shows the Finite State Machine (FSM) with corresponding data path.

In general, an FSM is in charge and contains the state register, next-state logic and output logic. In this particular case, we didn’t require output logic. The FSM passes a control signal (state) to the data path. The data path combines the control signal with its input signals to generate the output signals rx, rxValid and the MISO bitstream.
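To make the split between FSM and data path concrete, here is a hedged sketch of the receive side only. The module name, the port names and the separate 7-bit shift register are assumptions for illustration; the MISO path is left out to keep it short. The state simply counts the bits received so far.

```verilog
// Hypothetical receive-side FSM with data path.
module spi_byte_fsm (
    input  wire       sysClk,
    input  wire       sselActive,  // synchronized slave select, active high
    input  wire       sclkRising,  // one-cycle pulse on rising SCLK edge
    input  wire       mosiSync,    // synchronized MOSI
    output reg  [7:0] rx,          // last received byte
    output reg        rxValid      // high for one sysClk cycle when rx is updated
);
    reg [2:0] state;      // state register: number of bits received so far
    reg [2:0] nextState;
    reg [6:0] shift;      // the first seven bits of the byte being received

    // FSM: next-state logic (combinational)
    always @* begin
        if (!sselActive)     nextState = 3'd0;
        else if (sclkRising) nextState = state + 3'd1;  // wraps from 7 back to 0
        else                 nextState = state;
    end

    // FSM: state register
    always @(posedge sysClk)
        state <= nextState;

    // Data path: combines the control signal (state) with the inputs
    always @(posedge sysClk) begin
        rxValid <= 1'b0;
        if (sselActive && sclkRising) begin
            shift <= { shift[5:0], mosiSync };
            if (state == 3'd7) begin
                rx      <= { shift, mosiSync };  // eighth bit completes the byte
                rxValid <= 1'b1;
            end
        end
    end
endmodule
```

Because rxValid is a one-cycle pulse, downstream logic clocked by sysClk can use it directly as a register enable.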

SPI Byte Exchange FSM with Data path


The timing diagram below shows the relation between the different signals. You may note that the input signal synchronization comes at the cost of introducing a delay. Given that the system clock is significantly faster than the SPI clock, this should not pose a problem. The initial implementation on Xilinx used a 66 MHz system clock and a 4 MHz SPI clock as shown below.

Signals for Xilinx implementation
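A rough check of the delay budget for these clock rates, assuming the three-stage register described earlier, an output register that updates on the detected edge, and a 50% SCLK duty cycle. Under those assumptions the output changes two to three system clock cycles after the SCLK edge:

```latex
\begin{align*}
T_{\mathrm{sys}} &= \frac{1}{66\,\mathrm{MHz}} \approx 15.2\,\mathrm{ns} \\
T_{\mathrm{SPI}} &= \frac{1}{4\,\mathrm{MHz}} = 250\,\mathrm{ns} \\
2\,T_{\mathrm{sys}} \le t_{\mathrm{delay}} \le 3\,T_{\mathrm{sys}}
  &\;\Rightarrow\; 30.3\,\mathrm{ns} \le t_{\mathrm{delay}} \le 45.5\,\mathrm{ns} \\
t_{\mathrm{delay}} &< \tfrac{1}{2}\,T_{\mathrm{SPI}} = 125\,\mathrm{ns}
\end{align*}
```

So the synchronization delay stays well within half an SPI clock period, which is the time available between a falling edge and the next rising edge.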

In a later iteration on Altera, we used a PLL to create a 200 MHz system clock. The only reason for such a high clock was that we eventually plan to use it to measure coarse propagation delay in circuits. The gate level simulation result is shown below.

Signals for Altera implementation


The complete project including constraints and test bench is available through

Much of the credit for the byte level implementation goes to fpga4fun. My key Verilog HDL files are:

On the FPGA, LED[0] will be on when the FPGA receives 0xAA. Assuming the Arduino sends alternating 0xAA and 0x55, LED[0] will blink. The FPGA always returns 0x55 to the Arduino.
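The LED behavior could be sketched like this, assuming rx and rxValid behave as described earlier. The module and port names are hypothetical.

```verilog
// Hypothetical LED driver: led0 stands in for LED[0].
module led_on_aa (
    input  wire       sysClk,
    input  wire [7:0] rx,       // last received byte
    input  wire       rxValid,  // high for one sysClk cycle when rx is valid
    output reg        led0
);
    // Update the LED only when a new byte arrives:
    // on for 0xAA, off for anything else (such as 0x55).
    always @(posedge sysClk)
        if (rxValid)
            led0 <= ( rx == 8'hAA );
endmodule
```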


To verify the implementation, we ran the test bench (spi_byte_tb.v) using gate level simulation. This test bench monitors the communication and reports any errors it finds. In the real world, we connected the Arduino SPI Master. The program on the Arduino alternates writing 0xAA and 0x55 with a 10/90 duty cycle. As a consequence, the LED should blink briefly every cycle.
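For readers who want to experiment, the following is a small stand-alone sketch of an SPI master stimulus. It is not the project's spi_byte_tb.v. All names are hypothetical, and a loopback wire stands in for the design under test so that the sketch runs on its own. The 125 ns half period corresponds to a 4 MHz SPI clock.

```verilog
`timescale 1ns / 1ps
// Hypothetical stimulus sketch; not the project's test bench.
module spi_master_sketch_tb;
    reg  SCLK = 1'b0;
    reg  SSEL = 1'b1;   // assumed active low
    reg  MOSI = 1'b0;
    wire MISO;

    // Placeholder loopback. Replace with an instance of the design under test.
    assign MISO = MOSI;

    reg [7:0] reply;
    integer   i;

    // Exchange one byte, most significant bit first, and check the reply.
    task exchange(input [7:0] txByte, input [7:0] expected);
        begin
            SSEL = 1'b0;
            for (i = 7; i >= 0; i = i - 1) begin
                MOSI = txByte[i];              // set up data while SCLK is low
                #125 SCLK = 1'b1;              // rising edge
                reply = { reply[6:0], MISO };  // master samples MISO
                #125 SCLK = 1'b0;              // falling edge
            end
            SSEL = 1'b1;
            #250;
            if (reply !== expected)
                $display("ERROR: sent %h, expected %h, got %h", txByte, expected, reply);
            else
                $display("OK: sent %h, got %h", txByte, reply);
        end
    endtask

    initial begin
        // With the loopback, the reply equals the byte sent. With the FPGA
        // design described in this article, the expected reply would be 8'h55.
        exchange(8'hAA, 8'hAA);
        exchange(8'h55, 8'h55);
        $finish;
    end
endmodule
```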

Following “SPI byte protocol on FPGA”, the next part of this article introduces a Message Exchange Protocol, layered on top of this byte interface. This allows us to pass 32-bit register values over the SPI byte interface.