How do microprocessors work?

The inquiry “How do computers do math?” introduced the components needed to build a microprocessor.  This article continues by introducing the microprocessor itself.  It uses a top-down approach, which helps break up the complex microprocessor into simpler, more manageable parts.  Starting from the computer architecture, it dives down to an instruction set.  In the second half we build a microprocessor using a field-programmable gate array.

Credits

This text leans on chapter 4 of the excellent book “Introduction to Computing Systems” by Patt and Patel.  The implementation borrows from the Project#1 description at NC State University by Davis.

Architecture

World War II brought destruction but also spurred a flurry of computer innovations.  The first general purpose computer was built using electro-mechanical relays in Nazi Germany (1941).  Two years later the US Army built a vacuum tube based computer for calculating artillery firing tables and simulating the H bomb.  The computer could perform complex sequences of operations, which could include loops, branches and subroutines.  The program in this computer was hardwired using switches and dials.  It was a thousand times faster than the electro-mechanical machine, but it took great effort to change the program.

  1. von Neumann
    • In 1945, the mathematician John von Neumann formalized processor methods developed at the University of Pennsylvania.  These methods became known as the von Neumann architecture and still form the foundation of modern-day computers.
      [figure: the von Neumann architecture]
    • Parts:
      • Processing Unit, performs arithmetic and logical operations.  Consists of:
        • Arithmetic and Logic Unit (ALU), contains functional units such as adder, multiplier and bit-wise operations.
        • Processor registers, provides temporary storage for operands and results.
      • Control Unit, interprets instructions and orchestrates the execution of these instructions.  Consists of:
        • Instruction register (IR), holds the instruction that was read from memory.
        • Program counter (PC), contains the address of the next instruction to be executed.
      • Memory, containing both instructions and data.  Supports:

        • load, reads a value from the memory location given by the memory address register (MAR) and stores it in the memory data register (MDR).
        • store, writes the value from MDR to the memory location given by MAR.
      • Input and Output mechanisms
        • Memory mapped devices for moving data in and out of the computer memory.
  2. Instruction set architecture

    • The instruction set architecture (ISA) specifies everything somebody needs to know to write a program in machine language.  It contains:
      • Memory organization, specifies the address map and the number of bits per location;
      • Register set, specifies the number of registers, their size, and how they can be used;
      • Instruction set, specifies the opcodes, operands, data types, and addressing modes.
    • Instructions
      • Are a fundamental unit of work, that includes:
        • opcode, the operation to be performed;
        • operands, the data (locations) to be used in the operation.
      • Are executed completely, or not at all (atomic).
      • Look just like data — a collection of bits.  They are just interpreted differently.
    • Instruction types:
      • arithmetic and logical instructions,
      • memory access instructions,
      • control instructions.  Control instructions may change the address in the program counter, permitting repetitive operations.  The change may depend on some arithmetic condition, giving the effect of a decision.
  3. Instruction cycle phases
    • Notes
      • Not all phases are needed by every instruction.
      • Instruction cycle phases may take a variable number of clock cycles.
    • Fetch instruction.
      • Writes the value of the program counter (PC) to MAR, sends a read signal to the memory unit, reads the memory value into the instruction register (IR), and then increments the PC.
    • Decode instruction.
      • Identifies the opcode and operands from the IR.
    • Address
      • Calculates the memory address for instructions that require memory access.
    • Fetch operands
      • Obtains the source operands needed to perform the operation, e.g. by reading a register value or a memory location.
    • Operation
      • Performs the arithmetic or logical operation on the source operands (arithmetic and logical instructions only).
    • Store result
      • Writes the results to a register or memory location.
    • Repeat, starting at Fetch instruction.
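
The phases above can be sketched in software.  The Python model below is only an illustration under stated assumptions, not the hardware built later in this article: the function names (`sext`, `run`), the dictionary used as memory and the choice of three instructions (register-form ADD, LD and HALT, using the opcodes and field positions from the LC-3 summary later in this text) are all made up for the example.

```python
def sext(value, bits):
    """Sign-extend a `bits`-wide two's complement field to a Python int."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def run(memory, pc, max_steps=100):
    """Run from `pc` until HALT; returns the register file (8 words)."""
    regs = [0] * 8
    for _ in range(max_steps):
        # Fetch instruction: PC -> MAR, memory[MAR] -> MDR -> IR, then PC + 1
        mar = pc
        mdr = memory[mar]
        ir = mdr
        pc = (pc + 1) & 0xFFFF
        # Decode: opcode in bits 15..12, destination register in bits 11..9
        opcode = ir >> 12
        dr = (ir >> 9) & 0x7
        if opcode == 0b1111:        # HALT: leave the instruction cycle
            return regs
        if opcode == 0b0001:        # ADD, register form only
            # Fetch operands, Operation, Store result
            sr1 = regs[(ir >> 6) & 0x7]
            sr2 = regs[ir & 0x7]
            regs[dr] = (sr1 + sr2) & 0xFFFF
        elif opcode == 0b0010:      # LD, PC-relative load
            # Address, Fetch operands (memory read), Store result
            mar = (pc + sext(ir & 0x1FF, 9)) & 0xFFFF
            mdr = memory[mar]
            regs[dr] = mdr
        else:
            raise ValueError(f"opcode {opcode:04b} is not modelled in this sketch")
    raise RuntimeError("no HALT within max_steps")


# Program at 0x3000: R0 = mem[0x3004]; R1 = mem[0x3005]; R2 = R0 + R1; HALT
memory = {
    0x3000: 0x2003,  # LD  R0, +3  -> address 0x3001 + 3 = 0x3004
    0x3001: 0x2203,  # LD  R1, +3  -> address 0x3002 + 3 = 0x3005
    0x3002: 0x1401,  # ADD R2, R0, R1
    0x3003: 0xF000,  # HALT
    0x3004: 0x1234,
    0x3005: 0x4321,
}
regs = run(memory, 0x3000)
print(hex(regs[2]))  # prints 0x5555
```

Each loop iteration is one instruction cycle.  ADD skips the Address phase and LD skips the Operation phase, which shows that not every instruction needs every phase.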

Instruction Set Architecture (LC-3)

The book “Introduction to Computing Systems” by Patt and Patel introduces a hypothetical microprocessor.  Its simple-as-possible instruction set helps us understand the inner workings of a microprocessor.  For this text we simplify this little computer (LC-3) even further by:

  • not supporting subroutine calls (JSR, JSRR, RET),
  • not supporting interrupt handling (RTI, TRAP),
  • not supporting overflow detection in arithmetic operations,
  • not validating the instruction format, and
  • replacing TRAP 0 with a simple HALT instruction.

With the exception of these simplifications, the Instruction Set Architecture (ISA) is as specified in the book “Introduction to Computing Systems”.  This section summarizes this ISA.  For more details, refer to Appendix A.3 of the book.

  1. Overview
    • Memory organization:
      • 16-bit addresses; word addressable only,
      • 16-bit memory words.
    • Memory map
      • User programs start at hex 3000, and may extend to hex FDFF.
    • Bit numbering
      • Bits are numbered from right (least significant bit) to left (most significant bit), starting with bit 0.
    • Registers
      • 16-bit program counter (pc), contains the address of the next instruction.
      • 16-bit general purpose registers, 8 registers identified with binary 000 to 111 (R0 .. R7).
      • 3-bit processor status register (psr), updated as a result of instructions that write to a register.
        • psr[2]==1, when 2’s complement value is negative (n).
        • psr[1]==1, when 2’s complement value is zero (z).
        • psr[0]==1, when 2’s complement value is positive (p).
    • Instructions
      • 16-bit instructions, RISC (all instructions the same size).
        • the opcode is encoded in the 4 most significant bits of the instruction (bits 15..12).
        • the operands are encoded in the remainder of the instruction (bits 11..0).
      • ALU performs ADD, AND, NOT on 16-bit words.
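
As a rough illustration of how the processor status register follows from a written value, here is a small Python sketch.  The helper names `condition_codes` and `branch_taken` are hypothetical.  The branch test assumes that the n, z and p selection bits of a BR instruction sit in bits 11..9, as in the Decode module shown further down.

```python
def condition_codes(value):
    """Return psr as a 3-bit number {n, z, p} for a 16-bit word."""
    value &= 0xFFFF
    n = 1 if value & 0x8000 else 0        # bit 15 set: negative in 2's complement
    z = 1 if value == 0 else 0
    p = 1 if (n == 0 and z == 0) else 0   # neither negative nor zero
    return (n << 2) | (z << 1) | p


def branch_taken(ir, psr):
    """True when a flag selected in bits 11..9 of a BR instruction is set in psr."""
    return (((ir >> 9) & 0x7) & psr) != 0


for word in (0x0005, 0x0000, 0xFFFB):
    print(f"{word:04X} -> psr={condition_codes(word):03b}")
# prints:
# 0005 -> psr=001
# 0000 -> psr=010
# FFFB -> psr=100
```

Exactly one of the three flags is set after every register write, so a BR instruction that selects all three flags always branches.
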
  2. Instructions
    • Overview
      • Only 12 bits are available for the operands.  This implies that 16-bit data values or memory addresses have to be specified indirectly.
      • Conventions:
        • sr1ID, sr2ID, source register identifier (000..111 for R0..R7)
        • drID, destination register identifier (000..111 for R0..R7)
        • baseRID, base register identifier (000..111 for R0..R7)
        • sr1, sr2, 16-bit source register value
        • dr, 16-bit destination register value
        • baseR, base register value, used together with 2’s complement offset to calculate memory address
        • imm5, 5-bit immediate value as 2’s complement integer
        • mem[address], contents of memory at the given address
        • offset6, 6-bit value as 2’s complement integer
        • offset9, 9-bit value as 2’s complement integer
        • SEXT, sign-extend, by replicating the most significant bit as many times as necessary to extend to the word size of 16 bits.
    • ALU instructions
      • There are two variations of the ADD and AND instructions.  One takes the second argument from sr2.  The other takes it from the immediate value imm5.
      • Format:
        [figure: encoding of the ALU instructions]
    • Memory instructions
      • Types:
        • load, reads a value from memory to a register (LD, LDI, LDR)
        • store, writes a value from a register to memory (ST, STI, STR)
      • Addressing modes:
        • PC relative, the address is calculated by adding an offset to the incremented PC;
        • Register relative, the address is calculated by adding an offset to a base register;
        • Indirect, the address is read from a memory location whose address is calculated by adding an offset to the incremented program counter;
        • Load effective address, the address is calculated by adding an offset to the incremented program counter.  The address itself (not the value stored at that address) is written to a register.
      • Format:
        [figure: encoding of the memory instructions]
    • Control instructions
      • Types
        • Unconditional (JMP), jumps to the address specified by baseR;
        • Conditional (BR), the branch is only taken if at least one of the condition flags (n, z, p) selected in the instruction is also set in the processor status register (psr);
        • Stop program execution (HALT).  A simplified version of TRAP 0.
      • Format:
        [figure: encoding of the control instructions]
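
The addressing modes can be summarized in a few lines of Python.  This is a sketch, not the Address module implemented below: the function `effective_address` and its mode names are assumptions for illustration, while the field positions (offset9 in bits 8..0, offset6 in bits 5..0, baseRID in bits 8..6) follow the LC-3 encoding that the Decode module relies on.

```python
def sext(value, bits):
    """SEXT: sign-extend a `bits`-wide two's complement field."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def effective_address(mode, ir, npc, regs, memory):
    """Address used by a load or store.

    mode is "pc", "base" or "indirect" (illustrative names only).
    npc is the incremented program counter.
    """
    if mode == "base":                        # LDR/STR: baseR + SEXT(offset6)
        base = regs[(ir >> 6) & 0x7]
        return (base + sext(ir & 0x3F, 6)) & 0xFFFF
    pc_relative = (npc + sext(ir & 0x1FF, 9)) & 0xFFFF
    if mode == "indirect":                    # LDI/STI: the address is read from memory
        return memory[pc_relative]
    return pc_relative                        # LD/ST; also the value LEA writes to DR


regs = [0] * 8
regs[1] = 0x4000
print(hex(effective_address("pc", 0x201F, 0x3001, regs, {})))                      # prints 0x3020
print(hex(effective_address("indirect", 0xB81D, 0x300A, regs, {0x3027: 0x3028})))  # prints 0x3028
print(hex(effective_address("base", 0x6A7F, 0x3001, regs, {})))                    # prints 0x3fff
```

The last call models LDR R5, R1, #-1: the 6-bit offset 111111 sign-extends to -1, so the address is one below the value in R1.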

LC-3 Architecture

The microprocessor consists of a data path and a control unit.  Together they implement the instruction phases.  This section describes an architecture for the LC-3.  It aims to stay true to the von Neumann architecture and its instruction cycle names.  One notable difference is that we assume the program counter and instruction register are in the data path.

  1. Data path
    • Overview
      • The data path for this implementation is shown in the schematic below.  The shaded blocks are modules that implement various functionality.  The module names have been chosen to reflect the instruction phases.
      • Signals connect the blocks.  A signal can be a single wire, or a collection of wires such as the 16 bits that represent the value of the program counter.
      • The microprocessor connects to an external memory through an external interface.
    • Schematic for the data path
      [figure: schematic of the data path]
    • Signals for the data path

      • program counters
        • pc, program counter
        • nPC, next program counter (always has the value pc+1)
        • tPC, target program counter (JMP/BR)
      • operands
        • sr1ID, source register 1 identifier (baseRID for JMP/LDR/STR)
        • sr2ID, source register 2 identifier (srID for ST/STI/STR)
        • drID, destination register identifier
        • imm, immediate value
        • offset, memory offset
      • register values
        • sr1, the value of the register identified by sr1ID
        • sr2, the value of the register identified by sr2ID
        • dr, the value written to the register identified by drID
        • psr, the value of the processor status register
      • intermediate values
        • uOut, the result of the ALU operation
        • aOut/addr/tPC, the result of the Address calculation
      • external bus
        • eADDR, memory address
        • eDIN, instruction or data being read from memory
        • eDOUT, data being written to memory
        • eWEA, write enable signal going to memory (0 for read; 1 for write)
      • internal bus
        • iBR0, iBR1, internal bus request signals
        • iADDR0, iADDR1, internal memory addresses
        • iWEA0, iWEA1, internal write enable signals
    • Modules (short description)

      • External interface, connects the microprocessor to the external memory bus.
      • Update PC, maintains the program counter (pc).
      • Fetch, initiates the bus cycle to read the instruction pointed to by the pc.
      • Decode, reads the instruction from the memory bus and extracts its operands.
      • Registers, maintains the register values and processor status register.
      • ALU, performs arithmetic and logical operations.
      • Address, calculates memory address for memory or control instructions.
      • Memory IF, initiates the external memory bus cycle to read or write data.
      • DrMux, destination register multiplexor, selects the value that will be written to the destination register.
      • Bus driver, simple arbiter for memory requests from Fetch and Memory IF.
    • Examples
      • Read memory
        • assume: a reset has been issued, the instruction at address 3000 is 201F (LD R0, LDv).  The label LDv refers to the data address 3020.
        • UpdatePC resets the pc to its initial value (pc=3000, nPC=3001).
        • Fetch starts a read cycle for the instruction (iBR0=1, iADDR0=3000, iWEA0=0).
        • Bus driver, forwards the read cycle to the external memory bus (eADDR=3000, eWEA=0).
        • external memory responds with the instruction (eDIN=201f).
        • Decode extracts the operands (offset=1f and drID=0).
        • Address adds the offset to nPC (addr=3020).
        • MemoryIF starts a read cycle for the data (iBR1=1, iADDR1=3020, iWEA1=0).
        • Bus driver, forwards the read cycle to the external memory bus (eADDR=3020, eWEA=0).
        • external memory responds with the data (eDIN=1234).
        • DrMux selects the eDIN input (dr=1234).
        • Registers writes the value dr to the register identified by drID.
      • ALU operation
        • assume: pc=3003, the instruction at the next address 3004 is 1801 (ADD R4, R0, R1).  The register values are R0=1234 and R1=4321.
        • UpdatePC increments the program counter (pc=3004).
        • Fetch starts a read cycle for the instruction (iBR0=1, iADDR0=3004, iWEA0=0).
        • Bus driver, forwards the read cycle to the external memory bus (eADDR=3004, eWEA=0).
        • external memory responds with the instruction (eDIN=1801).
        • Decode extracts the operands (sr1ID=0, sr2ID=1, drID=4).
        • Registers supplies the values for the registers identified by sr1ID and sr2ID (sr1=1234, sr2=4321).
        • ALU calculates the sum of sr1 and sr2 (uOut=5555).
        • DrMux selects the uOut input (dr=5555).
        • Registers writes the value dr to the register identified by drID.
      • Write memory
        • assume: pc=3008, the instruction at the next address 3009 is B81D (STI R4, STIa).  Register R4=AAAA and the label STIa refers to data address 3027 containing the value 3028.
        • UpdatePC increments the program counter (pc=3009, nPC=300a).
        • Fetch starts a read cycle for the instruction (iBR0=1, iADDR0=3009, iWEA0=0).
        • Bus driver, forwards the read cycle to the external memory bus (eADDR=3009, eWEA=0).
        • external memory responds with the instruction (eDIN=b81d).
        • Decode extracts the operands (sr2ID=4, offset=1d).  Note that sr2ID represents the SR operand.
        • Registers supplies the value for the register identified by sr2ID (sr2=aaaa).
        • Address adds the offset to nPC (addr=3027).
        • MemoryIF starts a read cycle to retrieve the address where to store the value (iBR1=1, iADDR1=3027, iWEA1=0).
        • Bus driver, forwards the read cycle to the external memory bus (eADDR=3027, eWEA=0).
        • external memory responds with the value (eDIN=3028).
        • MemoryIF starts a write cycle to write the value of R4 to address 3028 (iBR1=1, iADDR1=3028, iWEA1=1).
        • Bus driver, forwards the write cycle to the external memory bus (eADDR=3028, eWEA=1).
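
The write memory example can be replayed in software to check the addresses on the bus.  The sketch below assumes a hypothetical helper `sti_bus_cycles` and models memory as a Python dictionary.  It records only eADDR and eWEA for each bus cycle and ignores timing and eREADY.

```python
def sti_bus_cycles(pc, regs, memory):
    """Execute one STI located at `pc`; return the (eADDR, eWEA) pairs on the bus."""
    cycles = [(pc, 0)]                  # Fetch: read the instruction
    ir = memory[pc]
    npc = (pc + 1) & 0xFFFF             # UpdatePC: nPC is always pc + 1
    sr = regs[(ir >> 9) & 0x7]          # Registers: SR is encoded in bits 11..9
    offset = ir & 0x1FF                 # Decode: 9-bit offset
    if offset & 0x100:                  # sign-extend the offset
        offset -= 0x200
    addr = (npc + offset) & 0xFFFF      # Address: nPC + offset
    cycles.append((addr, 0))            # MemoryIF: read the pointer
    target = memory[addr]
    cycles.append((target, 1))          # MemoryIF: write SR to the target address
    memory[target] = sr
    return cycles


regs = [0] * 8
regs[4] = 0xAAAA
memory = {0x3009: 0xB81D, 0x3027: 0x3028}
cycles = sti_bus_cycles(0x3009, regs, memory)
print([(hex(address), write) for address, write in cycles])
# prints [('0x3009', 0), ('0x3027', 0), ('0x3028', 1)]
```

The three entries correspond to the three external bus cycles of the example: the instruction fetch, the read of the pointer at 3027 and the write of R4 to 3028.
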
  2. Control unit
    • Overview
      • Instructions can be broken up into microinstructions.  These microinstructions can be implemented using a finite state machine (FSM), where each state corresponds to one microinstruction.
      • The finite state machine can be visualized as shown in the figure below.
        • circles, represent the states identified by a unique number and name.
        • double circle, represents the initial state.
        • arrows, represent state transitions.  Labels represent the condition that must be met for the transition to occur.
        • shading, is used to identify the implementation modules.
        • eREADY, indicates that the external memory finished a read or write operation.
        • iType, maType and indType refer to the generalized instruction types generated by Decode.
    • State diagram
      [figure: state diagram of the control unit]
    • Details
      • Policies:
        • State transitions, are only possible during the falling edge of the clock signal (from 1 to 0);
        • Outputs, to the external memory interface, are driven in response to state transitions;
        • Inputs, from the external memory interface, are sampled on the rising edge of the clock signal (from 0 to 1);
        • Control signals, change only during the falling edge of the clock signal to minimize glitches.
      • Each state:
        • depends on both the input signals and the previous state;
        • generates control signals for the data path (with the help of the Decode module).
      • The control unit consists of two modules:
        • State, implements the state machine, and generates state specific control signals.
        • Decode, generalizes the instruction for the state machine, and generates state independent control signals.
    • Schematic for the control unit
      [figure: schematic of the control unit]
    • Signals for the control unit
      • External interface
        • eREADY==1, indicates that the external memory finished a read or write operation.
        • clock, externally supplied clock
      • Internal to the State module
        • state, current state
        • nState, next state as determined by the combinational logic
      • Generalized instruction types (bundled into cCtrl)
        • iType, instruction type
        • maType, memory access type
        • indType, indirect memory access type
      • Data path control
        • pNext, signals UpdatePC to change the program counter to tPC
        • pEn, enables UpdatePC to change the program counter
        • fEn, enables Fetch to start the external memory bus cycle to read the instruction
        • dEn, enables Decode to read the instruction from the external memory bus
        • rWe, enables Registers to store the value of dr in the register identified by drID
        • uOp, chooses the operation and inputs of the ALU
        • aOp, chooses the operation and inputs of the Address calculation
        • mOp, chooses the memory operation to be performed by MemoryIF
        • drSrc, selects the destination register source input on DrMux
    • Modules (detailed description)
      • State
        • Generates the state specific control signals for each microinstruction being executed.  Refer to the signals described above for details.
      • UpdatePC
        • Updates the program counter (pc) at the end of each instruction cycle.  The new value is:
          • 3000 if reset is asserted, or
          • the value of the tPC input, if a JMP or BR instruction was executed (and the condition was met), or
          • otherwise, the previous value of pc+1.
      • Fetch
        • Initiates the external bus cycle (iBR0, iADDR0, iWEA0) to read the instruction from the memory location pointed to by pc.
        • The control unit will maintain this state until the external memory reports that the data is available (eREADY).
      • Decode
        • Finishes the external bus cycle by reading the instruction from the external memory bus (eDIN).
        • Decodes the instruction:
          • Based on the opcode (ir[15:12]), it generalizes the instruction type for the State module (cCtrl).
          • Based on the operands (ir[11:0]), it configures the data path using state independent control signals:
            • For ALU instructions, uOp, sr1ID, sr2ID, drSrc, drID;
            • For memory instructions, aOp, sr1ID (baseRID for LDR/STR), sr2ID (srID for ST/STI/STR), offset, drID;
            • For control instructions, aOp, pNext, sr1ID (baseRID for JMP).
      • Registers
        • Maintains the general purpose registers (R0..R7).
        • Supplies the values for the registers identified by sr1ID and sr2ID.
        • Updates the register specified by drID to the value dr when rWe is asserted.
      • ALU
        • The input uOp selects both the operation type and inputs.
          • For ADD, if ir[5]==1 then uOut = sr1+imm5, else uOut=sr1+sr2.
          • For AND, if ir[5]==1 then uOut = sr1&imm5, else uOut=sr1&sr2.
          • For NOT, uOut=~sr1.
      • Address
        • Input aOp selects both the calculation type and inputs.
          • For BR, aOut=nPC+offset9.
          • For LD/LDI/LEA/ST/STI, aOut=nPC+offset9.
          • For JMP/LDR/STR, aOut=sr1+offset6.
        • Note that aOut is connected to the addr input on MemoryIF, and the tPC input on UpdatePC.
      • MemoryIF (memory interface)

        • Input mOp selects the memory access mode and inputs.
          • For LDI/STI, under the direction of the Control Unit it first initiates a memory read cycle for addr (aOut).  The Control Unit will maintain this state until the external memory reports that the data is available (eREADY).  It then takes the value read from memory (eDIN), and
            • for LDI, initiate a read cycle for address eDIN;
            • for STI, initiate a write cycle to write the value sr2 to address eDIN.
          • For the other instructions, it takes the address, and
            • for LD/LDR, read the value from addr (aOut);
            • for LEA, do nothing;
            • for ST/STR, write the value sr2 to addr (aOut).
          • The control unit will maintain this state until the external memory reports that the data is available (eREADY).
      • DrMux (destination register multiplexor)

        • Input drSrc selects the value that will be written to the destination register.
          • For ADD/AND/NOT, forwards uOut to dr
          • For LD/LDR/LDI, forwards eDIN to dr
          • For LEA, forwards aOut to dr.
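
The ALU and DrMux descriptions above translate into a short behavioral model.  This Python sketch is an illustration under assumptions rather than the Verilog used in this project: the names `alu` and `dr_mux`, the string selectors and the `use_imm` flag (standing in for ir[5]) are invented here, and imm5 is sign-extended as the SEXT convention in the instruction overview suggests.

```python
MASK = 0xFFFF  # all values are 16-bit words


def alu(op, sr1, sr2=0, imm5=0, use_imm=False):
    """uOut for ADD, AND and NOT; use_imm plays the role of ir[5]."""
    # SEXT(imm5): replicate bit 4 into bits 15..5
    imm = (imm5 | 0xFFE0) if (imm5 & 0x10) else (imm5 & 0x1F)
    second = imm if use_imm else sr2
    if op == "ADD":
        return (sr1 + second) & MASK
    if op == "AND":
        return sr1 & second & MASK
    if op == "NOT":
        return ~sr1 & MASK
    raise ValueError(f"unknown ALU operation {op!r}")


def dr_mux(dr_src, u_out, e_din, a_out):
    """Select the value written to the destination register."""
    return {"ALU": u_out, "MEM": e_din, "ADDR": a_out}[dr_src]


print(hex(alu("ADD", 0x1234, 0x4321)))                   # prints 0x5555
print(hex(alu("ADD", 0x0005, imm5=0x1F, use_imm=True)))  # prints 0x4
print(hex(alu("NOT", 0x00FF)))                           # prints 0xff00
```

The second call adds the immediate 11111, which sign-extends to -1, so the result is 5 - 1 = 4.  Overflow is simply truncated to 16 bits, in line with the simplification that overflow detection is not supported.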

Hardware description

One dark winter afternoon, my then almost 8-year-old asked “How do microprocessors work?”.  I answered “Let’s build one and find out”.  What started as a noble thought became a rather intense but fun and exciting project.  This section describes the implementation of the LC-3 using a Field Programmable Gate Array (FPGA).  An FPGA is an array of blocks with basic functionality such as a lookup table, a full adder and a flip-flop.  For more information on FPGAs refer to the section Programmable Logic in the inquiry “How do computers do math?”.

The FPGA used to implement the LC-3 microprocessor is a Xilinx Spartan-6, but others will fit equally well.  My choice was inspired by the pricing of the development board and the fairly good free development tools.  Other choices would be Altera for the FPGA, their IDE or Icarus Verilog for the synthesizer and simulator, and GTKWave for the waveform viewer.  Refer to the end of this article for links and references to introductory Verilog books.

The top level schematic is shown below.  The modules are defined using Verilog, a hardware description language (HDL) used to model digital logic.

This is my first Verilog implementation, so please bear with me ..

  1. State
    • State.v
      module State( input clock,
                    input reset,
                    input [4:0] cCtrl,      // controller control signal
                    input eREADY,           // external memory ready signal
                    output wire pEn,        // update PC enable
                    output wire fEn,        // fetch output enable
                    output wire dEn,        // decode enable
                    output wire [2:0] mOp,  // memory operation selector
                    output wire rWe );      // register write enable
      
        `include "UpdatePC.vh"
        `include "Fetch.vh"
        `include "Decode.vh"
        `include "Registers.vh"
        `include "MemoryIF.vh"
      
        parameter [3:0] STATE_UPDATEPC = 4'd0,   // update program counter
                        STATE_FETCH    = 4'd1,   // fetch instruction
                        STATE_DECODE   = 4'd2,   // decode
                        STATE_ALU      = 4'd3,   // ALU
                        STATE_ADDRNPC  = 4'd4,   // calc tPC address
                        STATE_ADDRMEM  = 4'd5,   // calc memory address
                        STATE_INDMEM   = 4'd6,   // indirect memory address
                        STATE_RDMEM    = 4'd7,   // read memory
                        STATE_WRMEM    = 4'd8,   // write memory
                        STATE_WRREG    = 4'd9,   // write register
                        STATE_ILLEGAL  = 4'd15;  // illegal state
      
        parameter       EREADY_INA     = 1'b0,   // external memory not ready
                        EREADY_ACT     = 1'b1,   // external memory ready
                        EREADY_X       = 1'bx;
      
        wire [1:0] iType   = cCtrl[4:3];  // instruction type (00=alu, 01=ctrl, 10=mem, 11=halt)
        wire [1:0] maType  = cCtrl[2:1];  // memory access type (00=indaddr, 01=read, 10=write, 11=updreg)
        wire       indType = cCtrl[0];    // indirect memory access type
      
        reg [3:0] state;   // current state
        reg [3:0] nState;  // next state
        reg [6:0] out;     // current output signals
        reg [6:0] nOut;    // next output signals
      
        assign pEn = out[6];
        assign fEn = out[5];
        assign dEn = out[4];
        assign mOp = out[3:1];
        assign rWe = out[0];
      
            // the combinational logic
      
        always @*  // combinational: re-evaluated whenever any signal read below changes
           casex ({state, eREADY, iType, maType, indType})
             {STATE_UPDATEPC, EREADY_X,   ITYPE_X,   MATYPE_X,   INDTYPE_X}  : begin nState = STATE_FETCH;    nOut = {PEN_0, FEN_1, DEN_0, MOP_NONE, RWE_0}; end
             {STATE_FETCH,    EREADY_ACT, ITYPE_X,   MATYPE_X,   INDTYPE_X}  : begin nState = STATE_DECODE;   nOut = {PEN_0, FEN_0, DEN_1, MOP_NONE, RWE_0}; end
             {STATE_DECODE,   EREADY_X,   ITYPE_ALU, MATYPE_X,   INDTYPE_X}  : begin nState = STATE_ALU;      nOut = {PEN_0, FEN_0, DEN_0, MOP_NONE, RWE_0}; end
             {STATE_DECODE,   EREADY_X,   ITYPE_CTL, MATYPE_X,   INDTYPE_X}  : begin nState = STATE_ADDRNPC;  nOut = {PEN_0, FEN_0, DEN_0, MOP_NONE, RWE_0}; end
             {STATE_DECODE,   EREADY_X,   ITYPE_MEM, MATYPE_X,   INDTYPE_X}  : begin nState = STATE_ADDRMEM;  nOut = {PEN_0, FEN_0, DEN_0, MOP_NONE, RWE_0}; end
             {STATE_ADDRMEM,  EREADY_X,   ITYPE_X,   MATYPE_IND, INDTYPE_X}  : begin nState = STATE_INDMEM;   nOut = {PEN_0, FEN_0, DEN_0, MOP_RD,   RWE_0}; end
             {STATE_ADDRMEM,  EREADY_X,   ITYPE_X,   MATYPE_RD,  INDTYPE_X}  : begin nState = STATE_RDMEM;    nOut = {PEN_0, FEN_0, DEN_0, MOP_RD,   RWE_0}; end
             {STATE_INDMEM,   EREADY_ACT, ITYPE_X,   MATYPE_X,   INDTYPE_RD} : begin nState = STATE_RDMEM;    nOut = {PEN_0, FEN_0, DEN_0, MOP_RDI,  RWE_0}; end
             {STATE_ADDRMEM,  EREADY_X,   ITYPE_X,   MATYPE_WR,  INDTYPE_X}  : begin nState = STATE_WRMEM;    nOut = {PEN_0, FEN_0, DEN_0, MOP_WR,   RWE_0}; end
             {STATE_INDMEM,   EREADY_ACT, ITYPE_X,   MATYPE_X,   INDTYPE_WR} : begin nState = STATE_WRMEM;    nOut = {PEN_0, FEN_0, DEN_0, MOP_WR,   RWE_0}; end
             {STATE_ALU,      EREADY_X,   ITYPE_X,   MATYPE_X,   INDTYPE_X}  : begin nState = STATE_WRREG;    nOut = {PEN_0, FEN_0, DEN_0, MOP_NONE, RWE_1}; end
             {STATE_ADDRMEM,  EREADY_X,   ITYPE_X,   MATYPE_REG, INDTYPE_X}  : begin nState = STATE_WRREG;    nOut = {PEN_0, FEN_0, DEN_0, MOP_NONE, RWE_1}; end
             {STATE_RDMEM,    EREADY_ACT, ITYPE_X,   MATYPE_X,   INDTYPE_X}  : begin nState = STATE_WRREG;    nOut = {PEN_0, FEN_0, DEN_0, MOP_NONE, RWE_1}; end
             {STATE_WRMEM,    EREADY_ACT, ITYPE_X,   MATYPE_X,   INDTYPE_X}  : begin nState = STATE_UPDATEPC; nOut = {PEN_1, FEN_0, DEN_0, MOP_NONE, RWE_0}; end
             {STATE_WRREG,    EREADY_X,   ITYPE_X,   MATYPE_X,   INDTYPE_X}  : begin nState = STATE_UPDATEPC; nOut = {PEN_1, FEN_0, DEN_0, MOP_NONE, RWE_0}; end
             {STATE_ADDRNPC,  EREADY_X,   ITYPE_X,   MATYPE_X,   INDTYPE_X}  : begin nState = STATE_UPDATEPC; nOut = {PEN_1, FEN_0, DEN_0, MOP_NONE, RWE_0}; end
             {STATE_FETCH,    EREADY_INA, ITYPE_X,   MATYPE_X,   INDTYPE_X}  : begin nState = state;          nOut = out;                                    end
             {STATE_INDMEM,   EREADY_INA, ITYPE_X,   MATYPE_X,   INDTYPE_X}  : begin nState = state;          nOut = out;                                    end
             {STATE_RDMEM,    EREADY_INA, ITYPE_X,   MATYPE_X,   INDTYPE_X}  : begin nState = state;          nOut = out;                                    end
             {STATE_WRMEM,    EREADY_INA, ITYPE_X,   MATYPE_X,   INDTYPE_X}  : begin nState = state;          nOut = out;                                    end
             default                                                         : begin nState = STATE_ILLEGAL;  nOut = {PEN_0, FEN_0, DEN_0, MOP_NONE, RWE_0}; end
           endcase
      
            // the sequential logic
      
        always @(negedge clock, posedge reset)
          if (reset)
            begin
              state <= STATE_UPDATEPC;
              out   <= {PEN_0, FEN_0, DEN_0, MOP_NONE, RWE_0};
            end
          else
            begin
              state <= nState;
              out   <= nOut;
            end
      endmodule
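
To get a feel for the transition table in State.v, it can be traced in software.  The Python sketch below restates the casex rows as a hypothetical `next_state` function and walks one instruction through the machine, assuming the external memory is always ready.  The string names stand in for the STATE_, ITYPE_, MATYPE_ and INDTYPE_ parameters, and the output signals are left out.

```python
WAIT_STATES = {"FETCH", "INDMEM", "RDMEM", "WRMEM"}   # states that wait for eREADY

FIXED = {                                             # transitions without a condition
    "UPDATEPC": "FETCH", "FETCH": "DECODE", "ALU": "WRREG",
    "ADDRNPC": "UPDATEPC", "RDMEM": "WRREG",
    "WRMEM": "UPDATEPC", "WRREG": "UPDATEPC",
}


def next_state(state, ready, i_type=None, ma_type=None, ind_type=None):
    """Next state for the current state and inputs, following the casex table."""
    if state in WAIT_STATES and not ready:
        return state                                  # hold until memory is ready
    if state == "DECODE":
        return {"ALU": "ALU", "CTL": "ADDRNPC", "MEM": "ADDRMEM"}.get(i_type, "ILLEGAL")
    if state == "ADDRMEM":
        return {"IND": "INDMEM", "RD": "RDMEM",
                "WR": "WRMEM", "REG": "WRREG"}.get(ma_type, "ILLEGAL")
    if state == "INDMEM":
        return {"RD": "RDMEM", "WR": "WRMEM"}.get(ind_type, "ILLEGAL")
    return FIXED.get(state, "ILLEGAL")                # same role as the default row


def trace(i_type, ma_type=None, ind_type=None):
    """States visited for one instruction, with memory always ready."""
    state = "UPDATEPC"
    visited = [state]
    while True:
        state = next_state(state, True, i_type, ma_type, ind_type)
        visited.append(state)
        if state in ("UPDATEPC", "ILLEGAL"):
            return visited


print(" -> ".join(trace("ALU")))               # e.g. ADD
print(" -> ".join(trace("MEM", "IND", "WR")))  # e.g. STI
# prints:
# UPDATEPC -> FETCH -> DECODE -> ALU -> WRREG -> UPDATEPC
# UPDATEPC -> FETCH -> DECODE -> ADDRMEM -> INDMEM -> WRMEM -> UPDATEPC
```

Calling `trace("HLT")` ends in ILLEGAL, because the table has no row for a decoded HALT and the default row takes over.  In this model, as in the casex table, nothing leads out of that state except a reset.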


  2. Decode
    • Decode.vh
      parameter       DEN_0          = 1'b0,   // Decode enable
                      DEN_1          = 1'b1;
      
      parameter [1:0] ITYPE_ALU      = 2'b00,  // generalized instruction type
                      ITYPE_CTL      = 2'b01,
                      ITYPE_MEM      = 2'b10,
                      ITYPE_HLT      = 2'b11,
                      ITYPE_X        = 2'bxx;
      
      parameter [1:0] MATYPE_IND     = 2'b00,  // generalized memory access type
                      MATYPE_RD      = 2'b01,
                      MATYPE_WR      = 2'b10,
                      MATYPE_REG     = 2'b11,
                      MATYPE_X       = 2'bxx;
      
      parameter       INDTYPE_WR     = 1'b0,   // generalized memory indirection type
                      INDTYPE_RD     = 1'b1,
                      INDTYPE_X      = 1'bx;
    • Decode.v
      module Decode( input clock,
                     input reset,
                     input en,                   // input enable
                     input [15:0] eDIN,          // external memory data input
                     input [2:0] psr,            // processor status register
                     output [4:0] cCtrl,         // various control signals
                     output [1:0] drSrc,         // selects what to write to DR
                     output [2:0] uOp,           // selects ALU operation
                     output aOp,                 // selects Address operation
                     output pNext,               // selects if PC should branch
                     output [2:0] sr1ID,         // source register 1 ID
                     output [2:0] sr2ID,         // source register 2 ID
                     output [2:0] drID,          // destination register ID
                     output wire [4:0] imm,      // lower 5 bits from IR value
                     output wire [8:0] offset ); // lower 9 bits from IR value
      
        `include "ALU.vh"
        `include "Address.vh"
        `include "MemoryIF.vh"
        `include "DrMux.vh"
        `include "UpdatePC.vh"
        `include "Decode.vh"
      
        parameter [2:0] ID_X = 3'bxxx;
      
            // Instruction Register (ir)
            // read instruction from external memory bus (after Fetch initiated the bus cycle)
      
        reg [15:0] ir;
        assign imm    = ir[4:0];  // output the lower 5 bits
        assign offset = ir[8:0];  // output the lower 9 bits
      
        always @(posedge clock, posedge reset)
          if (reset)
            ir <= 16'hffff;        // reset value decodes as HALT (opcode 1111)
          else if (en == DEN_1)
            ir <= eDIN;            // latch the instruction from the memory bus
      
        parameter [3:0]           // opcodes for the instructions
          I_BR  = 4'b0000,
          I_ADD = 4'b0001,
          I_LD  = 4'b0010,
          I_ST  = 4'b0011,
          I_AND = 4'b0101,
          I_LDR = 4'b0110,
          I_STR = 4'b0111,
          I_NOT = 4'b1001,
          I_LDI = 4'b1010,
          I_STI = 4'b1011,
          I_JMP = 4'b1100,
          I_LEA = 4'b1110,
          I_HLT = 4'b1111;
      
        reg [20:0] ctl;           // current control signal bundle
      
            // untangle control signal bundle
      
        assign cCtrl = ctl[ 20:16 ];  // { iType, maType, indType }
        assign uOp   = ctl[ 15:13 ];
        assign aOp   = ctl[    12 ];
        assign drSrc = ctl[ 11:10 ];
        assign pNext = ctl[     9 ];
        assign drID  = ctl[  8: 6 ];
        assign sr1ID = ctl[  5: 3 ];
        assign sr2ID = ctl[  2: 0 ];
      
            // combinational logic to determine control signals
      
        wire [2:0] uOpAddC = (ir[5]) ? UOP_ADDIMM : UOP_ADDREG;         // candidate for uOp in case of ADD instruction
        wire [2:0] uOpAndC = (ir[5]) ? UOP_ANDIMM : UOP_ANDREG;         // candidate for uOp in case of AND instruction
        wire pNextC        = |(ir[11:9] & psr) ? PNEXT_TPC : PNEXT_NPC; // candidate for pNext in case of BR instruction
      
        always @*         // ctl also depends on ir[11:0], so use the implicit sensitivity list
                          // State      State       State       ALU      Address  DrSource    UpdatePC   Registers Registers Registers
          case (ir[15:12])// iType      maType      indType     uOp      aOp      drSrc       pNext      drID      sr1ID    sr2ID
            I_ADD   : ctl = {ITYPE_ALU, MATYPE_X,   INDTYPE_X,  uOpAddC, AOP_X,   DRSRC_ALU,  PNEXT_NPC, ir[11:9], ir[8:6], ir[2:0] };
            I_AND   : ctl = {ITYPE_ALU, MATYPE_X,   INDTYPE_X,  uOpAndC, AOP_X,   DRSRC_ALU,  PNEXT_NPC, ir[11:9], ir[8:6], ir[2:0] };
            I_NOT   : ctl = {ITYPE_ALU, MATYPE_X,   INDTYPE_X,  UOP_NOT, AOP_X,   DRSRC_ALU,  PNEXT_NPC, ir[11:9], ir[8:6], ID_X    };
            I_BR    : ctl = {ITYPE_CTL, MATYPE_X,   INDTYPE_X,  UOP_X,   AOP_NPC, DRSRC_X,    pNextC,    ID_X,     ID_X,    ID_X    };
            I_JMP   : ctl = {ITYPE_CTL, MATYPE_X,   INDTYPE_X,  UOP_X,   AOP_SR1, DRSRC_X,    PNEXT_TPC, ID_X,     ir[8:6], ID_X    };
            I_LD    : ctl = {ITYPE_MEM, MATYPE_RD,  INDTYPE_X,  UOP_X,   AOP_NPC, DRSRC_MEM,  PNEXT_NPC, ir[11:9], ID_X,    ID_X    };
            I_LDR   : ctl = {ITYPE_MEM, MATYPE_RD,  INDTYPE_X,  UOP_X,   AOP_SR1, DRSRC_MEM,  PNEXT_NPC, ir[11:9], ir[8:6], ID_X    };
            I_LDI   : ctl = {ITYPE_MEM, MATYPE_IND, INDTYPE_RD, UOP_X,   AOP_NPC, DRSRC_MEM,  PNEXT_NPC, ir[11:9], ID_X,    ID_X    };
            I_LEA   : ctl = {ITYPE_MEM, MATYPE_REG, INDTYPE_X,  UOP_X,   AOP_NPC, DRSRC_ADDR, PNEXT_NPC, ir[11:9], ID_X,    ID_X    };
            I_ST    : ctl = {ITYPE_MEM, MATYPE_WR,  INDTYPE_X,  UOP_X,   AOP_NPC, DRSRC_X,    PNEXT_NPC, ID_X,     ID_X,    ir[11:9]};
            I_STR   : ctl = {ITYPE_MEM, MATYPE_WR,  INDTYPE_X,  UOP_X,   AOP_SR1, DRSRC_X,    PNEXT_NPC, ID_X,     ir[8:6], ir[11:9]};
            I_STI   : ctl = {ITYPE_MEM, MATYPE_IND, INDTYPE_WR, UOP_X,   AOP_NPC, DRSRC_X,    PNEXT_NPC, ID_X,     ID_X,    ir[11:9]};
            default : ctl = {ITYPE_HLT, MATYPE_X,   INDTYPE_X,  UOP_X,   AOP_X,   DRSRC_X,    PNEXT_X,   ID_X,     ID_X,    ID_X    };
          endcase 
      
      endmodule

      :

  3. UpdatePC
    • UpdatePC.vh
      parameter PNEXT_NPC  = 1'b0,  // UpdatePC branch signal
                PNEXT_TPC  = 1'b1,
                PNEXT_X    = 1'bx;
      
      parameter PEN_0      = 1'b0,  // UpdatePC enable
                PEN_1      = 1'b1;
    • UpdatePC.v
      module UpdatePC( input clock,
                       input reset,
                       input en,                 // enable signal
                       input [15:0] tPC,         // target program counter
                       input pNext,              // if 1 then branch to tPC
                       output reg [15:0] pc,     // program counter
                       output reg [15:0] nPC );  // next program counter (pc+1)
      
        `include "UpdatePC.vh"
      
        wire [15:0] a = (pNext) ? tPC : nPC;     // if pNext==1, then jump to tPC
        wire [15:0] b = (en == PEN_1) ? a : pc;  // change PC only in "Update PC" state
        wire [15:0] c = b + 1'b1;                // use carry input
      
        always @(posedge clock, posedge reset)
          if (reset)
      	   begin
      		  pc  <= 16'h3000;
      		  nPC <= 16'h3001;
      		end
          else
            begin
              pc  <= b;
              nPC <= c;
            end
      
      endmodule

      :

  4. Fetch
    • Fetch.vh
      parameter FEN_0 = 1'b0,  // fetch enable
                FEN_1 = 1'b1;
    • Fetch.v
      module Fetch( input en,                 // output enable
                    input [15:0] pc,          // program counter
                    output reg iBR,           // internal bus request
                    output reg [15:0] iADDR,  // internal memory address lines
                    output reg iWEA );        // internal memory write enable
      
        `include "Fetch.vh"
      
        always @(en, pc)
          begin
            iBR   = ( en == FEN_1 ) ? 1'b1 : 1'b0;      // blocking assignments, as this is combinational logic
            iADDR = ( en == FEN_1 ) ? pc : 16'hxxxx;
            iWEA  = ( en == FEN_1 ) ? 1'b0 : 1'bx;
          end
      
      endmodule

      :

  5. Registers
    • Registers.vh
      parameter       RWE_0 = 1'b0,           // register write enable
                      RWE_1 = 1'b1;
      
      parameter [2:0] PSR_POSITIVE = 3'b001,  // processor status register bits
                      PSR_ZERO     = 3'b010,  //   should match BR instruction
      					 PSR_NEGATIVE = 3'b100;
    • Registers.v
      module Registers( input clock,
                        input reset,
      				 	   input we,                // write enable
      				 	   input [2:0] sr1ID,       // source register 1 ID
      					   input [2:0] sr2ID,       // source register 2 ID
      					   input [2:0] drID,        // destination register ID
      					   input [15:0] dr,         // destination register value
      					   output reg [15:0] sr1,   // source register 1 value
      					   output reg [15:0] sr2,   // source register 2 value
      						output reg [2:0] psr );  // processor status register
      
        `include "Registers.vh"
      
        reg [3:0] id;
        reg [15:0] gpr [0:7];     // general purpose registers
      
            // write the destination register value, and update Processor Status Register (psr)

        always @(posedge clock, posedge reset)
          if (reset)
            for (id = 0; id < 8; id = id + 1)  // initialize all eight registers to 0
              gpr[ id ] <= 16'h0000;
          else
            if (we == RWE_1)          // when enabled by the FSM
        	     begin
                if (dr[ 15 ])         // update processor status register (neg,zero,pos)
                  psr <= PSR_NEGATIVE;
                else if (|dr)
                  psr <= PSR_POSITIVE;
                else
                  psr <= PSR_ZERO;
      
      			 gpr[ drID ] <= dr;     // write the value dr to the register identified by drID
              end
      
            // output the value of the register identified by "sr1ID" on output "sr1"
            // output the value of the register identified by "sr2ID" on output "sr2"
      
        always @(sr1ID, sr2ID, gpr[ sr1ID ], gpr[ sr2ID ])
          begin
      	   sr1 = gpr[ sr1ID ];
      		sr2 = gpr[ sr2ID ];
          end
      
      endmodule

      :

  6. ALU
    • ALU.vh
      parameter [2:0] UOP_ADDREG = 3'b000,  // ALU operation
                      UOP_ADDIMM = 3'b001,
                      UOP_ANDREG = 3'b010,
                      UOP_ANDIMM = 3'b011,
                      UOP_NOT    = 3'b100,
                      UOP_X      = 3'bxxx;
    • ALU.v
      module ALU( input [2:0] uOp,          // operation selector
                  input [15:0] sr1,         // source register 1 value (SR1)
                  input [15:0] sr2,         // source register 2 value (SR2)
                  input [4:0] imm,          // lower 5 bits from instruction register
      	         output reg [15:0] uOut ); // result of ALU operation
      
        `include "ALU.vh"
      
        wire [15:0] imm5 = ({ {11{imm[4]}}, imm[4:0] });  // sign extend to 16 bits
      
        always @(uOp or sr1 or sr2 or imm5)
          casex (uOp)
            3'b000: uOut = sr1 + sr2;   // ADD Mode 0
            3'b001: uOut = sr1 + imm5;  // ADD Mode 1
            3'b010: uOut = sr1 & sr2;   // AND Mode 0
            3'b011: uOut = sr1 & imm5;  // AND Mode 1
            3'b1xx: uOut = ~(sr1);      // NOT
          endcase
      endmodule

      :

  7. Address
    • Address.vh
      parameter AOP_SR1 = 1'b0,  // address operation
                AOP_NPC = 1'b1,
                AOP_X   = 1'bx;
    • Address.v
      module Address( input aOp,                // operation selector
                      input [15:0] sr1,         // value source register 1
                      input [15:0] nPC,         // next program counter (PC), always PC+1
                      input [8:0] offset,       // lower 9 bits from instruction register
                      output reg [15:0] aOut ); // target program counter
      
        `include "Address.vh"
      
        wire [15:0] offset6 = ({{10{offset[5]}}, offset[5:0]});  // sign extend the 6-bit offset
        wire [15:0] offset9 = ({{ 7{offset[8]}}, offset[8:0]});  // sign extend the 9-bit offset
      
        always @(aOp or sr1 or nPC or offset6 or offset9)
          case (aOp)
            AOP_SR1 : aOut = sr1 + offset6;  // register + offset
            AOP_NPC : aOut = nPC + offset9;  // next PC  + offset
          endcase
      
      endmodule

      :

  8. MemoryIF
    • MemoryIF.vh
      parameter [2:0] MOP_NONE = 3'b000,  // MemoryIF operation
                      MOP_RD   = 3'b100,
                      MOP_RDI  = 3'b101,
                      MOP_WR   = 3'b110,
                      MOP_WRI  = 3'b111;
    • MemoryIF.v
      module MemoryIF( input [2:0] mOp,         // memory operation selector
                       input [15:0] sr2,        // source register 2 value
      					  input [15:0] addr,       // address for read or write
      					  input [15:0] eDIN,       // external memory data input
      					  output reg iBR,          // internal bus request
      					  output reg [15:0] iADDR, // internal memory address lines
                       output tri [15:0] eDOUT, // external memory data output
      					  output reg iWEA );       // internal memory write enable
      
        `include "MemoryIF.vh"
      
        reg [15:0] eDOUTr;
        assign eDOUT = eDOUTr;
      
        always @(mOp, sr2, addr, eDIN)
          case (mOp)
            MOP_RD  : begin iBR=1; iWEA = 1'b0; iADDR = addr;     eDOUTr = 16'hzzzz; end
      		MOP_RDI : begin iBR=1; iWEA = 1'b0; iADDR = eDIN;     eDOUTr = 16'hzzzz; end
            MOP_WR  : begin iBR=1; iWEA = 1'b1; iADDR = addr;     eDOUTr = sr2;      end
            MOP_WRI : begin iBR=1; iWEA = 1'b1; iADDR = eDIN;     eDOUTr = sr2;      end
            default : begin iBR=0; iWEA = 1'bx; iADDR = 16'hxxxx; eDOUTr = 16'hzzzz; end
          endcase
      
      endmodule

      :

  9. DrMux
    • DrMux.vh
      parameter [1:0] DRSRC_ALU  = 2'b00,  // destination register source selector
                      DRSRC_MEM  = 2'b01,
                      DRSRC_ADDR = 2'b10,
                      DRSRC_X    = 2'bxx;
    • DrMux.v
      module DrMux( input [1:0] drSrc,       // multiplexor selector
                    input [15:0] eDIN,       // external memory data input
      			     input [15:0] addr,       // effective memory address
      	           input [15:0] uOut,       // result from ALU
      		 	     output reg [15:0] dr );  // data that will be stored in DR
      
        `include "DrMux.vh"
      
        always @(drSrc or uOut or eDIN or addr)
          case (drSrc)
            DRSRC_ALU  : dr = uOut;
            DRSRC_MEM  : dr = eDIN;
            DRSRC_ADDR : dr = addr;
      		default    : dr = 16'hxxxx;
          endcase
      
      endmodule

      :

  10. BusDriver
    • BusDriver.v
      module BusDriver( input br0,                // input 0, bus request
                        input [15:0] iADDR0,      // input 0, internal memory address lines
      						input iWEA0,              // input 0, internal memory write enable
                        input br1,                // input 1, bus request
                        input [15:0] iADDR1,      // input 1, internal memory address lines
      						input iWEA1,              // input 1, internal memory write enable
                        output tri [15:0] eADDR,  // external memory address lines
      					   output tri  eWEA );       // external memory write enable
      
        assign eWEA  = br1 ? iWEA1 :
                       br0 ? iWEA0 : 1'bz;
      
        assign eADDR = br1 ? iADDR1 :
                       br0 ? iADDR0 : 16'hzzzz;
      
      endmodule

      :

Functional simulation

The functionality of our microprocessor can be tested by building a test bench.  The bench supplies the clock signal and reset pulse, and simulates a random access memory (RAM) containing the test program.  The program is written in assembly language and assembled using LC3Edit.

  1. Test program
    • Exercises a variety of instructions:
      • memory read
      • ALU operations
      • memory write
      • control instructions
    • Written in assembly language
      (figure omitted; own work)
    • LC3Edit assembles this into the object file:
      (figure omitted; own work)
    • As part of the assembly, LC3Edit also creates a .hex file.  The contents of this file can be tweaked into a .coe file to be preloaded in the test bench memory.
      (figure omitted; own work)
      :
  2. Memory
    • The random access memory (RAM) is created using Xilinx IP’s Block Memory Generator 6.2.  The following parameters are used:
      • native i/f, single port ram, no byte write enable, minimum area algorithm, width 16, depth 4096, write first, use ENA pin, no output registers, no RSTA pin, enable all warnings.  Initialize memory from the .coe file.
        :
  3. Test bench and clock/reset signals
    • Generate a 50 MHz symmetric clock
    • Integrate the parts into a test bench using Verilog.
      `timescale 1ns / 1ps
      module SimpleLC3_SimpleLC3_sch_tb();
      
        reg clock;          // clock                        (generated by test fixture)
        reg reset;          // reset                        (generated by test fixture)
      
        wire [15:0] eADDR;  // external address             (from LC3 to memory)
        wire [15:0] eDIN;   // external data                (from memory to LC3)
        wire [15:0] eDOUT;  // external data                (from LC3 to memory)
        wire eWEA;          // external write(~read) enable (from LC3 to memory)
      
            // Instantiate the Unit Under Test
        SimpleLC3 UUT ( .eDOUT(eDOUT),
      	               .eWEA(eWEA),
      	               .clock(clock),
       	               .eREADY(1'b1),  // ready (always 1, for now)
      	               .eDIN(eDIN),
      	               .reset(reset),
      	               .eADDR(eADDR) );
      
        wire eENA = (eADDR[15:12] == 4'h3);  // memory is at h3xxx; declared before its first use

            // Instantiate the Memory, created using Xilinx IP's Block Memory Generator 6.2:
            // Initialize from memory.coe, created by assembling memory.asm using LC3Edit.
        memory RAM( .clka(clock),
                    .ena(eENA),
                    .wea(eWEA),
                    .addra(eADDR[11:0]),
                    .dina(eDOUT),
                    .douta(eDIN) );
      
        initial begin
          clock = 0;
          reset = 0;
          #15 reset = 1;  // assert reset at 15 ns
          #22 reset = 0;  // release reset at 37 ns
        end
      
        always
          #10 clock <= ~clock;  // 20 ns clock period (50 MHz)
      
      endmodule
  4. Simulation results
    • For the functional simulation we use ISim that comes bundled with the Xilinx IDE.
    • The simulation needs to run for 1600 ns.
    • Waveform diagrams (own work):
      • Memory load
      • ALU operations
      • Memory store
      • Control instructions

Timing simulation

The free Xilinx IDE doesn’t support timing simulations.  Instead we will use Icarus Verilog for the synthesis and simulation, GTKWave for viewing the generated waveforms, and Emacs verilog-mode for editing.  We will run them natively under Linux.  For those interested, Windows binaries are available from bleyer.org.
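
The RAM in the functional test bench comes from the Xilinx Block Memory Generator.  One way to avoid depending on the Xilinx simulation libraries under Icarus Verilog is a small behavioral stand-in.  The sketch below is an assumption, not the generated core: it mirrors the port names of the bench above (16 bits wide, 4096 deep, write first, with an enable pin) and has no timing information.  The INIT_FILE parameter and the memory_tb module are hypothetical names introduced for this example.  Whether the .hex file from LC3Edit can be passed to $readmemh unchanged has not been verified here.

```verilog
`timescale 1ns / 1ps

// Behavioral stand-in for the block RAM (assumption: not the Xilinx core).
module memory #( parameter INIT_FILE = "" )   // optional hex image (hypothetical parameter)
             ( input             clka,        // clock
               input             ena,         // enable
               input             wea,         // write enable
               input      [11:0] addra,       // word address
               input      [15:0] dina,        // data in
               output reg [15:0] douta );     // data out, registered

  reg [15:0] mem [0:4095];

  initial
    if (INIT_FILE != "")
      $readmemh(INIT_FILE, mem);              // preload when a file name is given

  always @(posedge clka)
    if (ena)
      begin
        if (wea)
          begin
            mem[addra] <= dina;
            douta      <= dina;               // "write first": new data appears on the output
          end
        else
          douta <= mem[addra];                // synchronous read, one clock of latency
      end

endmodule

// Minimal self-checking bench: write one word, read it back, dump a waveform.
module memory_tb;

  reg         clock = 0;
  reg         ena   = 0;
  reg         wea   = 0;
  reg  [11:0] addr  = 0;
  reg  [15:0] din   = 0;
  wire [15:0] dout;

  memory RAM( .clka(clock), .ena(ena), .wea(wea),
              .addra(addr), .dina(din), .douta(dout) );

  always #10 clock = ~clock;                  // 20 ns clock period (50 MHz)

  initial begin
    $dumpfile("memory_tb.vcd");               // waveform file for GTKWave
    $dumpvars(0, memory_tb);

    @(negedge clock); ena = 1; wea = 1; addr = 12'h000; din = 16'h1234;  // write
    @(negedge clock);          wea = 0;                                  // read back
    @(negedge clock);
    if (dout === 16'h1234) $display("PASS: read back %h", dout);
    else                   $display("FAIL: read back %h", dout);
    $finish;
  end

endmodule
```

Saved as memory_tb.v, this should compile with "iverilog -o memory_tb memory_tb.v" and run with "vvp memory_tb".  The resulting memory_tb.vcd can be opened with "gtkwave memory_tb.vcd".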

(Work in progress)

To do: explore what ChipScope Pro can offer.

Simulating memory: http://www.ece.ncsu.edu/muse/courses/ece406spr09/labs/proj1/proj1test.v

The real thing

Xilinx provides their Memory Interface Generator (MIG) tool for creating DRAM interface logic. On their Spartan-6 FPGAs, this instantiates a hard-IP block within the FPGA. A small wrapper/gasket is needed to convert the native MIG interface to whatever bus fabric the rest of the design is using (if using an AXI bus and a Spartan-6 or Virtex-6 device, the MIG can create an AXI wrapper for you).
800 Mbps x16 DDR memory controllers (2-4 total controllers, depending on device and package):

http://www.xilinx.com/support/documentation/user_guides/ug388.pdf

FYI http://opencores.org/project,zpu has GNU toolchain support.

http://dorkbotpdx.org

Resources

Interesting: Complex Digital Systems MIT lecture slides

Coert Vonk
