Implementation

Shows an implementation of the LC-3 instruction set in Verilog HDL. Includes test benches and simulation results.

One dark Oregon winter afternoon, I said “Let’s build a microprocessor”. What started as a noble thought became a rather intense but fun project.

This section describes the implementation of the LC-3 using a Field Programmable Gate Array (FPGA). An FPGA is an array of logic blocks, each with basic functionality such as a lookup table, a full adder and a flip-flop. For more information on FPGAs, refer to the section Programmable Logic in the inquiry “How do computers do math?”.

The FPGA used to implement the LC-3 microprocessor is a Xilinx Spartan-6, but others will fit equally well. My choice was inspired by the pricing of the development board and the fairly good free development tools. Other choices would be Altera for the FPGA, their IDE or Icarus Verilog for the synthesizer and simulator, and GTKWave for the waveform viewer. Refer to the end of this article for links and references to introductory Verilog books.

Schematic

The top-level schematic is shown below. The modules are defined using Verilog, a hardware description language (HDL) used to model digital logic.

LC3 schematic

This is my first Verilog implementation, so please bear with me.

State

State.v

Implementation of the LC-3 instruction set in Verilog, source file State.v:

module State( input clock,
                input reset,
                input [4:0] cCtrl,      // controller control signal
                input eREADY,           // external memory ready signal
                output wire pEn,        // update PC enable
                output wire fEn,        // fetch output enable
                output wire dEn,        // decode enable
                output wire [2:0] mOp,  // memory operation selector
                output wire rWe );      // register write enable

    `include "UpdatePC.vh"
    `include "Fetch.vh"
    `include "Decode.vh"
    `include "Registers.vh"
    `include "MemoryIF.vh"

    parameter [3:0] STATE_UPDATEPC = 4'd0,   // update program counter
                    STATE_FETCH    = 4'd1,   // fetch instruction
                    STATE_DECODE   = 4'd2,   // decode
                    STATE_ALU      = 4'd3,   // ALU
                    STATE_ADDRNPC  = 4'd4,   // calc tPC address
                    STATE_ADDRMEM  = 4'd5,   // calc memory address
                    STATE_INDMEM   = 4'd6,   // indirect memory address
                    STATE_RDMEM    = 4'd7,   // read memory
                    STATE_WRMEM    = 4'd8,   // write memory
                    STATE_WRREG    = 4'd9,   // write register
                    STATE_ILLEGAL  = 4'd15;  // illegal state

    parameter       EREADY_INA     = 1'b0,   // external memory not ready
                    EREADY_ACT     = 1'b1,   // external memory ready
                    EREADY_X       = 1'bx;

    wire [1:0] iType   = cCtrl[4:3];  // instruction type (00=alu, 01=ctrl, 10=mem)
    wire [1:0] maType  = cCtrl[2:1];  // memory access type (00=indaddr, 01=read, 10=write, 11=updreg)
    wire       indType = cCtrl[0];    // indirect memory access type

    reg [3:0] state;   // current state
    reg [3:0] nState;  // next state
    reg [6:0] out;     // current output signals
    reg [6:0] nOut;    // next output signals

    assign pEn = out[6];
    assign fEn = out[5];
    assign dEn = out[4];
    assign mOp = out[3:1];
    assign rWe = out[0];

        // the combinational logic

    always @(state, eREADY, iType, maType, indType, out)
        casex ({state, eREADY, iType, maType, indType})
        {STATE_UPDATEPC, EREADY_X,   ITYPE_X,   MATYPE_X,   INDTYPE_X}  : begin nState = STATE_FETCH;    nOut = {PEN_0, FEN_1, DEN_0, MOP_NONE, RWE_0}; end
        {STATE_FETCH,    EREADY_ACT, ITYPE_X,   MATYPE_X,   INDTYPE_X}  : begin nState = STATE_DECODE;   nOut = {PEN_0, FEN_0, DEN_1, MOP_NONE, RWE_0}; end
        {STATE_DECODE,   EREADY_X,   ITYPE_ALU, MATYPE_X,   INDTYPE_X}  : begin nState = STATE_ALU;      nOut = {PEN_0, FEN_0, DEN_0, MOP_NONE, RWE_0}; end
        {STATE_DECODE,   EREADY_X,   ITYPE_CTL, MATYPE_X,   INDTYPE_X}  : begin nState = STATE_ADDRNPC;  nOut = {PEN_0, FEN_0, DEN_0, MOP_NONE, RWE_0}; end
        {STATE_DECODE,   EREADY_X,   ITYPE_MEM, MATYPE_X,   INDTYPE_X}  : begin nState = STATE_ADDRMEM;  nOut = {PEN_0, FEN_0, DEN_0, MOP_NONE, RWE_0}; end
        {STATE_ADDRMEM,  EREADY_X,   ITYPE_X,   MATYPE_IND, INDTYPE_X}  : begin nState = STATE_INDMEM;   nOut = {PEN_0, FEN_0, DEN_0, MOP_RD,   RWE_0}; end
        {STATE_ADDRMEM,  EREADY_X,   ITYPE_X,   MATYPE_RD,  INDTYPE_X}  : begin nState = STATE_RDMEM;    nOut = {PEN_0, FEN_0, DEN_0, MOP_RD,   RWE_0}; end
        {STATE_INDMEM,   EREADY_ACT, ITYPE_X,   MATYPE_X,   INDTYPE_RD} : begin nState = STATE_RDMEM;    nOut = {PEN_0, FEN_0, DEN_0, MOP_RDI,  RWE_0}; end
        {STATE_ADDRMEM,  EREADY_X,   ITYPE_X,   MATYPE_WR,  INDTYPE_X}  : begin nState = STATE_WRMEM;    nOut = {PEN_0, FEN_0, DEN_0, MOP_WR,   RWE_0}; end
        {STATE_INDMEM,   EREADY_ACT, ITYPE_X,   MATYPE_X,   INDTYPE_WR} : begin nState = STATE_WRMEM;    nOut = {PEN_0, FEN_0, DEN_0, MOP_WRI,  RWE_0}; end
        {STATE_ALU,      EREADY_X,   ITYPE_X,   MATYPE_X,   INDTYPE_X}  : begin nState = STATE_WRREG;    nOut = {PEN_0, FEN_0, DEN_0, MOP_NONE, RWE_1}; end
        {STATE_ADDRMEM,  EREADY_X,   ITYPE_X,   MATYPE_REG, INDTYPE_X}  : begin nState = STATE_WRREG;    nOut = {PEN_0, FEN_0, DEN_0, MOP_NONE, RWE_1}; end
        {STATE_RDMEM,    EREADY_ACT, ITYPE_X,   MATYPE_X,   INDTYPE_X}  : begin nState = STATE_WRREG;    nOut = {PEN_0, FEN_0, DEN_0, MOP_NONE, RWE_1}; end
        {STATE_WRMEM,    EREADY_ACT, ITYPE_X,   MATYPE_X,   INDTYPE_X}  : begin nState = STATE_UPDATEPC; nOut = {PEN_1, FEN_0, DEN_0, MOP_NONE, RWE_0}; end
        {STATE_WRREG,    EREADY_X,   ITYPE_X,   MATYPE_X,   INDTYPE_X}  : begin nState = STATE_UPDATEPC; nOut = {PEN_1, FEN_0, DEN_0, MOP_NONE, RWE_0}; end
        {STATE_ADDRNPC,  EREADY_X,   ITYPE_X,   MATYPE_X,   INDTYPE_X}  : begin nState = STATE_UPDATEPC; nOut = {PEN_1, FEN_0, DEN_0, MOP_NONE, RWE_0}; end
        {STATE_FETCH,    EREADY_INA, ITYPE_X,   MATYPE_X,   INDTYPE_X}  : begin nState = state;          nOut = out;                                    end
        {STATE_INDMEM,   EREADY_INA, ITYPE_X,   MATYPE_X,   INDTYPE_X}  : begin nState = state;          nOut = out;                                    end
        {STATE_RDMEM,    EREADY_INA, ITYPE_X,   MATYPE_X,   INDTYPE_X}  : begin nState = state;          nOut = out;                                    end
        {STATE_WRMEM,    EREADY_INA, ITYPE_X,   MATYPE_X,   INDTYPE_X}  : begin nState = state;          nOut = out;                                    end
        default                                                         : begin nState = STATE_ILLEGAL;  nOut = {PEN_0, FEN_0, DEN_0, MOP_NONE, RWE_0}; end
        endcase

        // the sequential logic

    always @(negedge clock, posedge reset)
    if (reset)
        begin
            state <= STATE_UPDATEPC;
            out   <= {PEN_0, FEN_0, DEN_0, MOP_NONE, RWE_0};
        end
    else
        begin
            state <= nState;
            out   <= nOut;
        end

endmodule
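
The transition table in State.v can be cross-checked against a small software model. The Python sketch below is only an illustration and not part of the original project: the names next_state and trace are made up here, and states and instruction classes are written as strings instead of the encoded parameters.

```python
# Behavioural sketch of the next-state logic in State.v (illustrative only).
# Instruction classes mirror ITYPE_*, MATYPE_* and INDTYPE_* from Decode.vh.

def next_state(state, ready, itype=None, matype=None, indtype=None):
    """Return the state that follows `state` for the given inputs."""
    waits_for_memory = {"FETCH", "INDMEM", "RDMEM", "WRMEM"}
    if state in waits_for_memory and not ready:
        return state  # hold the state until eREADY is asserted
    if state == "UPDATEPC":
        return "FETCH"
    if state == "FETCH":
        return "DECODE"
    if state == "DECODE":
        return {"ALU": "ALU", "CTL": "ADDRNPC", "MEM": "ADDRMEM"}.get(itype, "ILLEGAL")
    if state == "ADDRMEM":
        return {"IND": "INDMEM", "RD": "RDMEM",
                "WR": "WRMEM", "REG": "WRREG"}.get(matype, "ILLEGAL")
    if state == "INDMEM":
        return {"RD": "RDMEM", "WR": "WRMEM"}.get(indtype, "ILLEGAL")
    if state in ("ALU", "RDMEM"):
        return "WRREG"
    if state in ("WRMEM", "WRREG", "ADDRNPC"):
        return "UPDATEPC"
    return "ILLEGAL"  # matches the default branch of the casex


def trace(itype, matype=None, indtype=None, limit=12):
    """List the states visited by one instruction, with memory always ready."""
    states = ["UPDATEPC"]
    while len(states) < limit:
        nxt = next_state(states[-1], True, itype, matype, indtype)
        if nxt == "UPDATEPC":
            break  # the instruction is complete
        states.append(nxt)
        if nxt == "ILLEGAL":
            break  # the state machine never leaves the illegal state
    return states
```

For example, trace("MEM", "RD") walks through the states of an LD instruction, while trace("HLT") ends in the illegal state because the decoder offers no transition for that instruction type.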

Decode

Decode.vh

Implementation of the LC-3 instruction set in Verilog, header file Decode.vh:

parameter       DEN_0      = 1'b0,   // Decode enable
                DEN_1      = 1'b1;

parameter [1:0] ITYPE_ALU  = 2'b00,  // generalized instruction type
                ITYPE_CTL  = 2'b01,
                ITYPE_MEM  = 2'b10,
                ITYPE_HLT  = 2'b11,
                ITYPE_X    = 2'bxx;

parameter [1:0] MATYPE_IND = 2'b00,  // generalized memory access type
                MATYPE_RD  = 2'b01,
                MATYPE_WR  = 2'b10,
                MATYPE_REG = 2'b11,
                MATYPE_X   = 2'bxx;

parameter       INDTYPE_WR = 1'b0,   // generalized memory indirection type
                INDTYPE_RD = 1'b1,
                INDTYPE_X  = 1'bx;

Decode.v

Implementation of the LC-3 instruction set in Verilog, source file Decode.v:

module Decode( input clock,
                input reset,
                input en,                   // input enable
                input [15:0] eDIN,          // external memory data input
                input [2:0] psr,            // processor status register
                output [4:0] cCtrl,         // various control signals
                output [1:0] drSrc,         // selects what to write to DR
                output [2:0] uOp,           // selects ALU operation
                output aOp,                 // selects Address operation
                output pNext,               // selects if PC should branch
                output [2:0] sr1ID,         // source register 1 ID
                output [2:0] sr2ID,         // source register 2 ID
                output [2:0] drID,          // destination register ID
                output wire [4:0] imm,      // lower 5 bits from IR value
                output wire [8:0] offset ); // lower 9 bits from IR value

    `include "ALU.vh"
    `include "Address.vh"
    `include "MemoryIF.vh"
    `include "DrMux.vh"
    `include "UpdatePC.vh"
    `include "Decode.vh"

    parameter [2:0] ID_X = 3'bxxx;

        // Instruction Register (ir)
        // read instruction from external memory bus (after Fetch initiated the bus cycle)

    reg [15:0] ir;
    assign imm    = ir[4:0];  // output the lower 5 bits
    assign offset = ir[8:0];  // output the lower 9 bits

    always @(posedge clock, posedge reset)
    if (reset)
        ir <= 16'hffff;
    else
        if (en == DEN_1)
            ir <= eDIN;

    parameter [3:0]           // opcodes for the instructions
        I_BR  = 4'b0000,
        I_ADD = 4'b0001,
        I_LD  = 4'b0010,
        I_ST  = 4'b0011,
        I_AND = 4'b0101,
        I_LDR = 4'b0110,
        I_STR = 4'b0111,
        I_NOT = 4'b1001,
        I_LDI = 4'b1010,
        I_STI = 4'b1011,
        I_JMP = 4'b1100,
        I_LEA = 4'b1110,
        I_HLT = 4'b1111;

    reg [20:0] ctl;           // current control signal bundle

        // untangle control signal bundle

    assign cCtrl = ctl[ 20:16 ];  // { iType, maType, indType }
    assign uOp   = ctl[ 15:13 ];
    assign aOp   = ctl[    12 ];
    assign drSrc = ctl[ 11:10 ];
    assign pNext = ctl[     9 ];
    assign drID  = ctl[  8: 6 ];
    assign sr1ID = ctl[  5: 3 ];
    assign sr2ID = ctl[  2: 0 ];

        // combinational logic to determine control signals

    wire [2:0] uOpAddC = (ir[5]) ? UOP_ADDIMM : UOP_ADDREG;         // candidate for uOp in case of ADD instruction
    wire [2:0] uOpAndC = (ir[5]) ? UOP_ANDIMM : UOP_ANDREG;         // candidate for uOp in case of AND instruction
    wire pNextC        = |(ir[11:9] & psr) ? PNEXT_TPC : PNEXT_NPC; // candidate for pNext in case of BR instruction

    always @(ir[15:12], uOpAddC, uOpAndC, pNextC)
                    // State      State       State       ALU      Address  DrSource    UpdatePC   Registers Registers Registers
    case (ir[15:12])// iType      maType      indType     uOp      aOp      drSrc       pNext      drID      sr1ID    sr2ID
        I_ADD   : ctl = {ITYPE_ALU, MATYPE_X,   INDTYPE_X,  uOpAddC, AOP_X,   DRSRC_ALU,  PNEXT_NPC, ir[11:9], ir[8:6], ir[2:0] };
        I_AND   : ctl = {ITYPE_ALU, MATYPE_X,   INDTYPE_X,  uOpAndC, AOP_X,   DRSRC_ALU,  PNEXT_NPC, ir[11:9], ir[8:6], ir[2:0] };
        I_NOT   : ctl = {ITYPE_ALU, MATYPE_X,   INDTYPE_X,  UOP_NOT, AOP_X,   DRSRC_ALU,  PNEXT_NPC, ir[11:9], ir[8:6], ID_X    };
        I_BR    : ctl = {ITYPE_CTL, MATYPE_X,   INDTYPE_X,  UOP_X,   AOP_NPC, DRSRC_X,    pNextC,    ID_X,     ID_X,    ID_X    };
        I_JMP   : ctl = {ITYPE_CTL, MATYPE_X,   INDTYPE_X,  UOP_X,   AOP_SR1, DRSRC_X,    PNEXT_TPC, ID_X,     ir[8:6], ID_X    };
        I_LD    : ctl = {ITYPE_MEM, MATYPE_RD,  INDTYPE_X,  UOP_X,   AOP_NPC, DRSRC_MEM,  PNEXT_NPC, ir[11:9], ID_X,    ID_X    };
        I_LDR   : ctl = {ITYPE_MEM, MATYPE_RD,  INDTYPE_X,  UOP_X,   AOP_SR1, DRSRC_MEM,  PNEXT_NPC, ir[11:9], ir[8:6], ID_X    };
        I_LDI   : ctl = {ITYPE_MEM, MATYPE_IND, INDTYPE_RD, UOP_X,   AOP_NPC, DRSRC_MEM,  PNEXT_NPC, ir[11:9], ID_X,    ID_X    };
        I_LEA   : ctl = {ITYPE_MEM, MATYPE_REG, INDTYPE_X,  UOP_X,   AOP_NPC, DRSRC_ADDR, PNEXT_NPC, ir[11:9], ID_X,    ID_X    };
        I_ST    : ctl = {ITYPE_MEM, MATYPE_WR,  INDTYPE_X,  UOP_X,   AOP_NPC, DRSRC_X,    PNEXT_NPC, ID_X,     ID_X,    ir[11:9]};
        I_STR   : ctl = {ITYPE_MEM, MATYPE_WR,  INDTYPE_X,  UOP_X,   AOP_SR1, DRSRC_X,    PNEXT_NPC, ID_X,     ir[8:6], ir[11:9]};
        I_STI   : ctl = {ITYPE_MEM, MATYPE_IND, INDTYPE_WR, UOP_X,   AOP_NPC, DRSRC_X,    PNEXT_NPC, ID_X,     ID_X,    ir[11:9]};
        default : ctl = {ITYPE_HLT, MATYPE_X,   INDTYPE_X,  UOP_X,   AOP_X,   DRSRC_X,    PNEXT_X,   ID_X,     ID_X,    ID_X    };
    endcase 

endmodule
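
To see how an instruction word is split into the fields that Decode.v routes to the data path, the bit slicing can be reproduced in a few lines of Python. This is a sketch for checking encodings by hand, not part of the original project; the names decode and branch_taken and the dictionary keys are made up for this illustration.

```python
# Field extraction as performed by Decode.v (illustrative model only).
OPCODES = {0x0: "BR", 0x1: "ADD", 0x2: "LD", 0x3: "ST", 0x5: "AND", 0x6: "LDR",
           0x7: "STR", 0x9: "NOT", 0xA: "LDI", 0xB: "STI", 0xC: "JMP", 0xE: "LEA"}


def decode(ir):
    """Split a 16-bit instruction word into its raw fields."""
    return {
        "op": OPCODES.get((ir >> 12) & 0xF, "HLT"),  # other opcodes hit the default branch
        "hi": (ir >> 9) & 0x7,       # ir[11:9]: drID, the SR of ST/STI/STR, or the BR mask
        "mid": (ir >> 6) & 0x7,      # ir[8:6]: sr1ID (base register)
        "lo": ir & 0x7,              # ir[2:0]: sr2ID
        "imm_mode": (ir >> 5) & 0x1, # ir[5]: selects the immediate form of ADD/AND
        "imm": ir & 0x1F,            # ir[4:0]
        "offset": ir & 0x1FF,        # ir[8:0]
    }


def branch_taken(ir, psr):
    """BR branches when any selected condition bit is set in psr."""
    return (((ir >> 9) & 0x7) & psr) != 0
```

Decoding the word 1801 from the test program yields an ADD with destination register 4 and source registers 0 and 1.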

UpdatePC

UpdatePC.vh

Implementation of the LC-3 instruction set in Verilog, header file UpdatePC.vh:

parameter PNEXT_NPC  = 1'b0,  // UpdatePC branch signal
            PNEXT_TPC  = 1'b1,
            PNEXT_X    = 1'bx;

parameter PEN_0      = 1'b0,  // UpdatePC enable
            PEN_1      = 1'b1;

UpdatePC.v

Implementation of the LC-3 instruction set in Verilog, source file UpdatePC.v:

module UpdatePC( input clock,
                    input reset,
                    input en,                 // enable signal
                    input [15:0] tPC,         // target program counter
                    input pNext,              // if 1 then branch to tPC
                    output reg [15:0] pc,     // program counter
                    output reg [15:0] nPC );  // next program counter (pc+1)

    `include "UpdatePC.vh"

    wire [15:0] a = (pNext) ? tPC : nPC;     // if pNext==1, then jump to tPC
    wire [15:0] b = (en == PEN_1) ? a : pc;  // change PC only in "Update PC" state
    wire [15:0] c = b + 1'b1;                // use carry input

    always @(posedge clock, posedge reset)
    if (reset)
        begin
            pc  <= 16'h3000;
            nPC <= 16'h3001;
        end
    else
        begin
            pc  <= b;
            nPC <= c;
        end

endmodule
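
The interplay of en, pNext, pc and nPC is easy to get wrong, so a tiny model of one clock edge may help. The Python sketch below mirrors the wires a, b and c of UpdatePC.v; the function names reset_pc and step_pc are made up for this illustration and are not part of the original project.

```python
# One rising clock edge of UpdatePC.v, modelled on 16-bit integers (illustrative only).

def reset_pc():
    """Values of (pc, nPC) while reset is asserted."""
    return 0x3000, 0x3001


def step_pc(pc, npc, en, p_next, tpc):
    """Return the new (pc, nPC) after one clock edge."""
    a = tpc if p_next else npc   # branch target or next sequential address
    b = a if en else pc          # pc only changes in the "Update PC" state
    c = (b + 1) & 0xFFFF         # nPC always trails pc by one, wrapping at 16 bits
    return b & 0xFFFF, c
```

Starting from reset, an enabled step without a branch moves pc from 3000 to 3001, while a disabled step leaves both values untouched.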

Fetch

Fetch.vh

Implementation of the LC-3 instruction set in Verilog, header file Fetch.vh:

parameter FEN_0 = 1'b0,  // fetch enable
          FEN_1 = 1'b1;

Fetch.v

Implementation of the LC-3 instruction set in Verilog, source file Fetch.v:

module Fetch( input en,                 // output enable
              input [15:0] pc,          // program counter
              output reg iBR,           // internal bus request
              output reg [15:0] iADDR,  // internal memory address lines
              output reg iWEA );        // internal memory write enable

    `include "Fetch.vh"

    always @(en, pc)
    begin
        iBR   = ( en == FEN_1 ) ? 1'b1 : 1'b0;
        iADDR = ( en == FEN_1 ) ? pc : 16'hxxxx;
        iWEA  = ( en == FEN_1 ) ? 1'b0 : 1'bx;
    end

endmodule

Registers

Registers.vh

Implementation of the LC-3 instruction set in Verilog, header file Registers.vh:

parameter       RWE_0        = 1'b0,    // register write enable
                RWE_1        = 1'b1;

parameter [2:0] PSR_POSITIVE = 3'b001,  // processor status register bits
                PSR_ZERO     = 3'b010,  // should match BR instruction
                PSR_NEGATIVE = 3'b100;

Registers.v

Implementation of the LC-3 instruction set in Verilog, source file Registers.v:

module Registers( input clock,
                  input reset,
                  input we,                // write enable
                  input [2:0] sr1ID,       // source register 1 ID
                  input [2:0] sr2ID,       // source register 2 ID
                  input [2:0] drID,        // destination register ID
                  input [15:0] dr,         // destination register value
                  output reg [15:0] sr1,   // source register 1 value
                  output reg [15:0] sr2,   // source register 2 value
                  output reg [2:0] psr );  // processor status register

    `include "Registers.vh"

    reg [3:0] id;
    reg [15:0] gpr [0:7];     // general purpose registers

        // write the destination register value, and update Process Status Register (psr)

    always @(posedge clock, posedge reset)
    if (reset)
        for (id = 0; id < 8; id = id + 1)  // initialize all eight registers to 0
            gpr[ id ] <= 16'h0000;
    else
        if (we == RWE_1)          // when enabled by the FSM
        begin
            if (dr[ 15 ])         // update processor status register (neg,zero,pos)
                psr <= PSR_NEGATIVE;
            else if (|dr)
                psr <= PSR_POSITIVE;
            else
                psr <= PSR_ZERO;

            gpr[ drID ] <= dr;    // write the value dr to the register identified by drID
        end

        // output the value of the register identified by "sr1ID" on output "sr1"
        // output the value of the register identified by "sr2ID" on output "sr2"

    always @(sr1ID, sr2ID, gpr[ sr1ID ], gpr[ sr2ID ])
    begin
        sr1 = gpr[ sr1ID ];
        sr2 = gpr[ sr2ID ];
    end

endmodule
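
The processor status register is derived from the value written to the destination register. As a quick cross-check of that rule, here is a minimal Python sketch; the function name psr_for is made up for this illustration, and the constants copy the values from Registers.vh.

```python
# Condition code selection as done in Registers.v (illustrative model only).
PSR_POSITIVE, PSR_ZERO, PSR_NEGATIVE = 0b001, 0b010, 0b100


def psr_for(dr):
    """Return the status bits for a 16-bit two's complement value."""
    dr &= 0xFFFF
    if dr & 0x8000:      # bit 15 set: negative
        return PSR_NEGATIVE
    if dr:               # any other bit set: positive
        return PSR_POSITIVE
    return PSR_ZERO
```

Because the bit order matches the n, z and p bits of the BR instruction, the decoder can test the branch condition with a simple bitwise AND.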

ALU

ALU.vh

parameter [2:0] UOP_ADDREG = 3'b000,  // ALU operation
                UOP_ADDIMM = 3'b001,
                UOP_ANDREG = 3'b010,
                UOP_ANDIMM = 3'b011,
                UOP_NOT    = 3'b100,
                UOP_X      = 3'bxxx;

ALU.v

module ALU( input [2:0] uOp,          // operation selector
            input [15:0] sr1,         // source register 1 value (SR1)
            input [15:0] sr2,         // source register 2 value (SR2)
            input [4:0] imm,          // lower 5 bits from instruction register
            output reg [15:0] uOut ); // result of ALU operation

    `include "ALU.vh"

    wire [15:0] imm5 = ({ {11{imm[4]}}, imm[4:0] });  // sign extend to 16 bits

    always @(uOp or sr1 or sr2 or imm5)
    casex (uOp)
        3'b000: uOut = sr1 + sr2;   // ADD Mode 0
        3'b001: uOut = sr1 + imm5;  // ADD Mode 1
        3'b010: uOut = sr1 & sr2;   // AND Mode 0
        3'b011: uOut = sr1 & imm5;  // AND Mode 1
        3'b1xx: uOut = ~(sr1);      // NOT
    endcase
endmodule
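
Expected ALU results for a test bench can be computed with a small reference model. The Python sketch below follows the case statement of ALU.v, including the sign extension of the 5-bit immediate; the names sign_extend and alu are made up for this illustration and are not part of the original project.

```python
# Reference model of ALU.v on 16-bit values (illustrative only).

def sign_extend(value, bits):
    """Sign extend the lowest `bits` bits of value to 16 bits."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return ((value ^ sign) - sign) & 0xFFFF


def alu(uop, sr1, sr2, imm):
    """Compute uOut for the operation selected by the 3-bit uop."""
    imm5 = sign_extend(imm, 5)
    if uop == 0b000:
        result = sr1 + sr2     # ADD, register mode
    elif uop == 0b001:
        result = sr1 + imm5    # ADD, immediate mode
    elif uop == 0b010:
        result = sr1 & sr2     # AND, register mode
    elif uop == 0b011:
        result = sr1 & imm5    # AND, immediate mode
    else:
        result = ~sr1          # NOT (uop = 1xx)
    return result & 0xFFFF     # truncate to 16 bits like the hardware
```

With sr1 = 1234 and sr2 = 4321 (hexadecimal), the register-mode ADD returns 5555, which matches the ALU example in the design description.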

Address

Address.vh

parameter AOP_SR1 = 1'b0,  // address operation
          AOP_NPC = 1'b1,
          AOP_X   = 1'bx;

Address.v

module Address( input aOp,                // operation selector
                input [15:0] sr1,         // value source register 1
                input [15:0] nPC,         // next program counter (PC), always PC+1
                input [8:0] offset,       // lower 9 bits from instruction register
                output reg [15:0] aOut ); // target program counter

    `include "Address.vh"

    wire [15:0] offset6 = ({{10{offset[5]}}, offset[5:0]});  // sign extended the 6-bit offset
    wire [15:0] offset9 = ({{ 7{offset[8]}}, offset[8:0]});  // sign extended the 9-bit offset

    always @(aOp or sr1 or nPC or offset6 or offset9)
    case (aOp)
        AOP_SR1 : aOut = sr1 + offset6;  // register + offset
        AOP_NPC : aOut = nPC + offset9;  // next PC  + offset
    endcase

endmodule
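
The two addressing forms differ only in the base value and the width of the sign-extended offset. A minimal Python sketch of the calculation follows; the function names are made up for this illustration, and the selector values copy AOP_SR1 and AOP_NPC from Address.vh.

```python
# Reference model of Address.v (illustrative only).
AOP_SR1, AOP_NPC = 0, 1


def sign_extend(value, bits):
    """Sign extend the lowest `bits` bits of value to 16 bits."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return ((value ^ sign) - sign) & 0xFFFF


def address(aop, sr1, npc, offset):
    """Return aOut for the 9-bit offset field taken from the instruction."""
    if aop == AOP_NPC:
        return (npc + sign_extend(offset, 9)) & 0xFFFF  # next PC + 9-bit offset
    return (sr1 + sign_extend(offset, 6)) & 0xFFFF      # register + 6-bit offset
```

For the LD instruction 201F fetched from address 3000, nPC is 3001 and the offset is 1F, so the model returns 3020.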

MemoryIF

MemoryIF.vh

parameter [2:0] MOP_NONE = 3'b000,  // MemoryIF operation
                MOP_RD   = 3'b100,
                MOP_RDI  = 3'b101,
                MOP_WR   = 3'b110,
                MOP_WRI  = 3'b111;

MemoryIF.v

module MemoryIF( input [2:0] mOp,         // memory operation selector
                 input [15:0] sr2,        // source register 2 value
                 input [15:0] addr,       // address for read or write
                 input [15:0] eDIN,       // external memory data input
                 output reg iBR,          // internal bus request
                 output reg [15:0] iADDR, // internal memory address lines
                 output tri [15:0] eDOUT, // external memory data output
                 output reg iWEA );       // internal memory write enable

    `include "MemoryIF.vh"

    reg [15:0] eDOUTr;
    assign eDOUT = eDOUTr;

    always @(mOp, sr2, addr, eDIN)
    case (mOp)
        MOP_RD  : begin iBR=1; iWEA = 1'b0; iADDR = addr;     eDOUTr = 16'hzzzz; end
        MOP_RDI : begin iBR=1; iWEA = 1'b0; iADDR = eDIN;     eDOUTr = 16'hzzzz; end
        MOP_WR  : begin iBR=1; iWEA = 1'b1; iADDR = addr;     eDOUTr = sr2;      end
        MOP_WRI : begin iBR=1; iWEA = 1'b1; iADDR = eDIN;     eDOUTr = sr2;      end
        default : begin iBR=0; iWEA = 1'bx; iADDR = 16'hxxxx; eDOUTr = 16'hzzzz; end
    endcase

endmodule
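
The encoding of mOp can be read as three flags: bit 2 requests the bus, bit 1 selects a write, and bit 0 takes the address from eDIN instead of addr. The Python sketch below models the case statement of MemoryIF.v under that reading; the function name memory_if is made up for this illustration, and None stands in for the x and z values of the Verilog code.

```python
# Reference model of MemoryIF.v (illustrative only).
MOP_NONE, MOP_RD, MOP_RDI, MOP_WR, MOP_WRI = 0b000, 0b100, 0b101, 0b110, 0b111


def memory_if(mop, sr2, addr, edin):
    """Return the bus signals driven for the given memory operation."""
    if mop not in (MOP_RD, MOP_RDI, MOP_WR, MOP_WRI):
        # default branch: no bus request, nothing driven
        return {"iBR": 0, "iWEA": None, "iADDR": None, "eDOUT": None}
    write = mop in (MOP_WR, MOP_WRI)
    indirect = mop in (MOP_RDI, MOP_WRI)
    return {
        "iBR": 1,
        "iWEA": 1 if write else 0,
        "iADDR": edin if indirect else addr,  # indirect forms use the word just read
        "eDOUT": sr2 if write else None,      # data is only driven during writes
    }
```

An indirect write with addr = 3024 and eDIN = 3028 therefore places 3028 on the address lines, which is the behaviour the STI walk-through in the design description relies on.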

DrMux

DrMux.vh

parameter [1:0] DRSRC_ALU  = 2'b00,  // destination register source selector
                DRSRC_MEM  = 2'b01,
                DRSRC_ADDR = 2'b10,
                DRSRC_X    = 2'bxx;

DrMux.v

module DrMux( input [1:0] drSrc,       // multiplexor selector
              input [15:0] eDIN,       // external memory data input
              input [15:0] addr,       // effective memory address
              input [15:0] uOut,       // result from ALU
              output reg [15:0] dr );  // data that will be stored in DR

    `include "DrMux.vh"

    always @(drSrc or uOut or eDIN or addr)
    case (drSrc)
        DRSRC_ALU  : dr = uOut;
        DRSRC_MEM  : dr = eDIN;
        DRSRC_ADDR : dr = addr;
        default    : dr = 16'hxxxx;
    endcase

endmodule

BusDriver

BusDriver.v

module BusDriver( input br0,                // input 0, bus request
                  input [15:0] iADDR0,      // input 0, internal memory address lines
                  input iWEA0,              // input 0, internal memory write enable
                  input br1,                // input 1, bus request
                  input [15:0] iADDR1,      // input 1, internal memory address lines
                  input iWEA1,              // input 1, internal memory write enable
                  output tri [15:0] eADDR,  // external memory address lines
                  output tri eWEA );        // external memory write enable

    assign eWEA  = br1 ? iWEA1 :
                    br0 ? iWEA0 : 1'bz;

    assign eADDR = br1 ? iADDR1 :
                    br0 ? iADDR0 : 16'hzzzz;

endmodule
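
The two nested conditional assignments give input 1 priority over input 0 and leave the bus undriven when neither requests it. A minimal Python sketch of that rule, with the made-up function name bus_driver and None standing in for the high-impedance value:

```python
# Priority selection as done in BusDriver.v (illustrative only).

def bus_driver(br0, addr0, wea0, br1, addr1, wea1):
    """Return (eADDR, eWEA); None models a bus that nobody drives."""
    if br1:
        return addr1, wea1   # input 1 wins when both request the bus
    if br0:
        return addr0, wea0
    return None, None
```

In the top-level design the state machine never enables Fetch and MemoryIF at the same time, so the priority order only matters as a safeguard.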

Functional simulation

The functionality of our microprocessor can be tested by building a test bench. The bench will supply the clock signal and reset pulse and simulate a random access memory (RAM) containing the test program. The program is written in assembly language and assembled using LC3Edit.

Test program

Exercises a variety of instructions:

  • memory read
  • alu
  • memory write
  • control instructions

Written in assembly language

own work
LC3 memory asm

LC3Edit compiles this into the object file:

own work
LC3 memory obj

As part of the compilation, LC3Edit also creates a .hex file. The contents of this file can be tweaked into a .coe file to be preloaded in the test bench memory.

own work
LC3 Memory coe
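
The conversion from .hex to .coe can be scripted. The Python sketch below rests on two assumptions that should be checked against your own files: that the .hex file holds one 4-digit hexadecimal word per line, and that its first word is the load address rather than program data. The function name hex_to_coe and the skip_origin parameter are made up for this illustration.

```python
# Convert the text of a .hex file into the text of a Xilinx .coe file (sketch).

def hex_to_coe(hex_text, skip_origin=True):
    """Return .coe text for the hexadecimal words listed in hex_text."""
    words = [line.strip().lower() for line in hex_text.splitlines() if line.strip()]
    if skip_origin:
        words = words[1:]      # assumption: the first word is the load address
    for word in words:
        int(word, 16)          # raises ValueError for anything that is not hexadecimal
    header = "memory_initialization_radix=16;\nmemory_initialization_vector=\n"
    return header + ",\n".join(words) + ";\n"
```

Since the test bench maps address 3000 to RAM location 0, the first word after the load address ends up at the start of the memory.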

Memory

The random access memory (RAM) is created using Xilinx IP's Block Memory Generator 6.2. The following parameters are used:

  • native i/f, single port ram, no byte write enable, minimum area algorithm, width 16, depth 4096, write first, use ENA pin, no output registers, no RSTA pin, enable all warnings. Initialize memory from the .coe file.

Test bench and clock/reset signals

Generate a 50 MHz symmetric clock.

Integrate the parts into a test bench using Verilog.

`timescale 1ns / 1ps

module SimpleLC3_SimpleLC3_sch_tb();

    reg clock;           // clock (generated by test fixture)
    reg reset;           // reset (generated by test fixture)
    wire [15:0] eADDR;   // external address (from LC3 to memory)
    wire [15:0] eDIN;    // external data (from memory to LC3)
    wire [15:0] eDOUT;   // external data (from LC3 to memory)
    wire eWEA;           // external write(~read) enable (from LC3 to memory)

    wire eENA = |(eADDR[15:12] == 4'h3);  // memory is at h3xxx

        // Instantiate the Unit Under Test

    SimpleLC3 UUT ( .eDOUT(eDOUT),
                    .eWEA(eWEA),
                    .clock(clock),
                    .eREADY(1'b1),  // ready (always 1, for now)
                    .eDIN(eDIN),
                    .reset(reset),
                    .eADDR(eADDR) );

        // Instantiate the Memory, created using Xilinx IP's Block Memory Generator 6.2:
        // Initialize from memory.coe, created from compiling memory.asm using LC3Edit.

    memory RAM( .clka(clock),
                .ena(eENA),
                .wea(eWEA),
                .addra(eADDR[11:0]),
                .dina(eDOUT),
                .douta(eDIN) );

    initial begin
        clock = 0;
        reset = 0;
        #15 reset = 1;  // wait for global reset to finish
        #22 reset = 0;
    end

    always #10 clock <= ~clock;  // 20 ns clock period (50 MHz)

endmodule
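
The test bench maps the 4096-word RAM into the address range 3000 to 3FFF by comparing the upper four address bits and passing the lower twelve bits to the memory. A minimal Python sketch of that decode, with the made-up function name ram_select:

```python
# Address decode used by the test bench (illustrative only).

def ram_select(eaddr):
    """Return (ena, addra) for a 16-bit external address."""
    ena = ((eaddr >> 12) & 0xF) == 0x3   # eADDR[15:12] == 4'h3
    addra = eaddr & 0x0FFF               # eADDR[11:0]
    return ena, addra
```

Addresses outside this window leave the RAM disabled, so the program and its data must fit between 3000 and 3FFF.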

Simulation results

For the functional simulation we use ISim that comes bundled with the Xilinx IDE.

The simulation needs to be run for 1600 ns.

Waveform diagrams are shown below.

LC3 Memory load
LC3 ALU operations
LC3 Memory store
LC3 Control instructions

Timing simulation

The free Xilinx IDE doesn't support timing simulations. Instead we will use Icarus Verilog for the synthesis and simulation, GTKWave for viewing the generated waveforms, and Emacs verilog-mode for editing. We will run them natively under Linux. For those interested, Windows binaries are available from bleyer.org.

This concludes the "Implementation of the LC-3 instruction set in Verilog".

Design

Presents a CPU design for the LC-3 instruction set, which we later implement using Verilog HDL. The illustrations help visualize the design. The instruction set is based on the book Introduction to Computing Systems by Patt and Patel. For this text we push the simplicity of this little microprocessor (LC-3) even further, as described in Instruction Set.

The microprocessor consists of a Data Path and a Control Unit. Together they implement the various instruction phases.

This section describes an architecture for the LC-3. It aims at staying true to the von Neumann architecture and instruction cycle names. However, here we assume the program counter and instruction register are in the data path.

Data Path

The schematic below shows the Data Path.

own work
LC3 data path

We use the following conventions:

  • The shaded blocks are modules that implement various functionality. The module names have been chosen to reflect the instruction phases.
  • Signals connect the blocks. A signal can be a single wire, or a collection of wires such as the 16 bits that represent the value of the program counter. Signal names are chosen to overlap with operand names where possible.
  • The microprocessor connects to an external memory through the external interface.

Modules

Module    | Description
----------|------------
UpdatePC  | Maintains the program counter, pc.
Fetch     | Initiates the bus cycle, to read the instruction pointed to by pc.
Decode    | Reads the instruction from the memory bus and extracts its operands.
Registers | Maintains the register values and processor status register.
ALU       | Performs arithmetic and logical operations.
Address   | Calculates memory address for memory or control instructions.
MemoryIF  | Initiates the external memory bus cycle to read or write data.
DrMux     | Destination register multiplexor, selects the value that will be written to the destination register.
BusDriver | Simple arbiter for memory read requests from Fetch and MemoryIF.

Signals

Group               | Signal         | Description
--------------------|----------------|------------
Program counters    | pc             | Program counter.
                    | nPC            | Next program counter (always has the value pc+1).
                    | tPC            | Target program counter, for JMP / BR*.
Operands            | sr1ID          | Source register 1 identifier. Also used as baseRID for JMP / LDR / STR.
                    | sr2ID          | Source register 2 identifier. Also used as srID for ST / STI / STR.
                    | imm            | Immediate value.
                    | offset         | Memory address offset.
Register values     | sr1            | Value of the register identified by sr1ID.
                    | sr2            | Value of the register identified by sr2ID.
                    | dr             | Value written to the register identified by drID.
                    | psr            | Value of the processor status register.
Intermediate values | uOut           | Result of the ALU operation.
                    | aOut=addr=tPC  | Result of the address calculation.
External bus        | eADDR          | Memory address.
                    | eDIN           | Instruction/data being read from memory.
                    | eDOUT          | Data being written to memory.
                    | eWEA           | Write enable signal going to memory. Value 0 for read, 1 for write.
Internal bus        | iBR0, iBR1     | Internal bus request signals.
                    | iADDR0, iADDR1 | Internal memory addresses.
                    | iWEA0, iWEA1   | Internal write enable signals.

Examples

Read memory

Assume: the instruction at address 3000 is 201F.

Assigning the label LDv to memory location 3020, this instruction decodes to

Address | Value | Label | Mnemonic
--------|-------|-------|---------
x3000   | x201F |       | LD r0, LDv
x3020   | x1234 | LDv   |

Issuing a reset triggers the following sequence of events:

#   | Module    | Action | Signals
----|-----------|--------|--------
1.  | UpdatePC  | Resets the program counter to its initial value | pc=3000, nPC=3001
2.  | Fetch     | Starts a read cycle for the instruction | iBR0=1, iADDR0=3000, iWEA0=0
3.  | BusDriver | Forwards the read cycle to the external memory bus | eADDR=3000, eWEA=0
4.  | ExtMemory | Responds with the instruction | eDIN=201f
5.  | Decode    | Extracts the operands | offset=1f, drID=0
6.  | Address   | Adds the offset to nPC | addr=3020
7.  | MemoryIF  | Starts a read cycle for the data | iBR1=1, iADDR1=3020, iWEA1=0
8.  | BusDriver | Forwards the read cycle to the external memory bus | eADDR=3020, eWEA=0
9.  | ExtMemory | Responds with the data | eDIN=1234
10. | DrMux     | Selects the eDIN input | dr=1234
11. | Registers | Writes the value dr to the register identified by drID |

ALU operation

Assume: pc=3003, the register R0=1234 and R1=4321. The instruction at the next address 3004 is 1801.

This instruction decodes to

Address | Value | Label | Mnemonic
--------|-------|-------|---------
x3004   | x1801 |       | ADD R4, R0, R1

The following sequence of events will happen:

#   | Module    | Action | Signals
----|-----------|--------|--------
1.  | UpdatePC  | Increments the program counter | pc=3004
2.  | Fetch     | Starts a read cycle for the instruction | iBR0=1, iADDR0=3004, iWEA0=0
3.  | BusDriver | Forwards the read cycle to the external memory bus | eADDR=3004, eWEA=0
4.  | ExtMemory | Responds with the instruction | eDIN=1801
5.  | Decode    | Extracts the operands | sr1ID=0, sr2ID=1, drID=4
6.  | Registers | Supplies the values for the registers identified by sr1ID and sr2ID | sr1=1234, sr2=4321
7.  | ALU       | Calculates the sum of sr1 and sr2 | uOut=5555
8.  | DrMux     | Selects the uOut input | dr=5555
9.  | Registers | Writes the value dr to the register identified by drID |

Write memory

Assume: pc=3007, register R4=AAAA and the label STIa refers to data address 3024 containing the value 3028. The instruction at the next address 3008 is B81D.

This instruction decodes to

Address | Value | Label | Mnemonic
--------|-------|-------|---------
x3008   | xB81D |       | STI R4, STIa
x3024   | x3028 | STIa  |
x3028   | xBAD0 |       |

The following sequence of events will happen:

#   | Module    | Action | Signals
----|-----------|--------|--------
1.  | UpdatePC  | Increments the program counter | pc=3008, nPC=3009
2.  | Fetch     | Starts a read cycle for the instruction | iBR0=1, iADDR0=3008, iWEA0=0
3.  | BusDriver | Forwards the read cycle to the external memory bus | eADDR=3008, eWEA=0
4.  | ExtMemory | Responds with the instruction | eDIN=b81d
5.  | Decode    | Extracts the operands (sr2ID represents the SR operand) | sr2ID=4, offset=1d
6.  | Registers | Supplies the value for the register identified by sr2ID | sr2=aaaa
7.  | Address   | Adds the offset to nPC | addr=3024
8.  | MemoryIF  | Starts a read cycle to retrieve the address where to store the value | iBR1=1, iADDR1=3024, iWEA1=0
9.  | BusDriver | Forwards the read cycle to the external memory bus | eADDR=3024, eWEA=0
10. | ExtMemory | Responds with the value | eDIN=3028
11. | MemoryIF  | Starts a write cycle to write the value of register R4 to address 3028 | iBR1=1, iADDR1=3028, iWEA1=1
12. | BusDriver | Forwards the write cycle to the external memory bus | eADDR=3028, eWEA=1

Control unit

Instructions can be broken up into micro instructions. These can be implemented using a finite state machine (FSM), where each state corresponds to one micro instruction.

The finite state machine can be visualized as shown in the figure below.

  • Circles represent the states, each identified by a unique number and name.
  • The double circle represents the initial state.
  • Arrows represent state transitions. Labels represent the condition that must be met for the transition to occur.
  • Shading is used to identify the implementation modules.
  • eREADY indicates that the external memory finished a read or write operation.
  • iType, maType and indType refer to the generalized instruction types generated by Decode.

State diagram

own work
LC3 finite state machine

Details

  • Policies:
    • State transitions occur only on the falling edge of the clock signal (from 1 to 0).
    • Outputs to the external memory interface are driven in response to state transitions.
    • Inputs from the external memory interface are sampled on the rising edge of the clock signal (from 0 to 1).
    • Control signals change only on the falling edge of the clock signal, to minimize glitches.
  • Each state:
    • depends on both the input signals and the previous state;
    • generates control signals for the data path (with the help of the Decode module).
  • The control unit consists of two modules:
    • State implements the state machine and generates state-specific control signals.
    • Decode generalizes the instruction for the state machine and generates state-independent control signals.
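The policies above can be sketched as a small state machine. This is only an illustration, not the State.v from this article: the module name StateSketch, the three states and their encoding are assumptions, and the execute states are left out. It shows the next state being computed by combinational logic, the transition happening on the falling clock edge, and Fetch being held until eREADY is asserted.

```verilog
// Minimal sketch of a control FSM (hypothetical names and encoding).
module StateSketch ( input  wire clock,
                     input  wire reset,    // must be asserted once to initialize
                     input  wire eREADY,   // external memory finished its cycle
                     output wire pEn,      // update PC enable
                     output wire fEn,      // fetch enable
                     output wire dEn );    // decode enable

    localparam [1:0] S_UPDATEPC = 2'd0,
                     S_FETCH    = 2'd1,
                     S_DECODE   = 2'd2;

    reg [1:0] state;   // current state
    reg [1:0] nState;  // next state, from combinational logic

    // next-state logic; Fetch waits for the external memory
    always @* begin
        case (state)
            S_UPDATEPC: nState = S_FETCH;
            S_FETCH:    nState = eREADY ? S_DECODE : S_FETCH;
            S_DECODE:   nState = S_UPDATEPC;  // execute states omitted
            default:    nState = S_UPDATEPC;
        endcase
    end

    // state transitions only on the falling edge of the clock
    always @(negedge clock)
        state <= reset ? S_UPDATEPC : nState;

    // state-specific control signals
    assign pEn = (state == S_UPDATEPC);
    assign fEn = (state == S_FETCH);
    assign dEn = (state == S_DECODE);
endmodule
```

Because the control outputs are derived from the state register alone, they change only after a falling edge, which matches the glitch-avoidance policy in the list.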

Schematic for the control unit

own work
LC3 control unit

The next section describes the signals for the control unit in the CPU Design for LC-3 instruction set.

Signals for the control unit

| Group | Signal | Description |
|---|---|---|
| External interface | eREADY | When 1, indicates that the external memory finished a read or write operation. |
| | clock | Externally supplied clock |
| Internal to the State module | state | Current state |
| | nState | Next state as determined by the combinational logic |
| Generalized instruction types (bundled into cCtrl) | iType | Instruction type |
| | maType | Memory access type |
| | indType | Indirect memory access type |
| Data path control | pNext | Signals UpdatePC to change the program counter to tPC |
| | pEn | Enables UpdatePC to change the program counter. |
| | fEn | Enables Fetch to start an external memory bus cycle to read the instruction. |
| | dEn | Enables Decode to read the instruction from the external memory bus. |
| | rWe | Enables Registers to store the value of dr in the register identified by drID. |
| | uOp | Chooses the operation and inputs of the ALU. |
| | aOp | Chooses the operation and inputs of the Address calculation. |
| | mOp | Chooses the memory operation to be performed by MemoryIF. |
| | drSrc | Selects the destination register source input on DrMux. |

The next section gives a detailed description of the modules for the CPU Design for LC-3 instruction set.

Modules (detailed description)

Module Description
State Generates the state specific control signals for each micro instruction being executed. Refer to the signals described above for details.
UpdatePC Updates the program counter, pc, at the end of each instruction cycle. The new value is:
  • 3000 if reset is asserted, or
  • the value of the tPC input, when a JMP or BR* instruction was executed (and its condition was met), or
  • otherwise, the previous value of pc+1.
Fetch
  • Initiates the external bus cycle (iBR0, iADDR0, iWEA0) to read the instruction from the memory location pointed to by pc.
  • The control unit will maintain this state until the external memory reports that the data is available (eREADY).
Decode
  • Finishes the external bus cycle by reading the instruction from the external memory bus (eDIN).
  • Decodes the instruction:
    • Based on the opcode (ir[15:12]), it generalizes the instruction type for the State module (cCtrl).
    • Based on the operands (ir[11:0]), it configures the data path using state-independent control signals:
      • For ALU instructions, uOp, sr1ID, sr2ID, drSrc, drID
      • For memory instructions, aOp, sr1ID (BaseR for LDR / STR), sr2ID (sr for ST / STI / STR), offset, drID
      • For control instructions, pNext, sr1ID (BaseR for JMP).
Registers
  • Maintains the general purpose registers (R0..R7).
  • Supplies the values for the registers identified by sr1ID, sr2ID.
  • Updates the register specified by drID to the value dr when rWe is asserted.
ALU
  • The input uOp selects both the operation type and inputs.
    • For ADD do if ir[5]==1 then uOut=sr1+imm5, else uOut=sr1+sr2.
    • For AND do if ir[5]==1 then uOut=sr1&imm5, else uOut=sr1&sr2.
    • For NOT do uOut=~sr1.
Address
  • Input aOp selects both the calculation type and inputs.
    • For BR* do aOut=nPC+offset9.
    • For LD / LDI / LEA / ST / STI, do aOut=nPC+offset9.
    • For JMP / LDR / STR, do aOut=sr1+offset6.
  • Note that aOut is connected to the addr input on MemoryIF, and the tPC input on UpdatePC.
MemoryIF
  • Input mOp selects the memory access mode and inputs.
    • For LDI / STI, under the direction of the Control Unit, it first initiates a memory read cycle for addr (aOut). The Control Unit will maintain this state until the external memory reports that the data is available (eREADY). It then takes the value read from memory (eDIN), and
      • for LDI, it initiates a read cycle for address eDIN;
      • for STI, it initiates a write cycle to write the value sr2 to address eDIN.
    • For the other instructions, it takes the address, and
      • for LD / LDR, read the value from addr (aOut);
      • for LEA, do nothing;
      • for ST / STR, write the value sr2 to addr (aOut).
    • The control unit will maintain this state until the external memory reports that the data is available (eREADY).
DrMux*
  • Input drSrc selects the value that will be written to the destination register.
    • For ADD / AND / NOT do forward uOut to dr.
    • For LD / LDR / LDI do forward eDIN to dr.
    • For LEA do forward aOut to dr.

*) DrMux is an abbreviation for Destination Register Multiplexor.
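Two of the simpler modules from the table can be sketched in a few lines. This is a hedged illustration rather than the code used in this article: the module names, the drSrc encoding (DR_ALU, DR_MEM, DR_ADDR) and the choice to update pc on the falling edge are assumptions. The signal names pEn, pNext, tPC, nPC, uOut, eDIN and aOut follow the descriptions above.

```verilog
// Sketch of UpdatePC: 3000 hex on reset, tPC on a jump or taken branch, else pc+1.
module UpdatePCSketch ( input  wire        clock,
                        input  wire        reset,
                        input  wire        pEn,    // enabled by State
                        input  wire        pNext,  // 1: load tPC instead of pc+1
                        input  wire [15:0] tPC,    // target address (aOut)
                        output reg  [15:0] pc,
                        output wire [15:0] nPC );  // incremented pc

    assign nPC = pc + 16'd1;

    always @(negedge clock)              // edge choice is an assumption
        if (reset)    pc <= 16'h3000;
        else if (pEn) pc <= pNext ? tPC : nPC;
endmodule

// Sketch of DrMux: selects what gets written to the destination register.
module DrMuxSketch ( input  wire [1:0]  drSrc,
                     input  wire [15:0] uOut,   // ALU result
                     input  wire [15:0] eDIN,   // value read from memory
                     input  wire [15:0] aOut,   // calculated address (LEA)
                     output wire [15:0] dr );

    localparam [1:0] DR_ALU  = 2'd0,    // hypothetical encoding
                     DR_MEM  = 2'd1,
                     DR_ADDR = 2'd2;

    assign dr = (drSrc == DR_ALU)  ? uOut :
                (drSrc == DR_MEM)  ? eDIN :
                (drSrc == DR_ADDR) ? aOut : 16'h0000;
endmodule
```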

To continue this CPU Design for LC-3 instruction set, read about its implementation on the next page.

Instruction set

Introduces a simplified LC-3 instruction set, that we later will design a CPU for and implement in Verilog HDL.

Instruction Set

The Instruction Set Architecture (ISA) specifies all the information to write a program in machine language. It contains:

  • Memory organization, which specifies the address map and the number of bits per location;
  • Register set, which specifies the size of the internal registers, how many there are, and how they can be used;
  • Instruction set, which specifies the opcodes, operands, data types, and addressing modes.

Simplicity rules

The book Introduction to Computing Systems by Patt and Patel introduces a hypothetical microprocessor called LC-3. For this text we push the simplicity of this little computer (LC-3) even further by:

  • not supporting subroutine calls, JSR JSRR RET
  • not supporting interrupt handling, RTI TRAP
  • not supporting overflow detection in arithmetic operations
  • not validating the Instruction encoding
  • replacing TRAP x25 (the HALT trap) with a simple HALT instruction.

Implementing this very basic Instruction Set helps us understand the inner workings of a microprocessor.

With the exception of these simplifications, the Instruction Set Architecture (ISA) is specified in the book “Introduction to Computing Systems”. The following sections summarize this ISA. For more details, refer to Appendix A.3 of the book.

Overview

  • Memory organization:
    • 16-bit addresses; word addressable only,
    • 16-bit memory words.
  • Memory map
    • User programs start at memory location 3000 hex, and may extend to FDFF.
  • Bit numbering
    • Bits are numbered from right (least significant bit) to left (most significant bit), starting with bit 0.
  • Registers
    • A 16-bit program counter (PC), contains the address of the next instruction.
    • Eight 16-bit general purpose registers, numbered 000 .. 111 binary, for register R0 .. R7.
    • A 3-bit processor status register (PSR), that is updated when an instruction writes to a register.
      • psr[2]==1, when the 2’s complement value is negative (n).
      • psr[1]==1, when the 2’s complement value is zero (z).
      • psr[0]==1, when the 2’s complement value is positive (p).
  • Instructions
    • 16-bit instructions, RISC (all instructions the same size).
      • the opcode is encoded in the 4 most significant bits of the instruction (bit 15..12).
      • the operands, are encoded in the remaining 12 bits of the instruction.
    • ALU performs ADD AND and NOT operations on 16-bit words.
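The three status bits follow directly from the 16-bit value written to a register. As a minimal sketch (the module name PsrSketch and the port name value are made up for this example), the flags can be derived with three continuous assignments:

```verilog
// Sketch: derive the n, z, p status bits from a 16-bit 2's complement value.
module PsrSketch ( input  wire [15:0] value,  // value written to a register
                   output wire [2:0]  psr );  // {n, z, p}

    assign psr[2] = value[15];                          // n: sign bit set
    assign psr[1] = (value == 16'h0000);                // z: all bits zero
    assign psr[0] = ~value[15] & (value != 16'h0000);   // p: neither n nor z
endmodule
```

For any value exactly one of the three bits is set, for example FFFF hex gives psr=100 and 0001 hex gives psr=001.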

Instructions

Operand conventions

As mentioned above, only 12 bits of the 16-bit instruction are available for the operands. This implies that 16-bit data values or memory addresses have to be specified indirectly, for instance by referring to a value in a register.

Addressing modes:

  • PC relative, the address is calculated by adding an offset to the incremented program counter, pc.
  • Register relative, the address is calculated by adding an offset to the value of a base register.
  • Indirect, the address is read from a memory location whose address is calculated by adding an offset to the incremented program counter.
  • Load effective address, the address is calculated by adding an offset to the incremented program counter. The address itself (not the value stored at it) is written to a register.

The table below shows the conventions used in describing the instructions.

| Operand | Description |
|---|---|
| srID, sr1ID, sr2ID | Source Register Identifiers (000..111 for R0..R7) |
| drID | Destination Register Identifier (000..111 for R0..R7) |
| baseRID | Base Register Identifier (000..111 for R0..R7) |
| sr, sr1, sr2 | 16-bit Source Register value |
| dr | 16-bit Destination Register value |
| baseR | Base Register value, used together with 2’s complement offset to calculate memory address. |
| imm5 | 5-bit immediate value as 2’s complement integer |
| mem[address] | Contents of memory at the given address |
| offset6 | 6-bit value as 2’s complement integer |
| offset9 | 9-bit value as 2’s complement integer |
| SX | Sign-extend, by replicating the most significant bit as many times as necessary to extend to the word size of 16 bits. |

Conventions
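Sign extension (SX) maps naturally onto Verilog's replication operator. The parameterized module below is a sketch; the name SignExtendSketch and its WIDTH parameter are assumptions made for this example and do not appear in the article's source files.

```verilog
// Sketch: sign-extend a WIDTH-bit 2's complement value to 16 bits.
module SignExtendSketch #( parameter WIDTH = 5 )          // 5 for imm5
                         ( input  wire [WIDTH-1:0] in,
                           output wire [15:0]      out );

    // replicate the most significant bit (16-WIDTH) times in front of the value
    assign out = { {(16-WIDTH){in[WIDTH-1]}}, in };
endmodule
```

With WIDTH=5, the input 11111 binary (-1) becomes FFFF hex and 00011 binary (3) becomes 0003 hex. Instantiating it with WIDTH=6 or WIDTH=9 covers offset6 and offset9.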

ALU instructions

There are two variations of the ADD and AND instructions. The difference is in bit 5 of the instruction word. One takes the second argument from sr2, the other takes it from the immediate value imm5.

Instruction types

| Opcode | Name | Assembly | Operation |
|---|---|---|---|
| ADD | Addition | ADD DR, SR1, SR2 | dr = sr1 + sr2 |
| | | ADD DR, SR1, imm5 | dr = sr1 + SX(imm5) |
| AND | Logical AND | AND DR, SR1, SR2 | dr = sr1 & sr2 |
| | | AND DR, SR1, imm5 | dr = sr1 & SX(imm5) |
| NOT | Logical NOT | NOT DR, SR | dr = ~sr |

Instruction encoding

| Opcode | 15..12 | 11..9 | 8..6 | 5 | 4..0 |
|---|---|---|---|---|---|
| ADD | 0 0 0 1 | drID | sr1ID | 0 | 0 0 sr2ID |
| | 0 0 0 1 | drID | sr1ID | 1 | imm5 |
| AND | 0 1 0 1 | drID | sr1ID | 0 | 0 0 sr2ID |
| | 0 1 0 1 | drID | sr1ID | 1 | imm5 |
| NOT | 1 0 0 1 | drID | srID | 1 | 1 1 1 1 1 |
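To make the encoding concrete, the sketch below executes the three ALU instructions straight from the instruction word. It is a simplification for illustration: in the design described in this series the ALU is steered by a decoded uOp signal, whereas this hypothetical AluSketch module looks at ir directly.

```verilog
// Sketch: execute ADD / AND / NOT directly from the instruction word.
module AluSketch ( input  wire [15:0] ir,    // instruction word
                   input  wire [15:0] sr1,   // value of the first source register
                   input  wire [15:0] sr2,   // value of the second source register
                   output reg  [15:0] uOut );

    wire [15:0] imm5 = { {11{ir[4]}}, ir[4:0] };  // SX(imm5)
    wire [15:0] b    = ir[5] ? imm5 : sr2;        // bit 5 selects the second operand

    always @* begin
        case (ir[15:12])
            4'b0001: uOut = sr1 + b;     // ADD
            4'b0101: uOut = sr1 & b;     // AND
            4'b1001: uOut = ~sr1;        // NOT
            default: uOut = 16'h0000;    // not an ALU instruction
        endcase
    end
endmodule
```

For example, the instruction word 1261 hex decodes as ADD R1, R1, #1, so with sr1=0005 the output is 0006.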

Memory instructions

Instruction types

| Opcode | Name | Assembly | Operation |
|---|---|---|---|
| LD | Load | LD DR, label | dr = mem[pc + SX(offset9)] |
| LDR | Load Register | LDR DR, BaseR, offset6 | dr = mem[baseR + SX(offset6)] |
| LDI | Load Indirect | LDI DR, label | dr = mem[mem[pc + SX(offset9)]] |
| LEA | Load Eff. Addr. | LEA DR, target | dr = pc + SX(offset9) |
| ST | Store | ST SR, label | mem[pc + SX(offset9)] = sr |
| STR | Store Register | STR SR, BaseR, offset6 | mem[baseR + SX(offset6)] = sr |
| STI | Store Indirect | STI SR, label | mem[mem[pc + SX(offset9)]] = sr |

Instruction encoding

| Opcode | 15..12 | 11..9 | 8..0 |
|---|---|---|---|
| LD | 0 0 1 0 | drID | offset9 |
| LDR | 0 1 1 0 | drID | baseRID (8..6), offset6 (5..0) |
| LDI | 1 0 1 0 | drID | offset9 |
| LEA | 1 1 1 0 | drID | offset9 |
| ST | 0 0 1 1 | srID | offset9 |
| STR | 0 1 1 1 | srID | baseRID (8..6), offset6 (5..0) |
| STI | 1 0 1 1 | srID | offset9 |
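The memory instructions use only two address calculations: the incremented program counter plus offset9, or the base register plus offset6. The sketch below picks between them from the opcode bits. The module name AddressSketch is an assumption, and unlike the Address module described in this series it is not driven by an aOp signal and does not handle JMP or BR*.

```verilog
// Sketch: effective address for the memory instructions.
module AddressSketch ( input  wire [15:0] ir,     // instruction word
                       input  wire [15:0] nPC,    // incremented program counter
                       input  wire [15:0] baseR,  // value of the base register
                       output wire [15:0] aOut );

    wire [15:0] offset9 = { {7{ir[8]}},  ir[8:0] };   // SX(offset9)
    wire [15:0] offset6 = { {10{ir[5]}}, ir[5:0] };   // SX(offset6)

    // LDR (0110) and STR (0111) are the only opcodes starting with 011
    wire baseRel = (ir[15:13] == 3'b011);

    assign aOut = baseRel ? baseR + offset6 : nPC + offset9;
endmodule
```

For example, the word 2002 hex (LD R0 with offset9=2) and nPC=3001 gives aOut=3003.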

Control instructions

Instruction types

| Opcode | Name | Assembly | Operation |
|---|---|---|---|
| BR* | Branch | BR* label | if (condition*) pc = pc + SX(offset9) |
| JMP | Jump | JMP BaseR | pc = baseR |
| HALT | Halt | HALT | stop program execution (simplified TRAP x25) |

*) The assembler instruction for BR* can be any of the following. The branch is taken when at least one of the tested state bits is set.

  • BRn label, test for state bit n
  • BRz label, test for state bit z
  • BRp label, test for state bit p
  • BRzp label, test for state bits z or p
  • BRnp label, test for state bits n or p
  • BRnz label, test for state bits n or z
  • BRnzp label, test for state bits n, z or p (always taken)

Instruction encoding

| Opcode | 15..12 | 11..9 | 8..0 |
|---|---|---|---|
| BR* | 0 0 0 0 | n z p | offset9 |
| JMP | 1 1 0 0 | 0 0 0 | baseRID (8..6), 0 0 0 0 0 0 (5..0) |
| HALT | 1 1 1 1 | 0 0 0 | 0 0 0 1 0 0 1 0 1 |
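Because the n, z and p bits of the BR* instruction (bits 11..9) line up with the n, z and p bits of the status register (psr[2:0]), the branch condition reduces to a bitwise AND followed by a test for non-zero. The sketch below assumes that bit ordering; the module name BranchSketch and the output name taken are made up for this example.

```verilog
// Sketch: decide whether a BR* instruction changes the program counter.
module BranchSketch ( input  wire [15:0] ir,     // instruction word
                      input  wire [2:0]  psr,    // {n, z, p} status bits
                      output wire        taken );

    wire isBR = (ir[15:12] == 4'b0000);

    // taken when any tested condition bit is also set in psr
    assign taken = isBR && ((ir[11:9] & psr) != 3'b000);
endmodule
```

For example, BRzp (bits 11..9 = 011) is taken for psr=001 (positive) and psr=010 (zero), but not for psr=100 (negative).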

This article “Simplified LC-3 Instruction set” continues with Design on the next page.

Architecture

Explains how a CPU works by implementing the LC-3 instruction set. Includes an in-depth look at the instruction cycle phases. The inquiry “How do computers do math?” introduced the components needed to build a microprocessor. This article series continues by introducing the microprocessor.

We use a top-down approach to break up the complex microprocessor into simple, more manageable parts. Starting from the architecture, we dive down to an instruction set. In the second half we build a microprocessor using a field programmable gate array.

We assume a familiarity with assembly code.

Credits

This text leans on chapter 4 from the excellent book “Introduction to Computing Systems” by Patt and Patel. The implementation borrows from Davis’ Project#1 description at NC State University.

Architecture

World War II brought widespread destruction, but also spurred a flurry of computer innovations. The first electric computer was built using electro-mechanical relays in Nazi Germany (1941). Two years later the US Army began building ENIAC using 18,000 vacuum tubes for calculating artillery firing tables and simulating the H bomb.

Glenn A. Beck and Betty Snyder program ENIAC
Source: US Army photo, PD

This computer could perform complex sequences of operations, including loops, branches and subroutines. The program in this computer was hardwired using switches and dials. It was a thousand times faster than the electro-mechanical machines, but it took great effort to change the program.

In these early machines the program was hard-wired into the design, so “reprogramming” a computer meant physically rewiring and reconfiguring it. To explain how a CPU works, we take a look at the leading architecture.

von Neumann Architecture

Source: Encyclopædia Britannica
In 1945, the mathematician John von Neumann formalized processor methods developed at the University of Pennsylvania. His computer architecture design consists of a Control Unit, Arithmetic and Logic Unit (ALU), Memory Unit, Registers and Inputs/Outputs. These methods became known as the von Neumann architecture and still form the foundation for today’s computers.

With the von Neumann architecture, a computer could be reprogrammed by loading new instructions into its memory. This way, the functionality could be changed simply by rewriting the program in a programming language.

The von Neumann architecture is based on the principle of:

  1. Fetch an instruction from memory
  2. Decode the instruction
  3. Execute the Instruction
This process is repeated indefinitely, and is known as the fetch-decode-execute cycle.

The Central Processing Unit (CPU) is the electronic circuit responsible for executing the instructions of a computer program. It is also referred to as the microprocessor. The CPU contains the Control Unit, ALU and Registers. The Control Unit interprets program instructions and orchestrates the execution of these instructions. Registers store data before it can be processed. The ALU carries out arithmetic and logical operations.

von Neumann architecture

Instructions

An instruction is a fundamental unit of work that is executed completely, or not at all. In memory, instructions look just like data: a collection of bits. They are just interpreted differently.

An instruction includes:

  • an opcode, the operation to be performed;
  • operands, the data (locations) to be used in the operation.

There are three instruction types:

  • arithmetic and logical instructions, such as addition and subtraction, or logical operations such as AND, OR and NOT;
  • memory access instructions, such as load and store;
  • control instructions, that may change the address in the program counter, permitting loops or conditional branches.

This article “How a CPU works” continues with Instruction Set on the next page.

Copyright © 1996-2022 Coert Vonk, All Rights Reserved