version : 2009-01-18


Overview of the instruction set



Warning : update in progress


The instruction format

YASEP is a RISC architecture, so all the instructions have the same fixed structure, while at the same time they try to be generic and orthogonal. This keeps their number low and makes them easy to implement.

YASEP has just one memory addressing mode. Contrary to most RISC architectures (which are load-store architectures), YASEP accesses memory through its registers : it's a register-mapped memory architecture, which further reduces the total number of instructions.

In the end, most instructions deal only with registers and immediate data. So all the instructions are formed by a simple 8-bit opcode, followed by two 4-bit register fields ("src1/dest" and "src2"), and an optional 16-bit immediate field (called "Imm16").

The YASEP's instruction structure : add/sub with skip on carry

Most instructions are available in both 16-bit and 32-bit "forms" (without or with Imm16). The two forms differ only by the value of a single bit that tells the decoder how to behave.

  • With the 16-bit form, the core is a "2-address machine", where one address points to both one source and the destination (think x86...).
  • When the "immediate flag" is set, the instruction is 32-bit wide and a 3rd immediate operand is provided, so the 2 register addresses point to one source and one destination.
  • YASEP32 and YASEP16 only differ in the width of the datapath, while they share the same instruction format and most opcodes.

    YASEP's focus is on simplicity and ease of implementation : the instructions are not the most efficient, compact or flexible. There is some room for later improvements but feature creep is currently considered harmful for the project's progress.

    The opcode map shows that the opcodes are grouped into 16 groups of 8 (or fewer) closely related functions. Currently defined groups are :

  • CTL (control),
  • MOV,
  • ASU (Add and Subtract Unit, with optional skip on carry/borrow),
  • ROP2 (boolean operations),
  • SHL (bit SHuffLing, rotate and Shift),
  • IE (Insert/Extract bytes (8 bits) and half-words (16 bits) into/from words (32 bits)),
  • MISC (various operations).
    There are 4 groups of conditional instructions with similar behaviour and identical conditions :
  • CMOV (Conditional MOVe),
  • JMP (conditional JuMP),
  • SKIP (conditional SKIP),
  • Q (conditional switch to another Queue).
    The last group is RSV (ReSerVed). Some groups are also assigned to future uses : SMT and PFQ.


    Overview of the instruction set


    The YASEP can do things other processors can do. But it's done a bit differently.


    The register map

    First thing to know : the CQ's value. It is inherent, implicit. It tells you which queue you're running on. Usually, when you start, it is set to 0 so you have (generally) Q1 to Q5 available for other purposes.

    Without careful planning and smart resource allocation, the queues will easily be saturated, leaving no room for further execution. But hey, YASEP is a "small" microcontroller and if there were idle, unused or wasted resources, they would be removed ASAP. So bear with the tight constraints and remember that some people have written webservers on much, much, much more reduced processors.

    So the register allocation is usually like this :

     0  A0  Queue #0's address register, default instruction pointer   Executable
     1  D0  Queue #0's data register, default current instruction
     2  A1  Queue #1's address register
     3  D1  Queue #1's data register
     4  A2  Queue #2's address register
     5  D2  Queue #2's data register
     6  A3  Queue #3's address register
     7  D3  Queue #3's data register
     8  A4  Queue #4's address register, stack pointer                 Data-only
     9  D4  Queue #4's data register, stack top
    10  A5  Queue #5's address register, alternate stack pointer
    11  D5  Queue #5's data register, alternate stack top
    12  R0  Standard register #0                                       Static
    13  R1  Standard register #1
    14  R2  Standard register #2
    15  R3  Standard register #3
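
The numbering above is regular enough to be computed. As a minimal Python sketch (the function name `register_name` is made up for this illustration and is not part of any YASEP tool), a 4-bit register number maps to its mnemonic like this :

```python
def register_name(number):
    """Return the mnemonic for a 4-bit register number (0..15),
    following the register list above."""
    if not 0 <= number <= 15:
        raise ValueError("register numbers are 4 bits wide")
    if number < 12:
        # Queue registers: even numbers are address registers (A),
        # odd numbers are data registers (D); the queue index is number // 2.
        kind = "A" if number % 2 == 0 else "D"
        return f"{kind}{number // 2}"
    # The last four numbers are the static registers R0..R3.
    return f"R{number - 12}"
```

For instance, `register_name(9)` gives `D4`, the data register of queue #4.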

    Having two programmable stacks makes it easy to implement the basic FORTH functions, though it was not the original intention. The first YASEP (VSP) draft included only 4 queues but that was certainly a limitation, so it was increased to 5 queues with the fifth being able to act as a stack.

    However, static registers are not the most precious resource in a microcontroller, particularly when a queue can emulate one static register (you have to point the address register to a scratch area). Complex or heavy algorithms could even have a virtually unlimited number of registers when playing with the autoincrement features. So lately I decided to add a sixth queue.


    Basic instructions

    Let's start easy. Computational examples here use the static registers so there is no side-effect. Furthermore, the operations use both the short and long instruction forms without raising many issues.

    You can familiarize yourself and practice with the interactive Execution Units testbench or interactive assembler, and better understand the issues raised by the operand orders.

    Note : The assembler's syntax hides the little details of the operand orders. The YASEP's instruction format is quite rigid, with some small "enhancements", but the assembly-level instructions have one consistent rule : the destination register is the last in the line. This is covered in the assembler's documentation.

    The ASU

    The first execution unit is ASU : it is the "Add and Subtract Unit".

    add r0 r1;      adds r0 and r1, then puts the result in r1
    add r0 123 r1;  adds r0 and immediate data 123, then puts the result in r1
    Is that simple enough ?

    Now if you want to perform multi-precision computations, the YASEP has no carry flag nor dual-write capability. I first used a special instruction to create the carry in a register, but that was not very clever in a 2-address processor. So I recently (4/2007) changed this system to a simpler and more powerful one : the instruction skips 0 to 3 half-words.

     ; Addition of R2 to the 64-bit value R0:R1
    adds2 r2 r0  ; r0 = r0+r2
     ; The next instruction is skipped if no carry was generated
    add 1 r1     ; carry : r1 = r1+1 (long form : 2 half-words)
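
To make the carry handling concrete, here is a minimal Python model of the sequence above. It assumes, as the comments of the example do, that the increment of the high word runs only when the low-word addition produces a carry. The function name is hypothetical and the model does not try to reproduce the skip encoding itself.

```python
MASK32 = 0xFFFFFFFF

def add_to_64bit(lo, hi, addend):
    """Add a 32-bit addend to the 64-bit value hi:lo using 32-bit steps.
    Returns the new (lo, hi) pair."""
    total = (lo & MASK32) + (addend & MASK32)
    new_lo = total & MASK32      # r0 = r0 + r2, truncated to 32 bits
    carry = total > MASK32       # carry out of the low word
    if carry:                    # the conditional "add 1 r1"
        hi = (hi + 1) & MASK32
    return new_lo, hi
```

With `lo = 0xFFFFFFFF`, `hi = 0` and `addend = 1`, the low word wraps to 0 and the high word becomes 1.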

    The same works for subtraction too. The SUB instruction has 3 "skipping" versions that operate the same way. All the available opcodes are summarized below :

    ADD      ADDition
    ADDS1    ADD and Skip 1 half-word if carry
    ADDS2    ADD and Skip 2 half-words if carry
    SUB      SUBtract
    SUBS1    SUBtract and Skip 1 half-word if carry
    SUBS2    SUBtract and Skip 2 half-words if carry

    The ROP2

    This unit is called "ROP2" because it performs all useful Raster OPerations with 2 operands. 16 possibilities exist but several are congruent so 8 boolean operations are available :


    Because of the core's limitations, there is no MUX instruction (unlike F-CPU) : it must be emulated with three instructions and a couple of temporary registers.

    Note : in the OPN operations (ANDN and ORN), the source that is inverted is the first one, not the one that shares its address with the destination, nor the immediate field (if any). This should make it easier to code real stuff. Notice how it impacts the order of the operands :

    ANDN r1 r2     ;  r2 =  r2 & ~r1
    ANDN r1 123 r2 ;  r2 = 123 & ~r1
    ORN  r1 r2     ;  r2 =  r2 | ~r1
    ORN  r1 123 r2 ;  r2 = 123 | ~r1
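
The MUX emulation mentioned earlier can be sketched with AND, ANDN and OR. This is only a Python model under assumptions : the helper names are invented, the selection polarity (mask bit 1 picks the first value) is my choice for the illustration, and the ANDN helper follows the operand order shown above (the first source is the inverted one).

```python
MASK32 = 0xFFFFFFFF

def op_and(src, dst):
    return dst & src & MASK32

def op_andn(src, dst):
    # ANDN inverts the first source: dst = dst & ~src
    return dst & ~src & MASK32

def op_or(src, dst):
    return (dst | src) & MASK32

def mux(mask, a, b):
    """Take the bits of a where mask is 1 and the bits of b where mask is 0,
    using three operations and two temporaries."""
    t1 = op_and(mask, a)     # t1 = a & mask
    t2 = op_andn(mask, b)    # t2 = b & ~mask
    return op_or(t1, t2)     # result = t1 | t2
```

The two temporaries `t1` and `t2` correspond to the "couple of temporary registers" needed when neither input may be overwritten.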

    The SHL

    The "shuffle" unit (SHL) moves bits around the register. It's a very stripped-down version of the corresponding F-CPU unit and it does only rotation and shifts on 32-bit data. These are the 5 basic, unavoidable operations :

    SHR  logic SHift Right
    SAR  SHift Arithmetic Right
    SHL  logic SHift Left
    ROL  ROtate Left
    ROR  ROtate Right

    Note : The ROR and ROL instructions are congruent, but having a single ROT instruction creates an ambiguity : should ROT be ROR or ROL of the first 15 positions ? The assembler can emulate ROL with ROR (and vice versa) by negating the immediate operand. However, the problem is more complex when the "shift amount" is given by a register (this impacts the algorithm).

    At the binary level, the order of the operands reflects the architectural constraints. In order to keep things useful and practical, the assembler hides these details (remember that the destination/result register is always the last operand). Here comes an example of the syntax :

    SHR r1 r2;      r2 = r2 >> r1
    SHR r1 12 r2;   r2 = r1 >> 12 (note that the immediate
                      could be larger but only the 5 LSB are used)
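
The remarks on the shift amount and on the ROL/ROR equivalence can be checked with a small Python model. It is a sketch only : the function names are mine, and it assumes that the rotations mask their amount to the 5 LSBs in the same way as the SHR immediate described above.

```python
MASK32 = 0xFFFFFFFF

def shr(value, amount):
    """Logical shift right; only the 5 LSBs of the amount are used."""
    return (value & MASK32) >> (amount & 31)

def ror(value, amount):
    """Rotate right on 32 bits (amount taken modulo 32)."""
    n = amount & 31
    value &= MASK32
    return ((value >> n) | (value << (32 - n))) & MASK32

def rol(value, amount):
    """Rotate left on 32 bits (amount taken modulo 32)."""
    n = amount & 31
    value &= MASK32
    return ((value << n) | (value >> (32 - n))) & MASK32
```

With these definitions, `rol(x, n)` and `ror(x, -n)` give the same result for every amount, which is the property the assembler relies on when it negates the immediate operand.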

    The IE unit

    The YASEP contains no load/store unit and treats only 32-bit words. The Insert/Extract unit eases access to 8-bit and 16-bit quantities by shifting words appropriately, with a direct communication with the PFQ's pointers. It is thus possible to have the equivalent of "load" and "store" operations, with the added benefits of pointers that are auto-incremented with the right values. And maybe more in the future.

    LSB  Load Sign-extended Byte (and inc ptr)
    LZB  Load Zero-extended Byte (and inc ptr)
    LSH  Load Sign-extended 16-bit Half-word (and inc ptr)
    LZH  Load Zero-extended 16-bit Half-word (and inc ptr)
    SB   Store Byte (and inc ptr)
    SH   Store 16-bit Half-word (and inc ptr)
    SHH  Store 16-bit Half-word High (shift it and ignore the pointer)
    MOV  Copy the register or the sign-extended Imm16 field to the destination register.

    The shift unit can load bytes from any position. However, because the unit can't cross word boundaries, it can't shift 16-bit words to any position (only offsets 0, 1 and 2 are possible). A trap should be triggered if a pointer offset 3 is found.

    The MOV instruction was moved here because the IE instructions are the most similar.

    "Load" specifics :

    These operations extract one byte or one "word" from a given register. The data is shifted right, according to the implicit pointer associated to this register. If the register is a static register, then the offset is zero. If this is the Data register of a PFQ, the 2 LSBs of the associated pointer are used as offset.

    D4 = 12345678h
    A4 = 00001BADh  (offset : A4 & 3 = 1)
    LSB D4 R1  => R1=56h, A4+=1
    now : A4 = 00001BAEh  (offset : A4 & 3 = 2)
    LZB D4 R2  => R2=34h, A4+=1
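
The extraction above can be modelled in a few lines of Python. This is a hedged sketch : the function names are hypothetical, the pointer is simply incremented by the access size (the inc/dec flags and the reload of a new word by the PFQ are ignored), and the trap on offset 3 is represented by an exception.

```python
MASK32 = 0xFFFFFFFF

def sign_extend(value, bits):
    """Sign-extend a `bits`-wide value to 32 bits."""
    sign = 1 << (bits - 1)
    return ((value ^ sign) - sign) & MASK32

def load(data, pointer, size, signed):
    """Extract `size` bytes (1 or 2) from the 32-bit word `data`,
    at the byte offset given by the 2 LSBs of `pointer`.
    Returns (value, updated_pointer)."""
    offset = pointer & 3
    if size == 2 and offset == 3:
        raise ValueError("half-word at offset 3 would cross the word boundary")
    bits = 8 * size
    field = (data >> (8 * offset)) & ((1 << bits) - 1)
    if signed:
        field = sign_extend(field, bits)
    return field, pointer + size
```

Calling `load(0x12345678, 0x1BAD, 1, True)` returns `(0x56, 0x1BAE)`, and `load(0x12345678, 0x1BAE, 1, False)` then returns `(0x34, 0x1BAF)`, matching the LSB and LZB lines of the example.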

    Note that the long instruction form is not used, because the added immediate is useless. It may be used in the future to extend the offset, using the ASU in parallel to compute a new pointer. But it's too early now.

    Note also that the offset in question is only the 2 LSBs of said pointer. When an overflow occurs, the PFQ hardware will increment its own counters to provide the next/previous word from memory. And the pointer's inc/dec flags must also be taken into account...

    "Store" specifics :

    This is quite similar to the "load" instructions, except that here, the immediate field makes sense in the long instruction form. But then, we have an excess register, or (the other way around) we can't use the extended pointer increment. The chosen approach (today) uses the same form as the load, and either the 2nd operand is used for the stored data (in the short/RR form), or else the immediate field is used (but then the 2nd operand is left unused, so it's not written).

    ; D4 = 12345678h
    ; A4 = 00001BADh  (offset : A4 & 3 = 1)
    ; R1 = 9ABCDEF0h
    SB R1 D4;  => D4=1234F078h, A4+=1
    ; now : A4 = 00001BAEh  (offset : A4 & 3 = 2)
    SH R1 D4;  => D4=DEF0F078h, A4+=2
    ; now : A4 = 00001BB0h  (offset : A4 & 3 = 0), a new word is loaded in D4=89ABCDEFh
    SB 0123h D4;  => D4=89ABCD23h, A4+=1
    ; now : A4 = 00001BB1h  (offset : A4 & 3 = 1)
    SH 4567h D4;  => D4=89456723h, A4+=2
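
The insertion performed by SB and SH can be sketched the same way in Python. Assumptions : the function name `store` is invented, the pointer is incremented by the access size, and the loading of a new word into the data register when the pointer crosses a word boundary is left to the caller.

```python
MASK32 = 0xFFFFFFFF

def store(data, pointer, value, size):
    """Insert the `size` low bytes (1 or 2) of `value` into the 32-bit
    word `data`, at the byte offset given by the 2 LSBs of `pointer`.
    Returns (new_data, updated_pointer)."""
    offset = pointer & 3
    if size == 2 and offset == 3:
        raise ValueError("half-word at offset 3 would cross the word boundary")
    shift = 8 * offset
    field_mask = ((1 << (8 * size)) - 1) << shift
    merged = (data & ~field_mask) | ((value << shift) & field_mask)
    return merged & MASK32, pointer + size
```

Replaying the example, `store(0x12345678, 0x1BAD, 0x9ABCDEF0, 1)` gives `(0x1234F078, 0x1BAE)` and `store(0x1234F078, 0x1BAE, 0x9ABCDEF0, 2)` gives `(0xDEF0F078, 0x1BB0)`.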

    The "Store Half-word High" instruction is derived from "Store Half-word", but without checking/using the pointer : it unconditionally shifts the imm/reg's lower half-word by 16 bits to replace the destination's upper half-word. The main use is for loading 32-bit immediate data into a register, when preceded by a simple "SH" :

    SH  5678h R1; => R1=00005678h
    SHH 1234h R1; => R1=12345678h

    Note that SH must come before SHH because SH sign-extends Imm16. The MSB must be corrected by the following SHH :

    SH  89ABh R1; => R1=FFFF89ABh (constant is sign-extended)
    SHH CDEFh R1; => R1=CDEF89ABh
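
The SH/SHH pair can be modelled in Python to show why the order matters. This sketch only reproduces the behaviour of the two examples above (SH with an immediate yields the sign-extended Imm16, SHH replaces the upper half-word) ; the function names are hypothetical.

```python
def sh_imm(imm16):
    """SH with an immediate, as in the examples: sign-extended Imm16."""
    imm16 &= 0xFFFF
    return imm16 | 0xFFFF0000 if imm16 & 0x8000 else imm16

def shh(imm16, register):
    """SHH: replace the upper half-word, keep the lower half-word."""
    return ((imm16 & 0xFFFF) << 16) | (register & 0xFFFF)

def load_constant(value):
    """Build a 32-bit constant with SH (low half) then SHH (high half)."""
    low = value & 0xFFFF
    high = (value >> 16) & 0xFFFF
    return shh(high, sh_imm(low))
```

Here `sh_imm(0x89AB)` gives `0xFFFF89AB`, and `shh(0xCDEF, 0xFFFF89AB)` corrects the upper half to give `0xCDEF89AB`.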

    Misc. functions

    Not all opcodes are used by the above descriptions. With a little added hardware, it is possible to perform several other operations :

    EXPND  the 4 LSB of the first operand are "expanded"
              to 4 byte masks (0 or FFh) and ANDed to the 2nd operand
    MATCH  each byte in src1 is checked for equality
              with the corresponding byte in src2, creating a bitfield.
    BMASK  each byte in src1 is checked for equality
              with the corresponding byte in src2, creating a bytemask.
    BSWAP  reverses the word's endianness

    Because the YASEP is meant to manage byte streams, it must be able to scan through them. A specific operation is provided that detects byte patterns : the MATCH instruction XORs both operands and ANDs the resulting 4 bytes, generating a 4-bit field. This can be used in the detection of byte patterns, the loop running while the result is zero. When it becomes non-zero, the bitfield is useful as an index for computed jumps or calls, to functions that deal with alignment for example.

    BMASK is similar but creates a byte mask, instead of a bit field. Like the previous instruction, the immediate field can be used as input for the XOR.

    EXPND "expands" the 4 LSBs of the first operand to create a byte mask, too. The result is ANDed to the 2nd operand (register or immediate) to add some flexibility. This is the kind of operation that is useful when doing bitmap graphics, like writing a bitmap font to a byte-map raster, for example.

    Do I need to explain why BSWAP is useful ? The YASEP is a little-endian machine and might appreciate communication with other "kinds" of computers. Note that the immediate field is useless in this instruction, so it is simply ignored.
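
The four operations described in this section can be summarized by a short Python model. It is a sketch under assumptions : the function names simply mirror the mnemonics, and the mapping of bit i (of the MATCH result or of the EXPND input) to byte i, counted from the least significant byte, is my guess and is not stated in the text.

```python
MASK32 = 0xFFFFFFFF

def _bytes(word):
    """Split a 32-bit word into 4 bytes, least significant first."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]

def match(src1, src2):
    """4-bit field: bit i is set when byte i of both operands is equal."""
    pairs = zip(_bytes(src1), _bytes(src2))
    return sum(1 << i for i, (a, b) in enumerate(pairs) if a == b)

def bmask(src1, src2):
    """Byte mask: byte i is FFh when byte i of both operands is equal."""
    pairs = zip(_bytes(src1), _bytes(src2))
    return sum(0xFF << (8 * i) for i, (a, b) in enumerate(pairs) if a == b)

def expnd(src1, src2):
    """Expand the 4 LSBs of src1 to a byte mask and AND it with src2."""
    mask = sum(0xFF << (8 * i) for i in range(4) if (src1 >> i) & 1)
    return mask & src2 & MASK32

def bswap(word):
    """Reverse the order of the 4 bytes of a 32-bit word."""
    return int.from_bytes((word & MASK32).to_bytes(4, "little"), "big")
```

A scanning loop would keep reading words while `match(word, pattern)` is zero, then use the non-zero bitfield as an index, as described above.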

    Conditional Jump/Skip/Move instructions

    Four groups of instructions provide the developer with different granularities of instruction flow control.

    The YASEP's instruction structure : conditional instructions

    The operands and fields are :

    The different combinations create the 28 following opcodes :


    Notes :


    To Be Determined

    Not all the instructions are defined or used currently. Some room is needed for later, and it's not a good idea to fill more than 75% of the opcode space in the first iteration of a CPU architecture.

    Here are some intended uses for the remaining opcodes, but the future may prove these forecasts all wrong. Before they are defined, all the opcodes of these groups are marked "TBD" and will behave like INV.

    The BIT instructions

    These instructions perform bit test, toggle, set and clear on a word. The location of the words is not clear yet (memory, register, SR or something else).

    The SMT instructions

    This group of instructions controls the Simultaneous MultiThreading functions of YASEP. For example, they can create or destroy a thread, read or modify their thread ID, or modify thread properties.

    These instructions are optional and will be defined later, once YASEP works correctly in single-thread mode.

    The PFQ instructions

    These instructions control the prefetch queues, like the auto-update bits of the pointers or the caching strategy.

    TBD, it is useless in the current definition of the code (the memory interface is not yet designed).

    The ReSerVed opcodes

    The last group is clearly left untouched, except for the INV instruction, which explicitly triggers an invalid opcode exception (for the current and all the future releases). There is no intention to use this group in the not-too-distant future.



    More information (older and written differently) can be found in this text