vspsim/doc/assembly.html version 2007-04-17

The VSP assembly language

Introduction

This page describes the VSP's JavaScript assembler and the language conventions necessary for writing software.

Architecture

The VSP assembler is split into two parts :

  • The line-by-line, or instruction-level assembler is a stateless routine that assembles one instruction at a time. It can be accessed through the asm.html interface or with the floating window (when you click on instructions in HTML files, like add d2, r3). It is not very sophisticated but very useful in countless places.

  • The high-level assembler takes a whole file, splits it into single lines and feeds them to the above routine. It does not deal with the instructions directly but assembles the results, manages the symbol tables, evaluates/computes values, creates the binaries, handles preprocessing... It will be described and developed later.
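
The split between the two levels can be pictured with a minimal sketch. The names assembleFile and assembleLine and the shape of the returned records are assumptions made for illustration, not the actual vspsim code; the only point is that the high-level part owns the line splitting and the bookkeeping while the line assembler stays stateless.

```javascript
// Hypothetical sketch: names and return shapes are illustrative assumptions.
// assembleLine stands for the stateless instruction-level routine; it is
// passed in so that this sketch makes no claim about its real behaviour.
function assembleFile(source, assembleLine) {
  const results = [];
  const lines = source.split(/\r?\n/);
  for (let i = 0; i < lines.length; i++) {
    const text = lines[i].trim();
    if (text === "") continue;            // nothing to assemble on blank lines
    results.push({ line: i + 1, text: text, code: assembleLine(text) });
  }
  return results;
}

// Usage with a stand-in line assembler that only echoes its input in upper case.
const demo = assembleFile("add d2, r3\n\nnop\n", (t) => t.toUpperCase());
console.log(demo);
```

Keeping the source line number next to each result is what would let the high-level part report errors and resolve symbols later, without the line assembler having to know about either.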

    Instruction-level assembly

    The VSP instructions are very simple but they are not always practical, so some opcodes introduce a few modifications. They are usually harmless and don't impact the architecture, but they make the instructions handier, thus helping the instruction stream to be more efficient.

    The assembly language's goal is to hide these architectural details from the software developer. Thus comes the first rule : the destination register is the last in the line.

    Input processing

    and that's all there is to say about the subject of "input syntax".

    Number formatting

    The numbers are always output in hexadecimal and are accepted in 3 formats :

    Numbers are mostly used in contexts where the number of significant bits is bounded to the size of the container (usually, the optional immediate field). When more bits are input, the assembler keeps only the desired number of LSB, and discards the MSB. The following example shows how the number is truncated : db 1234h.
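
The truncation rule can be sketched as follows. This is only an illustration: it parses just the trailing-"h" hexadecimal notation that appears in this page's examples, the helper names (parseHex, truncate, toHex) are hypothetical, and it assumes that db emits 8-bit values, which the name suggests but this page does not state.

```javascript
// Sketch only: handles just the trailing-"h" hexadecimal notation that
// appears in this page's examples; all function names are hypothetical.
function parseHex(token) {
  const m = /^([0-9a-f]+)h$/i.exec(token);
  if (!m) throw new Error("unsupported number: " + token);
  return BigInt("0x" + m[1]);             // BigInt avoids any 32-bit limit
}

// Keep the `bits` least significant bits and discard the rest.
function truncate(value, bits) {
  return value & ((1n << BigInt(bits)) - 1n);
}

function toHex(value) {
  return value.toString(16).toUpperCase() + "h";
}

// Assumption: db stores one byte, so 1234h loses its upper byte.
console.log(toHex(truncate(parseHex("1234h"), 8)));  // 34h
```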

    Pseudo-instructions

    The ability to output arbitrary numbers is critical for many uses so the assembler has the following three pseudo-instructions :

    An unbounded number of literal numbers is accepted (since 2007-04-04; the limit depends on the JavaScript engine, not on the assembler's design). The floating assembler's window will not display all the digits when more than 32 bits are given but they are available through software (by defining the emit_bin() function).

    Example : db 12h 34h 56h 78h 9Ah BCh DEFh
    emit_bin()'s output :

    Reserved symbols

    The instruction-level assembler recognizes the following symbols, and rejects anything else :

    Aliases

    There are two types of aliases : form aliases (see FLAG_ALIAS_RR) and instruction aliases (they are listed under the opcode map). This section covers the latter.

    Internally, they can be used like normal instructions, but they provide different forms and/or different semantics. However, they use real opcodes of other instructions. The substitution is handled at the assembly level and the disassembly probably won't infer the originally assembled alias. So don't be surprised if instructions like NOT or NEG assemble correctly, but the disassembly returns a different opcode.

    Instruction Forms

    Despite the very simple instruction format, the assembly language instructions appear in several different forms. This is usually due to reasons such as :

  • The instruction does not make sense with one form, for example BSWAP does not need an immediate field.
  • The assembly language must be easy to understand and some fields are written in a different order internally.

    The forms can be decomposed into three main groups :

  • The binary instructions have 2 forms (long and short) but, counting the combinations left once useless fields are removed, there are 2×3=6 possible assembly language forms : three variations (2, 1 or no register) of an instruction without (RR, R, ALONE) or with (RIR, IR and I) immediate data. The 7th form (RI) is a special case of IR.
  • The Jump and Skip instructions add 1 more form (IRI), while 2 are congruent with existing forms (I and IR). The disassembler distinguishes between Imm16 and other fields with the help of additional flags and attributes.
  • Finally, the Imm16 field is sometimes ignored, which gives 5 more forms : (X, IX, RX, RRX, and IRX)

    All these forms may seem complex (at first sight) but they make the VSP's assembly language source code easy to read and write, by sticking more to the semantics of the instruction than to the instruction's binary structure.

    The availability of all these forms is also a compromise between flexibility of implementation and completeness of description (of both the assembler and disassembler). The current system allows new forms and syntaxes to be added or removed without changing the whole structure, thus making development and experimentation faster.

    RR Form : "Register-Register"

    This is the most usual form, with only two registers used, for example add d2, r3.

    Note :
    With mov r1, d4, the first operand (r1 is encoded in the src2 field) is read and the value is stored in the last operand (d4 is encoded in the src1/dest field). src1/dest may be speculatively read but in this case, the value is not used, so it does not result in the auto-update of the pointer associated with d4 (a4) as a read. But this is not an issue anyway because the write auto-update takes precedence over the read auto-update.

    R : "one Register only"

    Some instructions (CLR and JMP) need only one operand (which is either a source or destination register).

    This is also a syntax shortcut of RR for some instructions, when one wants both source and destination to be the same register. For example, bswap r1 will encode R1 in both the src1/dest and src2 fields of the instruction. In this case, the opcode has the FLAG_ALIAS_RR attribute.
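
The way the RR and R forms fill the two register fields can be sketched like this. The helper name placeRegisters and the field names in the returned object are hypothetical, registers are kept as names because the binary encoding is not described here, and the effect of FLAG_SWP is deliberately left out.

```javascript
// Sketch only: fields are kept as register names, since the binary
// encoding is not described here. placeRegisters is a hypothetical name
// and FLAG_SWP is ignored.
function placeRegisters(operands, hasAliasRR) {
  if (operands.length === 2) {
    // RR form: the first operand is the source, the last is the destination.
    return { src2: operands[0], src1dest: operands[1] };
  }
  if (operands.length === 1 && hasAliasRR) {
    // R form used as a shortcut of RR: the same register fills both fields.
    return { src2: operands[0], src1dest: operands[0] };
  }
  throw new Error("this sketch only covers RR and the aliased R form");
}

console.log(placeRegisters(["r1", "d4"], false)); // mov r1, d4
console.log(placeRegisters(["r1"], true));        // bswap r1
```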

    ALONE : no parameter

    Some instructions don't need any parameter, for example nop or inv. This form encodes to a short instruction because there is no immediate data. Use the I or X forms to force a long instruction (depending on the opcode and on whether the value of the immediate field is ignored or not).

    RIR Form : "Register and Immediate to Register"

    This form is used when immediate data are included in the instruction stream like this : add d2, 234, r3.

    The difference with RR is that the second source operand (src1/dest) is replaced by the immediate field, so the src1/dest field is only the address of a written register.

    IR : "Immediate to Register"

    As noted above, some instructions do not make sense with more than one register operand or field. The register might be used as both source (depending...) and (always) destination (the immediate field must be stored somewhere).
    For example,
    mov 2, d3

    In this case, the src2 field is ignored.

    This IR form also means "Immediate if Register" for the conditional instructions.

    For example, the instruction so 3, a2 will skip 3 half-words if register A2 is Odd (LSB==1, or unaligned). This form can be extended with a 16-bit immediate field in the IRI and IRX forms (depending on whether the Imm16 value is used or not).

    RI : "Register to Immediate"

    This is technically the same thing as the previous IR form. It is only needed by the PUT instruction, because the "destination" is the Special Register whose number is given as an immediate 16-bit number (so it must come last in the instruction).
    For example, put d3 2754

    I : "Immediate"

    Some instructions could need only one long immediate parameter, or must be writable in "long" form. However, there is no such instruction yet; most of them simply ignore the Imm16 field (see the X form).

    The "I" form is also used by the unconditional SKIP and Q instructions. This is the equivalent of the above IR form, but without register field. The 2-bit immediate number represents

    For example, the instruction skip 4 will skip 4 half-words (two words).

    IRI : "Immediate if Register xor Immediate"

    This is the Imm16 extension of the above IR form, used by some conditional skip instructions (SZ/SNZ) and conditional jump instructions (QZ/QNZ). The other conditional instructions (SO/SNE, SNO/SE, SS, SNS, QO/QNE, QNO/QE, QS and QNS) make no sense with the immediate 16-bit field because the condition is already XORed by the negation field.
    For example, the instruction qz 2, a4 21 will switch to queue #2 if register A4 is equal to 21.
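
A minimal sketch of the IRI condition, assuming (as the form's name suggests) that the zero test is applied to the register XORed with Imm16. The helper names takenZ and takenNZ are hypothetical and register values are modelled as plain non-negative integers.

```javascript
// Sketch only: takenZ/takenNZ are hypothetical helpers, and register
// values are plain non-negative integers here.
function takenZ(regValue, imm16) {
  // Zero test applied to (register XOR Imm16): true when both are equal.
  return (regValue ^ imm16) === 0;
}

function takenNZ(regValue, imm16) {
  // The negated condition: true when the register differs from Imm16.
  return !takenZ(regValue, imm16);
}

console.log(takenZ(21, 21));   // register equals the immediate
console.log(takenZ(20, 21));   // register differs from the immediate
```

Under this reading, an equality test against an arbitrary 16-bit constant costs nothing more than the zero test that SZ/SNZ and QZ/QNZ already perform.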

    Forms that ignore the Imm16 field

    Several instructions don't make sense with the optional imm16 field. However, the form could be useful for padding purposes, for example. The "-X" forms use the question mark ("?") to indicate that the long version of the instruction is desired, and the assembler will fill the remaining bits with suitable values (see below).

    X : "ignore Imm16"


    This form is the extension of the FORM_ALONE form.
    For example, nop ? will fill two half-words, and an appropriate value of the Imm16 field is generated by the assembler.

    IX : "Immediate and ignore Imm16"


    This form is only used by the SKIP and Q instructions and is the "long" equivalent to the above I form.
    For example, skip 1 ?

    RX : "Register and ignore Imm16"


    This form is only used by the BSWAP instruction and is the "Ignore" version of the R form (so the assembler's forms combinations are exhaustive).
    For example, BSWAP R1 ?

    RRX : "Register to Register, and ignore Imm16"


    This form is used by many instructions that only make sense with the RR form. This form allows them to be extended to a long instruction where the Imm16 field is ignored.
    For example, LZB R1 R3 ?

    IRX : "skip Immediate if Register, and ignore Imm16"


    Just like the above IR form, this form is used by the conditional skip instructions (SO/SNE/SNO/SE, SS/SNS but not SZ/SNZ) and the conditional jump instructions (QO/QNE/QNO/QE, QS/QNS but not QZ/QNZ). Their Imm16 field is ignored because an Imm16 operand makes no sense for them (the opcode already contains a negation field).
    For example, SNS 2 R3 ?
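
The forms described above can be told apart from the operand pattern alone, as this sketch suggests. It is not the assembler's real parser: classifyForm is a hypothetical name, the register pattern (r, d or a followed by digits) is inferred from the examples on this page, and no check is made that a given opcode actually accepts the form.

```javascript
// Sketch only: classifyForm and the token patterns are assumptions
// inferred from the examples on this page.
function classifyForm(line) {
  const tokens = line.trim().split(/[\s,]+/);
  const operands = tokens.slice(1);             // drop the mnemonic
  const shape = operands.map((t) => {
    if (t === "?") return "X";                  // long form, Imm16 ignored
    if (/^[rda]\d+$/i.test(t)) return "R";      // r3, d2, a4, R1...
    if (/^[0-9][0-9a-f]*h?$/i.test(t)) return "I";
    throw new Error("unrecognized operand: " + t);
  }).join("");
  const known = ["", "R", "RR", "I", "IR", "RIR", "RI", "IRI",
                 "X", "IX", "RX", "RRX", "IRX"];
  if (!known.includes(shape)) throw new Error("no such form: " + shape);
  return shape === "" ? "ALONE" : shape;
}

// Every example line quoted in this section, with the form it belongs to.
const examples = ["nop", "bswap r1", "add d2, r3", "skip 4", "mov 2, d3",
                  "add d2, 234, r3", "put d3 2754", "qz 2, a4 21", "nop ?",
                  "skip 1 ?", "BSWAP R1 ?", "LZB R1 R3 ?", "SNS 2 R3 ?"];
for (const e of examples) console.log(e, "->", classifyForm(e));
```

A real implementation would also need the opcode's attributes, since several forms are legal only for specific instructions (RI for PUT, IRI for SZ/SNZ and QZ/QNZ, and so on).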

    Flags

    Many opcodes have "flags", which modify the instruction's behaviour, or make it more precise. They are used by all kinds of software, particularly the assembler, the disassembler and the (future) instruction decoder.

    FLAG_SWP : Swap the register operands

    This is a modifier of the instruction form, not a form itself, and it is needed for hardware simplicity. This flag indicates that the operands are swapped internally in order to keep the assembly-level instructions easy to read and implement.
    Behind the scenes, the destination register becomes the first operand, so in the instruction shr d2, r3 :

    FLAG_ALIAS_RR :

    This flag is used by the BSWAP instruction, where a register operand can be omitted when the source register is the same as the destination register. The SRC1 and SRC2 fields are then written with the same value, but you only need to write the register once.

    Default values

    Normally, all uninitialized data, fields or values are cleared (zero).

    However, when certain instruction fields are not used (by FORM_ALONE, FORM_R, FORM_I, ...), these fields are set to values chosen by the assembler, for the purpose of power reduction.

    Toggle minimization and toggle spacing could help reduce the power consumption and EMI. The VSP assembler will be able to compute proper values rather easily. Padding Imm16s and NOPs can be enhanced this way, too. The possible reduction is quite low, but might reach 5% when fetching instructions from external SDRAM. This remains to be tried with and without the optimization, and even with EMI/toggles maximized, just for testing it.
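
One possible way to pick a padding value is sketched below. It is only an illustration under stated assumptions: instructions are fetched over a 16-bit bus one half-word at a time, the cost of a fetch is the number of bits that change from the previous half-word, and all function names are hypothetical. Toggle spacing is not modelled.

```javascript
// Sketch only: assumes a 16-bit instruction bus fetched half-word by
// half-word; all function names are hypothetical.
function toggles(a, b) {
  let x = (a ^ b) & 0xFFFF;   // bits that differ between the two half-words
  let n = 0;
  while (x !== 0) { n += x & 1; x >>>= 1; }
  return n;
}

// Total toggles when `pad` is fetched between half-words prev and next.
function cost(prev, pad, next) {
  return toggles(prev, pad) + toggles(pad, next);
}

// Repeating the previous half-word is one padding value that reaches the
// lower bound toggles(prev, next): the first term of cost() becomes zero.
function pickPadding(prev) {
  return prev & 0xFFFF;
}

const prev = 0x1234, next = 0x00FF;
console.log(cost(prev, pickPadding(prev), next)); // same as toggles(prev, next)
console.log(cost(prev, 0x0000, next));            // all-zero padding, for comparison
```

Comparing both costs over a real instruction stream is one way to run the "with and without" experiment before measuring anything on hardware.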