The VSP assembly language

Introduction

This page describes the VSP's JavaScript assembler and the language conventions necessary for writing software.

For information about the VSP instruction set, see the opcode map and the introduction to the VSP.

Architecture

The VSP assembler is split into two parts:

The line-by-line, or instruction-level, assembler is a stateless routine that assembles one instruction at a time. It can be accessed through the asm.html interface or through the floating window (which opens when you click on instructions in HTML files, like add d2, r3). It is not very sophisticated, but it is useful in many places.

The high-level assembler takes a whole file, splits it into single lines and feeds them to the above routine. It does not deal with the instructions directly, but it collects the results, manages the symbol tables, evaluates/computes values, creates the binaries, handles preprocessing... It will be described and developed later.

Instruction-level assembly

The VSP instructions are very simple but not always practical, so some opcodes introduce a few modifications. They are usually harmless and don't impact the architecture, but they make the instructions handier, which helps make the instruction stream more efficient.

The assembly language's goal is to hide these architectural details from the software developer. Hence the first rule: the destination register is the last in the line.

Input processing

The assembly routine takes a single line (a character string without any end-of-line character such as '\n' or '\r').
The line may end with a comment, started with ';'. All characters after this delimiter are discarded.
Leading and trailing spaces are removed, commas and tabs are replaced by spaces, and runs of spaces are merged. You are free to write add d2 234 r3, add d2 234, r3 or add d2, 234, r3
The whole line is converted to upper case. So it does not matter if you write adD D2 234h r3 or ADd d2 234h R3 because it will become ADD D2 234H R3 internally.
The first word must be the opcode.
Depending on the instruction, one or two register names, and/or constant numbers follow. The order and function of these parameters are defined by the "form", explained below.
A trailing question mark ("?") indicates that the opcode must use a long form even though the value of the Imm16 field is ignored.
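The input-processing rules listed above can be sketched in JavaScript, the language the assembler itself is written in. The function name normalizeLine and its return shape are hypothetical; this is a minimal illustration of the rules, not the actual VSP routine.

```javascript
// Hypothetical sketch of the input-processing rules (not the VSP source).
function normalizeLine(line) {
  // Drop the comment: everything from the first ';' is discarded.
  const semi = line.indexOf(";");
  if (semi >= 0) line = line.slice(0, semi);

  // Commas and tabs become spaces, runs of spaces are merged,
  // leading/trailing spaces are removed, and the line is upper-cased.
  const text = line
    .replace(/[,\t]/g, " ")
    .replace(/ +/g, " ")
    .trim()
    .toUpperCase();
  if (text === "") return null; // empty or comment-only line

  // The first word is the opcode; the remaining words are operands.
  const words = text.split(" ");
  const opcode = words.shift();

  // A trailing "?" requests the long form with an ignored Imm16 field.
  const ignoreImm16 = words.length > 0 && words[words.length - 1] === "?";
  if (ignoreImm16) words.pop();

  return { opcode, operands: words, ignoreImm16 };
}

console.log(JSON.stringify(normalizeLine("adD D2, 234h  r3 ; comment")));
// {"opcode":"ADD","operands":["D2","234H","R3"],"ignoreImm16":false}
```

All three spellings from the text (add d2 234 r3, add d2 234, r3 and add d2, 234, r3) produce the same operand list.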

and that's all there is to say about the subject of "input syntax".

Number formatting

Numbers are always output in hexadecimal and are accepted in three formats:

Hexadecimal ([0-9A-F]* with a trailing 'h'), for example dw 1234h (the 0x prefix is never used in vspsim)
Decimal ([0-9]* without prefix or suffix): dw 1234.
This is the only format that accepts a leading minus sign: dw -1234
Binary ([01]* with a trailing 'b'): dw 01101101b
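A minimal sketch of a parser for the three formats, assuming the token has already been upper-cased as described under "Input processing" (so the suffixes are 'H' and 'B'). The name parseNumber is hypothetical.

```javascript
// Hypothetical parser for the three accepted number formats.
// Input is assumed to be upper-cased already.
function parseNumber(token) {
  if (/^[0-9A-F]+H$/.test(token)) {
    return parseInt(token.slice(0, -1), 16); // hexadecimal, trailing 'H'
  }
  if (/^[01]+B$/.test(token)) {
    return parseInt(token.slice(0, -1), 2); // binary, trailing 'B'
  }
  if (/^-?[0-9]+$/.test(token)) {
    return parseInt(token, 10); // decimal, the only signed format
  }
  throw new Error("not a number: " + token);
}

console.log(parseNumber("1234H"));     // 4660
console.log(parseNumber("-1234"));     // -1234
console.log(parseNumber("01101101B")); // 109
```

Because hexadecimal always needs its 'H' suffix, a token such as 01101101B cannot be mistaken for a hexadecimal number even though 'B' is a hex digit.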

Numbers are mostly used in contexts where the number of significant bits is bounded by the size of the container (usually the optional immediate field). When more bits are given, the assembler keeps only the required number of LSBs and discards the MSBs. For example, db 1234h is truncated to 34h.
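The truncation rule amounts to masking the value down to its least significant bits. The sketch below assumes container widths of at most 32 bits; the helper name truncate is hypothetical.

```javascript
// Keep only the `bits` least significant bits of a value (1 <= bits <= 32).
// JavaScript bitwise operators work on 32-bit integers, and ">>> 0"
// turns the result back into an unsigned number.
function truncate(value, bits) {
  if (bits >= 32) return value >>> 0;
  return (value & ((1 << bits) - 1)) >>> 0;
}

console.log(truncate(0x1234, 8).toString(16));  // 34   (db 1234h)
console.log(truncate(-1234, 16).toString(16));  // fb2e (two's complement of -1234)
```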

Pseudo-instructions

The ability to output arbitrary numbers is critical for many uses so the assembler has the following three pseudo-instructions :

DB: "Data Byte"
outputs 8 bits: db 12h

DH: "Data Half-word"
outputs 16 bits: dh 1234h

DW: "Data Word"
outputs 32 bits: dw 12345678h

An unbounded number of literal numbers is accepted (since 2007-04-04; the limit depends on the JavaScript engine, not on the assembler's design). The floating assembler window does not display all the digits when more than 32 bits are given, but they are available to software (by defining the emit_bin() function).
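A minimal sketch of how a DB/DH/DW line with several literals could be expanded, one fixed-width chunk per value. The widths come from the list above; the function name emitData and the space-separated hexadecimal output are assumptions, since the format produced by emit_bin() is not documented here.

```javascript
// Hypothetical expansion of DB/DH/DW into fixed-width hexadecimal chunks.
const DATA_WIDTH = { DB: 8, DH: 16, DW: 32 };

function emitData(directive, values) {
  const bits = DATA_WIDTH[directive];
  if (bits === undefined) throw new Error("unknown pseudo-instruction " + directive);
  const digits = bits / 4;
  return values.map((v) => {
    // Keep only the LSBs that fit; BigInt avoids 32-bit overflow in the mask.
    const masked = BigInt(v) & ((1n << BigInt(bits)) - 1n);
    return masked.toString(16).toUpperCase().padStart(digits, "0");
  });
}

// db 12h 34h 56h 78h 9Ah BCh DEFh
console.log(emitData("DB", [0x12, 0x34, 0x56, 0x78, 0x9a, 0xbc, 0xdef]).join(" "));
// 12 34 56 78 9A BC EF
```

The last value, DEFh, does not fit in a byte and is reduced to EFh by the truncation rule described under "Number formatting".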

Example: db 12h 34h 56h 78h 9Ah BCh DEFh (DEFh does not fit in a byte and is truncated to EFh)
emit_bin()'s output:

Reserved symbols

The instruction-level assembler recognizes the following symbols and rejects anything else:

Register names (A0 D0 A1 D1 A2 D2 A3 D3 A4 D4 A5 D5 R0 R1 R2 R3)
Pseudo-instructions (DB DH DW)
The "ignore" sign (?) used by the -X forms
The instruction opcodes and the opcode aliases
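The symbol check above can be sketched as a set lookup. The register and pseudo-instruction lists come from the text; OPCODES is only a partial, illustrative set of mnemonics that appear on this page, not the full VSP opcode map, and the function name classifyToken is hypothetical.

```javascript
// Hypothetical symbol classifier for the instruction-level assembler.
const REGISTERS = new Set([
  "A0", "D0", "A1", "D1", "A2", "D2", "A3", "D3",
  "A4", "D4", "A5", "D5", "R0", "R1", "R2", "R3",
]);
const PSEUDO = new Set(["DB", "DH", "DW"]);
// Partial list for illustration only (mnemonics mentioned on this page).
const OPCODES = new Set([
  "ADD", "MOV", "NOP", "INV", "BSWAP", "CLR", "JMP", "SHR", "LZB", "PUT",
  "SKIP", "Q", "SZ", "SNZ", "SO", "SNE", "SNO", "SE", "SS", "SNS",
  "QZ", "QNZ", "QO", "QNE", "QNO", "QE", "QS", "QNS", "NOT", "NEG",
]);

function classifyToken(token) {
  if (token === "?") return "ignore";
  if (REGISTERS.has(token)) return "register";
  if (PSEUDO.has(token)) return "pseudo";
  if (OPCODES.has(token)) return "opcode";
  // Numbers are handled by the number parser; any other symbol is rejected.
  return "unknown";
}

console.log(classifyToken("D5"), classifyToken("DW"), classifyToken("R4"));
// register pseudo unknown
```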

Aliases

There are two types of aliases: form aliases (see FLAG_ALIAS_RR) and instruction aliases (listed under the opcode map). This section is about the latter.

They can be used like normal instructions and may provide different forms and/or different semantics, but they reuse the real opcodes of other instructions. The substitution is handled at assembly time, and the disassembler will probably not recover the alias that was originally written. So don't be surprised if instructions like NOT or NEG assemble correctly but disassemble to a different opcode.
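A minimal sketch of alias resolution at assembly time. The pairs below are inferred from the "SO/SNE"-style notation used further down this page (two mnemonics for the same condition); which name is canonical is an assumption, and NOT/NEG are left out because their target opcodes are not given here.

```javascript
// Hypothetical alias table: alias mnemonic -> canonical mnemonic.
const ALIASES = new Map([
  ["SNE", "SO"],  // skip if not even == skip if odd
  ["SE", "SNO"],  // skip if even == skip if not odd
  ["QNE", "QO"],
  ["QE", "QNO"],
]);

function canonicalOpcode(mnemonic) {
  return ALIASES.get(mnemonic) ?? mnemonic;
}

// Both spellings encode to the same opcode, so a disassembler that only
// sees the binary can print nothing but the canonical name.
console.log(canonicalOpcode("SNE"), canonicalOpcode("SO")); // SO SO
```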

Instruction Forms

Despite the very simple instruction format, assembly language instructions come in several forms. This is usually for reasons such as:

The instruction does not make sense with some forms; for example, BSWAP does not need an immediate field.

The assembly language must be easy to understand and some fields are written in a different order internally.

The forms can be divided into three main groups:

The binary instructions have two encodings (long and short), but when one counts the combinations where useless fields are removed, there are 2×3=6 possible assembly language forms: three variations (2, 1 or no register) of an instruction without (ALONE, R, RR) or with (I, IR and RIR) immediate data. The 7th form (RI) is a special case of IR.

The Jump and Skip instructions add one more form (IRI), while two others coincide with existing forms (I and IR). The disassembler distinguishes the Imm16 field from the other fields with the help of additional flags and attributes.

Finally, the Imm16 field is sometimes ignored, which gives 5 more forms: X, IX, RX, RRX and IRX.
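The 13 forms can be told apart by the sequence of operand kinds alone. The sketch below classifies each operand as R (register), I (number) or X (the "?" sign) and looks the pattern up in a table; the table restates the forms described on this page, while the function name detectForm and the matching strategy are assumptions (the real assembler also checks the form against each opcode).

```javascript
// Hypothetical form detection from the operand pattern.
const FORMS = {
  "": "ALONE", "R": "R", "RR": "RR",
  "I": "I", "IR": "IR", "RIR": "RIR", "RI": "RI", "IRI": "IRI",
  "X": "X", "IX": "IX", "RX": "RX", "RRX": "RRX", "IRX": "IRX",
};

const REGISTER = /^([AD][0-5]|R[0-3])$/;

function detectForm(operands) {
  const pattern = operands
    .map((op) => (op === "?" ? "X" : REGISTER.test(op) ? "R" : "I"))
    .join("");
  const form = FORMS[pattern];
  if (form === undefined) throw new Error("no form matches " + pattern);
  return form;
}

console.log(detectForm(["D2", "234", "R3"])); // RIR
console.log(detectForm(["2", "A4", "21"]));   // IRI
console.log(detectForm(["R1", "R3", "?"]));   // RRX
```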

All these forms may seem complex at first sight, but they make VSP assembly language source code easy to read and write, because they follow the semantics of the instruction rather than its binary structure.

The availability of all these forms is also a compromise between flexibility of implementation and completeness of description (in both the assembler and the disassembler). The current system allows forms and syntaxes to be added or removed without changing the whole structure, which makes development and experimentation faster.

RR Form : "Register-Register"

This is the most common form, with only two registers, for example add d2, r3.

The first operand (d2) is the source register, encoded in the src2 field.
The second operand (r3) is a second source register which will also hold the result. It is encoded in the src1/dest field.

Note :
With mov r1, d4, the first operand (r1, encoded in the src2 field) is read and its value is stored in the last operand (d4, encoded in the src1/dest field). src1/dest may be read speculatively, but in this case the value is not used, so the read does not trigger the auto-update of the pointer associated with d4 (a4). This is not an issue anyway, because the write auto-update takes precedence over the read auto-update.
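A minimal sketch of the RR operand mapping and of the ADD and MOV semantics described above. Register names are kept symbolic because the binary register numbering is not given on this page, only ADD and MOV are modelled, and the D-register pointer auto-update is deliberately left out. The names encodeRR and executeRR are hypothetical.

```javascript
// RR form: first operand -> src2, last operand -> src1/dest.
function encodeRR(opcode, first, last) {
  return { opcode, src2: first, src1dest: last };
}

// Toy execution on a register map (no pointer auto-update modelled).
function executeRR(insn, regs) {
  const a = regs[insn.src2];
  switch (insn.opcode) {
    case "ADD": // dest = dest + src2, wrapped to 32 bits
      regs[insn.src1dest] = (regs[insn.src1dest] + a) >>> 0;
      break;
    case "MOV": // dest = src2; the old value of src1/dest is not used
      regs[insn.src1dest] = a;
      break;
    default:
      throw new Error("not modelled: " + insn.opcode);
  }
}

const regs = { D2: 5, R3: 7, R1: 42, D4: 0 };
executeRR(encodeRR("ADD", "D2", "R3"), regs); // add d2, r3
executeRR(encodeRR("MOV", "R1", "D4"), regs); // mov r1, d4
console.log(regs.R3, regs.D4); // 12 42
```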

R : "one Register only"

Some instructions (CLR and JMP) need only one operand, which is either a source or a destination register.

This is also a syntax shortcut of RR for some instructions, when one wants both source and destination to be the same register. For example, bswap r1 will encode R1 in both the src1/dest and src2 fields of the instruction. In this case, the opcode has the FLAG_ALIAS_RR attribute.
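A minimal sketch of the R-form shortcut: for an opcode carrying FLAG_ALIAS_RR, a single register operand is duplicated so that it fills both src2 and src1/dest. The numeric flag value and the OPCODE_FLAGS table are hypothetical stand-ins for the real opcode attributes.

```javascript
// Hypothetical flag value and per-opcode attribute table.
const FLAG_ALIAS_RR = 1;
const OPCODE_FLAGS = { BSWAP: FLAG_ALIAS_RR, CLR: 0, JMP: 0 };

function expandAliasRR(opcode, operands) {
  const flags = OPCODE_FLAGS[opcode] ?? 0;
  if ((flags & FLAG_ALIAS_RR) !== 0 && operands.length === 1) {
    // Same register in src2 and src1/dest: "bswap r1" == "bswap r1, r1".
    return [operands[0], operands[0]];
  }
  return operands; // CLR and JMP keep their single operand
}

console.log(expandAliasRR("BSWAP", ["R1"]).join(", ")); // R1, R1
```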

ALONE : no parameter

Some instructions don't need any parameter, for example nop or inv. This form encodes to a short instruction because there is no immediate data. Use the I or X form to force a long instruction (depending on the opcode and on whether the value of the immediate field is ignored).

RIR Form : "Register and Immediate to Register"

This form is used when immediate data is included in the instruction stream, like this: add d2, 234, r3.

The first operand (d2) is the source register, encoded in the src2 field
The second operand (234) is a 16-bit, sign-extended, immediate data
The third operand (r3) will also hold the result, it is encoded in the src1/dest field.

The difference from RR is that the second source operand (src1/dest) is replaced by the immediate field, so the src1/dest field only addresses the register being written.
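The sign extension of the 16-bit immediate can be shown with two shifts: moving bit 15 up to bit 31 and shifting back arithmetically replicates the sign bit. The names signExtend16 and encodeRIR are hypothetical, and registers stay symbolic as in the RR sketch.

```javascript
// Sign-extend a 16-bit field to a 32-bit signed integer.
function signExtend16(imm16) {
  return (imm16 << 16) >> 16;
}

// RIR form: source -> src2, immediate -> Imm16, destination -> src1/dest.
function encodeRIR(opcode, src, imm, dest) {
  return { opcode, src2: src, imm16: imm & 0xffff, src1dest: dest };
}

const insn = encodeRIR("ADD", "D2", 234, "R3"); // add d2, 234, r3
console.log(insn.imm16, signExtend16(insn.imm16)); // 234 234
console.log(signExtend16(0xfffe));                 // -2
```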

IR : "Immediate to Register"

As noted above, some instructions do not make sense with more than one register operand or field. The register is always a destination (the immediate value must be stored somewhere) and, depending on the instruction, may also be a source.
For example, mov 2, d3

2 is in the 16-bit immediate field. It is internally sign-extended to 32 bits.
D3 is the destination register, encoded in the src1/dest field.

In this case, the src2 field is ignored.

This IR form also means "Immediate if Register" for the conditional instructions.

The conditional skip instructions (SZ, SNZ, SO/SNE, SNO/SE, SS and SNS) replace the src2 field with a 2-bit immediate number, representing the number (minus one) of half-words that must be skipped (if the condition is met).
The conditional jump instructions (QZ, QNZ, QO/QNE, QNO/QE, QS and QNS) replace the src2 field with a 2-bit immediate number, representing the queue to which the processor switches (if the condition is met).

For example, the instruction so 3, a2 will skip 3 half-words if register A2 is Odd (LSB==1, or unaligned).

3 is in the 2-bit immediate field. It is stored decremented (as 2) because skipping 0 half-words makes no sense, so the possible range is 1 to 4.
A2 is the tested register, encoded in the src1/dest field.
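The "minus one" encoding of the 2-bit skip count can be sketched as follows (the function names are hypothetical). The same encoding applies to the unconditional SKIP instruction described under the I form.

```javascript
// The 2-bit field stores (half-words to skip - 1): 1..4 is encoded as 0..3.
function encodeSkipCount(halfWords) {
  if (!Number.isInteger(halfWords) || halfWords < 1 || halfWords > 4) {
    throw new RangeError("skip count must be 1..4, got " + halfWords);
  }
  return halfWords - 1;
}

function decodeSkipCount(field) {
  return (field & 3) + 1;
}

console.log(encodeSkipCount(3)); // 2  (so 3, a2)
console.log(encodeSkipCount(4)); // 3  (skip 4)
```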

This form can be extended with a 16-bit immediate field in the IRI and IRX forms (depending on whether the Imm16 value is used or not).

RI : "Register to Immediate"

This is technically the same thing as the previous IR form. It is only needed by the PUT instruction, because the "destination" is the Special Register whose number is given as an immediate 16-bit number (so it must come last in the instruction).
For example, put d3 2754

D3 is the register that contains the 32-bit value that is written to the SR space.
2754 is the number of the targeted Special Register.

I : "Immediate"

Some instructions might need only one long immediate parameter, or must be writable in long form. However, no such instruction exists yet; most of them simply ignore the Imm16 field (see the X form).

The "I" form is also used by the unconditional SKIP and Q instructions. It is the equivalent of the above IR form, but without a register field. The 2-bit immediate number represents:

the number of half-words to skip (1 to 4) for the SKIP instruction
the queue to switch to (Q0 to Q3) for the Q instruction

For example, the instruction skip 4 will skip 4 half-words (two words).

4 is in the 2-bit immediate field. It is stored decremented because skipping 0 half-words makes no sense, so the range 1 to 4 is possible (encoded as 0 to 3).

IRI : "Immediate if Register xor Immediate"

This is the Imm16 extension of the above IR form, used by some conditional skip instructions (SZ/SNZ) and conditional jump instructions (QZ/QNZ). The other conditional instructions (SO/SNE, SNO/SE, SS, SNS, QO/QNE, QNO/QE, QS and QNS) make no sense with the 16-bit immediate field because the condition is already XORed by the negation field.
For example, the instruction qz 2, a4 21 will switch to queue #2 if register A4 is equal to 21.

2 is the 2-bit immediate, stored in the src2 field.
A4 is the tested register, encoded in the src1/dest field.
21 is in the 16-bit immediate field. It is internally sign-extended to 32 bits.
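One reading of the form's name ("Immediate if Register xor Immediate") is that the tested register is XORed with the sign-extended Imm16 and the zero test applies to the result, which matches the qz 2, a4 21 example. This sketch assumes that reading; the function name is hypothetical.

```javascript
// Assumed IRI condition: test (register XOR sign-extended Imm16) against zero.
function iriConditionMet(opcode, regValue, imm16) {
  const imm = (imm16 << 16) >> 16; // sign-extend to 32 bits
  const isZero = ((regValue ^ imm) >>> 0) === 0;
  switch (opcode) {
    case "SZ": case "QZ": return isZero;
    case "SNZ": case "QNZ": return !isZero;
    default: throw new Error("IRI form not available for " + opcode);
  }
}

// qz 2, a4 21: switch to queue #2 when A4 == 21
console.log(iriConditionMet("QZ", 21, 21)); // true
console.log(iriConditionMet("QZ", 20, 21)); // false
```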

Forms that ignore the Imm16 field

Several instructions don't make sense with the optional Imm16 field. However, the long form can still be useful, for example for padding purposes. The "-X" forms use the question mark ("?") to indicate that the long version of the instruction is desired, and the assembler fills the remaining bits with appropriate values (see below).

X : "ignore Imm16"

This form is the long extension of the FORM_ALONE form.
For example, nop ? fills two half-words, and an appropriate value of the Imm16 field is generated by the assembler.

IX : "Immediate and ignore Imm16"

This form is only used by the SKIP and Q instructions and is the long equivalent of the above I form.
For example, skip 1 ?

1 is the number of half-words to skip.
? asks the assembler to fill the Imm16 field with an appropriate value.

RX : "Register and ignore Imm16"

This form is only used by the BSWAP instruction and is the "ignore" version of the R form (so that the assembler's form combinations are exhaustive).
For example, BSWAP R1 ?

R1 is both the source and destination register, in the SRC2 and SRC1/DEST fields.
? asks the assembler to fill the Imm16 field with an appropriate value.

RRX : "Register to Register, and ignore Imm16"

This form is used by many instructions that only make sense with the RR form. This form allows them to be extended to a long instruction where the Imm16 field is ignored.
For example, LZB R1 R3 ?

R1 is the source register in the SRC2 field.
R3 is the destination register, in the SRC1/DEST field.
? asks the assembler to fill the Imm16 field with an appropriate value.

IRX : "skip Immediate if Register, and ignore Imm16"

Just like the above IR form, this form is used by the conditional skip instructions (SO/SNE/SNO/SE, SS/SNS but not SZ/SNZ) and the conditional jump instructions (QO/QNE/QNO/QE, QS/QNS but not QZ/QNZ) because a long instruction form makes no sense (the opcode already contains a negation field).
For example, SNS 2 R3 ?

2 is the number of half-words that the core will skip if the sign bit of R3 is not set.
R3 is the tested register, in the SRC1/DEST field.
? asks the assembler to fill the Imm16 field with an appropriate value.
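The forms above determine how many half-words an instruction occupies, which matters for skip distances. This sketch assumes a short instruction is one half-word and the Imm16 field adds a second one (consistent with nop ? filling two half-words). It also assumes, from the I and IR descriptions, that the skip and jump families keep their I/IR number in the 2-bit src2 field (short), while other opcodes such as MOV put it in Imm16 (long). The opcode set and function name are hypothetical.

```javascript
// Opcodes whose I/IR number goes into the 2-bit src2 field (from the text).
const TWO_BIT_IMM = new Set([
  "SKIP", "Q", "SZ", "SNZ", "SO", "SNE", "SNO", "SE", "SS", "SNS",
  "QZ", "QNZ", "QO", "QNE", "QNO", "QE", "QS", "QNS",
]);

function sizeInHalfWords(opcode, form) {
  switch (form) {
    case "ALONE": case "R": case "RR":
      return 1; // short: no Imm16 half-word
    case "I": case "IR":
      return TWO_BIT_IMM.has(opcode) ? 1 : 2;
    default:
      return 2; // RIR, RI, IRI and every "-X" form carry an Imm16 half-word
  }
}

// nop / nop ? / skip 4 / mov 2, d3
const program = [["NOP", "ALONE"], ["NOP", "X"], ["SKIP", "I"], ["MOV", "IR"]];
console.log(program.reduce((n, [op, form]) => n + sizeInHalfWords(op, form), 0)); // 6
```

With this model, sns 2 r3 ? occupies two half-words and can skip either one long instruction or two short ones.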

Flags

Many opcodes have "flags", which modify the instruction's behaviour, or make it more precise. They are used by all kinds of software, particularly the assembler, the disassembler and the (future) instruction decoder.

FLAG_SWP : Swap the register operands

This is a modifier of the instruction form, not a form itself, and it is needed for hardware simplicity. This flag indicates that the operands are swapped internally in order to keep the assembly-level instructions easy to read and implement.
Behind the scenes, the destination register becomes the first operand, so in the instruction shr d2, r3:

d2 is encoded in the src1/dest field
r3 is encoded in the src2 field (which becomes temporarily the destination)
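A minimal sketch of how FLAG_SWP could change the register field assignment: without the flag, the first operand goes to src2 and the last to src1/dest; with it, the two are exchanged, as in the shr d2, r3 example. The flag value and the FLAGS table are hypothetical.

```javascript
// Hypothetical flag value and per-opcode attribute table.
const FLAG_SWP = 2;
const FLAGS = { ADD: 0, SHR: FLAG_SWP };

function registerFields(opcode, first, last) {
  const swap = ((FLAGS[opcode] ?? 0) & FLAG_SWP) !== 0;
  return swap
    ? { src1dest: first, src2: last } // operands swapped internally
    : { src2: first, src1dest: last }; // normal rule: destination is last
}

console.log(JSON.stringify(registerFields("SHR", "D2", "R3")));
// {"src1dest":"D2","src2":"R3"}
```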

FLAG_ALIAS_RR :

This flag is used by the BSWAP instruction, where a register operand can be omitted when the source register is the same as the destination register. The SRC1/DEST and SRC2 fields then hold the same value, but you only need to write it once.

Default values

Normally, all uninitialized data, fields or values are cleared (zero).

However, when certain instruction fields are not used (by FORM_ALONE, FORM_R, FORM_I, ...), they are set to values chosen by the assembler to reduce power consumption.

Toggle minimization and toggle spacing could help reduce power consumption and EMI. The VSP assembler should be able to compute suitable values rather easily, and padding Imm16 fields and NOPs can be improved the same way. The possible reduction is quite low, but could perhaps reach 5% when fetching instructions from external SDRAM. This remains to be tested, with and without the optimization, and even with toggles/EMI deliberately maximized for comparison.
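One possible toggle-minimizing choice for an ignored Imm16 half-word is sketched below: count bit toggles between consecutive 16-bit words on the bus, and reuse the previous half-word as the filler. By the triangle inequality of the Hamming distance, no filler can do better than the direct prev to next transition, and repeating prev reaches that bound. This is an illustration of the idea under these assumptions, not the assembler's actual algorithm, which is not described here; the function names are hypothetical.

```javascript
// Number of bits that change between two 16-bit bus words.
function toggles(a, b) {
  let x = (a ^ b) & 0xffff;
  let n = 0;
  while (x) {
    n += x & 1;
    x >>>= 1;
  }
  return n;
}

// For any filler f: toggles(prev, f) + toggles(f, next) >= toggles(prev, next),
// and f = prev reaches that minimum.
function chooseFiller(prev, next) {
  return prev & 0xffff;
}

const prev = 0x1234, next = 0xf0f0;
const f = chooseFiller(prev, next);
console.log(toggles(prev, f) + toggles(f, next), toggles(prev, next)); // 7 7
```

Maximizing toggles for the comparison test would instead pick the complement of the previous half-word.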