The YASEP's assembly language

Introduction

This page describes the YASEP's JavaScript assembler and the language conventions (syntax) used when writing software for the YASEP. For information about specific instructions, see the opcode map and the Instruction Set Manual pages.

This page covers the following points :

The Instruction-Level Assembler
How to write numbers
The pseudo-instructions (like DB, DH, DW or YASEP)
The reserved keywords
The instruction aliases
The flags
The default values

Architecture

The YASEP's assembler is split into two layers :

The line-by-line, or instruction-level assembler is a stateless routine that assembles one instruction at a time. It can be accessed through the asm.html interface or with the floating window (when you click on instructions in HTML files, like add d2, r3). It is not very sophisticated but very useful in countless places.

The high-level assembler takes a whole file, splits it into single lines and feeds them to the above routine. It does not deal with the instructions directly but assembles the results, manages the symbol tables, evaluates/computes values, creates the binaries, handles preprocessing... It will be described and developed later.

Or maybe the high-level assembly will simply be implemented as a side-function of listed.
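
The split between the two layers can be pictured with a short sketch. This is only an illustration under assumptions: assembleFile and the assembleLine callback are hypothetical names, not the actual interface of the YASEP tools, and symbol tables, expression evaluation and preprocessing are left out.

```javascript
// Hypothetical sketch: the high-level layer splits a source file into lines
// and hands each one to a stateless line-level routine.
function assembleFile(source, assembleLine) {
  const results = [];
  const lines = source.split(/\r\n|\r|\n/);   // one entry per source line
  for (let i = 0; i < lines.length; i++) {
    if (lines[i].trim() === "") continue;     // skip empty lines
    // assembleLine is assumed to take one line and return its encoding
    results.push({ lineNumber: i + 1, output: assembleLine(lines[i]) });
  }
  return results;
}
```

Keeping the line-level routine stateless means that the same function can serve both the whole-file loop and the one-instruction floating window.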

Instruction-level assembly

The YASEP's instructions are very simple but they are not always practical, so some opcodes introduce a few modifications (see here). They are usually harmless and don't impact the architecture, but they make the instructions handier, which makes programs easier to write and read.

The assembly language's goal is to hide the architectural details from the software developer, who should keep a few rules in mind :

Rule 0 : The opcode comes first.

Rule 1 : The destination register comes after all the source operands (usually, at the end of the instruction).

Rule 2 : Generally, the immediate value comes just after the opcode, before the other operands (except the few cases that must obey Rule 1, like FORM_Ri and FORM_RI).

Rule 3 : The condition codes come at the end of the instruction, after the destination register.

Input processing

The assembly routine takes a single line (a character string with no "end of line" character like '\n' or '\r').
The line may end with a comment, started with ';'. Starting from this delimiter, all the following characters are discarded.
The leading and trailing spaces are removed, the commas and tabs are replaced by spaces, and duplicated spaces are merged. You are free to write add 234h d2 r3, add 234h d2, r3 or add 234h, d2, r3.
The whole line is converted to upper case. So it does not matter if you write adD 234h D2 r3 or ADd 234h d2 R3 because it will become ADD 234H D2 R3 internally.
The first word must be the opcode.
Depending on the instruction, register names, keywords and/or constant numbers follow. The order and function of these parameters are defined by the "form".

And that's all there is to say about the subject of "input syntax".
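
The input-processing steps can be sketched in a few lines of JavaScript. The function names normalizeLine and tokenize are hypothetical and chosen for illustration only. The assembler's real code may differ, for example in how it orders the steps.

```javascript
// Hypothetical helper mirroring the input-processing steps described above.
function normalizeLine(line) {
  const semicolon = line.indexOf(";");
  if (semicolon !== -1) line = line.slice(0, semicolon); // drop the comment
  return line
    .replace(/[,\t]/g, " ")   // commas and tabs become spaces
    .replace(/ +/g, " ")      // merge duplicated spaces
    .trim()                   // remove leading and trailing spaces
    .toUpperCase();           // case does not matter
}

// Splits the cleaned line into words; the first word must be the opcode.
function tokenize(line) {
  const clean = normalizeLine(line);
  return clean === "" ? [] : clean.split(" ");
}
```

With this sketch, tokenize("adD 234h, d2, r3 ; comment") gives the four words ADD, 234H, D2 and R3, whichever separators were used.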

Number formatting

The numbers are accepted in 3 formats :

Hexadecimal ([0-9A-F]* with a trailing 'h'), for example dw 1234h (the 0x prefix or the $ prefix is never used)
Decimal ([0-9]* without prefix or suffix) : dw 1234.
This is the only format that accepts a leading minus sign : dw -1234
Binary ([01]* with a trailing 'b') : dw 01101101b
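
As an illustration, the three formats can be recognized with one regular expression each. The function parseNumber is a hypothetical name, and requiring at least one digit is an assumption of this sketch (the patterns above are written with '*').

```javascript
// Hypothetical parser for the three number formats; returns null when the
// word is not a number. Requiring at least one digit is an assumption.
function parseNumber(token) {
  const t = token.toUpperCase();
  if (/^[0-9A-F]+H$/.test(t)) return parseInt(t.slice(0, -1), 16); // 1234h
  if (/^[01]+B$/.test(t)) return parseInt(t.slice(0, -1), 2);      // 01101101b
  if (/^-?[0-9]+$/.test(t)) return parseInt(t, 10);                // 1234, -1234
  return null; // anything else, such as 0x1234 or $1234, is not a number
}
```

The trailing letter makes the formats unambiguous: a word ending with 'h' can only be hexadecimal, and a word made of 0s and 1s ending with 'b' can only be binary.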

Numbers are mostly used in contexts where the number of significant bits is bounded by the size of the container (usually, the immediate fields). When more bits are given, the assembler keeps only the required number of least significant bits (LSB) and discards the most significant bits (MSB). For example, db 1234h is truncated to 8 bits, so only 34h is kept.
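
A minimal sketch of this truncation, assuming that negative decimal numbers are encoded in two's complement (the text above does not say so explicitly). The name truncateToBits is hypothetical.

```javascript
// Hypothetical helper: keep only the `bits` least significant bits of a value.
function truncateToBits(value, bits) {
  const modulus = 2 ** bits;
  // The double modulo keeps the result non-negative for negative inputs
  // (two's complement encoding, an assumption of this sketch).
  return ((value % modulus) + modulus) % modulus;
}
```

Plain arithmetic is used instead of JavaScript's bitwise operators because those work on signed 32-bit integers, which would give negative results for 32-bit values with the top bit set.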

Pseudo-instructions

The ability to output arbitrary numbers is critical for many uses so the assembler has the following three pseudo-instructions :

DB : "Data Byte"
outputs 8 bits : db 12h

DH : "Data Half-word"
outputs 16 bits : dh 1234h

DW : "Data Word"
outputs 32 bits : dw 12345678h

An unbounded number of literal numbers is accepted (since 2007-04-04; the limit depends on the JavaScript engine, not on the assembler's design). The floating assembler's window will not display all the digits when more than 32 bits are given but they are available through software (by defining the emit_bin() function).

Example : db 12h 34h 56h 78h 9Ah BCh DEFh
emit_bin()'s output :

The ability to include ASCII strings is still missing at the time of writing.
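
The behaviour of DB, DH and DW can be sketched as follows. This is a simplified illustration, not the assembler's code: assembleData is a hypothetical name, only hexadecimal literals are handled, and the byte order and the real emit_bin() interface are deliberately left out.

```javascript
// Widths taken from the text: DB = 8 bits, DH = 16 bits, DW = 32 bits.
const DATA_WIDTHS = { DB: 8, DH: 16, DW: 32 };

// Hypothetical sketch: returns one truncated value per literal.
function assembleData(line) {
  const [mnemonic, ...literals] = line.trim().toUpperCase().split(/[ ,\t]+/);
  const bits = DATA_WIDTHS[mnemonic];
  if (bits === undefined) {
    throw new Error("not a data pseudo-instruction: " + mnemonic);
  }
  const modulus = 2 ** bits;
  return literals.map((lit) => {
    if (!/^[0-9A-F]+H$/.test(lit)) {
      throw new Error("only hex literals are handled in this sketch: " + lit);
    }
    // Keep only the low bits, as the assembler does for oversized numbers.
    return parseInt(lit.slice(0, -1), 16) % modulus;
  });
}
```

In this sketch, the last literal of db 12h 34h 56h 78h 9Ah BCh DEFh does not fit in 8 bits, so only its low byte EFh is kept.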

Core datapath width

Since 2008-08, the YASEP exists in 16-bit and 32-bit variants. The opcodes don't change but a few of them are pointless in 16-bit mode or 32-bit mode. The source code can specify that a certain width is used so a warning is issued when a pointless instruction (probably invalid for the given CPU) is assembled.

YASEP16 specifies that the targeted CPU has a 16-bit datapath. All 32-bit only instructions generate a warning.

YASEP32 specifies that the targeted CPU has a 32-bit datapath. All 16-bit only instructions generate a warning.

YASEP resets the target CPU to generic/undefined.

These pseudo-instructions don't generate any code and can be used in any order, as they simply control an internal flag. This flag is compared with each instruction's flag (see YASEP32_ONLY and YASEP16_ONLY below).
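
The mechanism can be illustrated with a small sketch. The names createWidthChecker, directive and warningFor are hypothetical, and representing an instruction's flags as an array of names is an assumption. Only the YASEP16, YASEP32 and YASEP keywords and the YASEP16_ONLY and YASEP32_ONLY flags come from this page.

```javascript
// Hypothetical sketch of the target-width flag and the warning check.
function createWidthChecker() {
  let target = null; // null = generic/undefined, otherwise 16 or 32

  return {
    // Handles YASEP16, YASEP32 and YASEP; returns true if the word was one.
    directive(word) {
      switch (word.toUpperCase()) {
        case "YASEP16": target = 16; return true;
        case "YASEP32": target = 32; return true;
        case "YASEP": target = null; return true;
        default: return false;
      }
    },
    // `flags` is an assumed array of flag names attached to an instruction.
    // Returns a warning message, or null when there is nothing to report.
    warningFor(flags) {
      if (target === 16 && flags.includes("YASEP32_ONLY")) {
        return "32-bit only instruction used with a 16-bit target";
      }
      if (target === 32 && flags.includes("YASEP16_ONLY")) {
        return "16-bit only instruction used with a 32-bit target";
      }
      return null;
    },
  };
}
```

Because the pseudo-instructions only update one variable, they can appear anywhere in the source, and the most recent one applies to the instructions that follow.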

Reserved symbols

The instruction-level assembler recognizes the following symbols, and rejects anything else :

Register names (NPC R0 R1 R2 R3 R4 A0 D0 A1 D1 A2 D2 A3 D3 A4 D4)
Pseudo-instructions (DB DH DW YASEP16 YASEP32 YASEP)
The condition codes (LSB0 LSB1 MSB0 MSB1 ZERO NZ)
The instruction opcodes and the opcode aliases (see the opcode map)
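
A symbol lookup along these lines could look like the sketch below. The name classifySymbol is hypothetical, and the opcode set must be supplied by the caller because the list of opcodes and aliases is not reproduced on this page.

```javascript
// Symbol lists copied from the text above.
const REGISTERS = new Set(
  "NPC R0 R1 R2 R3 R4 A0 D0 A1 D1 A2 D2 A3 D3 A4 D4".split(" "));
const PSEUDO_INSTRUCTIONS = new Set("DB DH DW YASEP16 YASEP32 YASEP".split(" "));
const CONDITION_CODES = new Set("LSB0 LSB1 MSB0 MSB1 ZERO NZ".split(" "));

// Hypothetical classifier; `opcodes` is an assumed Set of opcode and alias
// names provided by the caller.
function classifySymbol(word, opcodes = new Set()) {
  const w = word.toUpperCase();
  if (REGISTERS.has(w)) return "register";
  if (PSEUDO_INSTRUCTIONS.has(w)) return "pseudo-instruction";
  if (CONDITION_CODES.has(w)) return "condition";
  if (opcodes.has(w)) return "opcode";
  return null; // unknown symbol: rejected
}
```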

Aliases

There are two types of aliases : form aliases (see ALIAS_RR) and instruction aliases (they are listed under the opcode map). This section is about instruction aliases.

Internally, they can be used like normal instructions, but they provide different forms and/or different semantics. However, they use the real opcodes of other instructions. The substitution is handled at the assembly level and the disassembler probably won't recover the originally assembled alias. So don't be surprised if instructions like NOT or NEG assemble correctly, but the disassembly returns a different opcode.

Instruction Forms and flags

Despite the very simple instruction format, the assembly language instructions appear in several different forms. This is usually due to reasons like :

The instruction does not make sense with one form, for example HALT does not need an immediate field.

The assembly language must be easy to read and understand, hence some fields are written in a different order internally.

The various instruction forms are described in the instructions.html page.

The flags have their own page too.

Default values

Normally, all uninitialized data, fields or values are cleared (zero).

However, when certain instruction fields are not used (by FORM_ALONE or FORM_R...), these fields could be set to values chosen by the assembler, for the purpose of power reduction.

Toggle minimization and toggle spacing could help reduce the power consumption and EMI. The YASEP's assembler will be able to compute proper values rather easily. Padding Imm16s and NOPs can be enhanced this way, too. The possible reduction is quite low, but it could perhaps reach 5% when fetching instructions from external SDRAM. This remains to be tested with and without the optimization, and even with EMI/toggles maximized, for comparison.
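
To make the idea of toggle minimization concrete, here is a small sketch that counts the bits that change between two consecutive words and picks the filler value with the lowest cost. It is purely illustrative: the 16-bit bus width, the function names toggles16 and pickFiller, and the idea of choosing from a list of candidates are all assumptions, not a description of what the assembler does.

```javascript
// Number of bits that differ between two 16-bit words (Hamming distance).
function toggles16(a, b) {
  let diff = (a ^ b) & 0xFFFF;
  let count = 0;
  while (diff !== 0) {
    count += diff & 1;
    diff >>>= 1;
  }
  return count;
}

// Hypothetical: among candidate filler values, pick the one that causes the
// fewest bit toggles between the previous and the next word on the bus.
function pickFiller(previous, next, candidates) {
  let best = candidates[0];
  let bestCost = Infinity;
  for (const c of candidates) {
    const cost = toggles16(previous, c) + toggles16(c, next);
    if (cost < bestCost) {
      best = c;
      bestCost = cost;
    }
  }
  return best;
}
```

The same counting function could also be used the other way around, to maximize the toggles for a worst-case measurement.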