This page describes the YASEP's JavaScript assembler (the code that translates the textual instructions into binary code) and the language conventions (syntax) used when writing software for the YASEP. This information is specific to the yasep.org tools; third-party tools (such as defora's) might differ significantly, so check their respective documentation carefully. For information about specific instructions, see the opcode map.
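The YASEP's assembler is split into two layers: the instruction-level assembler, which translates individual instructions, and the high-level assembler (YASMed), which handles whole programs. Both are described below.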
An instruction is the basic unit of software, like an atom.
* For the YASEP, it is a 16-bit or 32-bit word that contains several fields that describe what to do with what.
* For the assembler, an instruction is a line containing all this information in readable, symbolic form.
An instruction typically contains
The binary structure of the instruction is explained there.
The YASEP's instructions are quite simple but they are not always practical, so some opcodes introduce a few modifications, as indicated by opcode flags (internal properties of the opcodes, listed here). These modifications are usually harmless and don't impact the architecture, but they make the instructions handier, which makes programs easier to write and read.
The assembler uses these flags when transforming the source text into binary codes. They keep the source code readable and coherent, independently from any hardware tricks, exceptions or processor versions.
The purpose of assembly language is to let the software developer write code without having to remember all the architectural details. The developer only needs to remember these five rules :
These rules are enough to understand the meaning of all the instructions, even if some have variations or exceptions.
For example : add 3 r2 d3 LSB1 a2
The only exceptions to these rules are the forms FORM_Ri and FORM_RI, because a few instructions use immediate data to provide a destination address.
And that's all there is to say about the subject of "input syntax".
The numbers are accepted in 3 formats :
Numbers are mostly used in contexts where the number of significant bits is bounded by the size of the container (usually, the immediate fields of instructions). When too many digits are given, the assembler keeps only the required number of least significant bits and discards the most significant ones. The following example shows how the number is truncated : db 1234h.
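The truncation described above amounts to masking a value to the width of its container. Here is a minimal JavaScript sketch, assuming that db stores a single 8-bit value; the helper name truncateToWidth is hypothetical and does not come from the yasep.org sources:

```javascript
// Hypothetical helper (not part of the yasep.org tools): keep only the
// `width` least significant bits of `value`, for widths from 1 to 32.
function truncateToWidth(value, width) {
  if (width >= 32) {
    return value >>> 0;            // force an unsigned 32-bit result
  }
  const mask = (1 << width) - 1;   // e.g. width 8 gives 0xFF
  return (value & mask) >>> 0;
}

// Assuming "db" stores one byte, 1234h keeps only its low byte.
console.log(truncateToWidth(0x1234, 8).toString(16)); // prints "34"
```

The same mask applies to any immediate field once its width is known.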
Note : the disassembler always uses hexadecimal as output format.
The ability to output arbitrary numbers is critical for many uses so the assembler has the following three pseudo-instructions :
An unbounded number of literal numbers is accepted; the limit depends on the JavaScript engine, not on the assembler's design. The assembler can provide all the output's numbers as a stream when the emit_bin() function is defined, such as in this example:
Since 2008-08, the YASEP exists in 16-bit and 32-bit variants. The opcodes don't change but a few of them are pointless in 16-bit mode or 32-bit mode. The source code can specify that a certain width is used so a warning is issued when an invalid instruction (depending on the CPU) is assembled.
YASEP16 specifies that the targeted CPU has a 16-bit datapath. All 32-bit only instructions generate a warning.
YASEP32 specifies that the targeted CPU has a 32-bit datapath. All 16-bit only instructions generate a warning.
YASEP resets the target CPU to generic/undefined.
These pseudo-instructions don't generate any code and can be used in any order, as they simply control an internal flag. This flag is compared with each instruction's flag (see YASEP32_ONLY and YASEP16_ONLY).
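The flag comparison described above can be sketched as follows. This is only an illustration in JavaScript: apart from YASEP16, YASEP32, YASEP, YASEP16_ONLY and YASEP32_ONLY, every name (setTarget, checkWidth, the opcode name "SOME_OP") is a hypothetical placeholder, and the real assembler may organize this differently.

```javascript
// Internal flag: 16, 32, or null for a generic/undefined target.
let targetWidth = null;

// The pseudo-instructions only update the flag; they emit no code.
function setTarget(pseudo) {
  if (pseudo === "YASEP16") targetWidth = 16;
  else if (pseudo === "YASEP32") targetWidth = 32;
  else if (pseudo === "YASEP") targetWidth = null;
  else throw new Error("unknown pseudo-instruction: " + pseudo);
}

// Returns a warning string, or null when the opcode suits the target.
// `opcodeFlags` is assumed to be an array of flag names.
function checkWidth(opcodeName, opcodeFlags) {
  if (targetWidth === 16 && opcodeFlags.includes("YASEP32_ONLY")) {
    return "warning: " + opcodeName + " is a 32-bit only instruction";
  }
  if (targetWidth === 32 && opcodeFlags.includes("YASEP16_ONLY")) {
    return "warning: " + opcodeName + " is a 16-bit only instruction";
  }
  return null;
}

setTarget("YASEP16");
console.log(checkWidth("SOME_OP", ["YASEP32_ONLY"])); // a warning string
setTarget("YASEP");
console.log(checkWidth("SOME_OP", ["YASEP32_ONLY"])); // null
```

Because the check only reads the flag, the pseudo-instructions can appear anywhere in the source and take effect from that point on.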
Since 2012-02, the datapath width is one of the parameters that make a "CPU profile". You can select a CPU profile with the .profile keyword in your source code. You can also check or create profiles in the dedicated interface.
The instruction-level assembler recognizes the following symbols, and rejects anything else :
The assembler eases instruction coding (letting the programmer think about what to do rather than how to do it) with two types of aliases: form aliases (see ALIAS_RR) and instruction aliases (they are listed under the opcode map). This section is about instruction aliases.
Internally, they can be used like normal instructions, but they provide different forms and/or different semantics. However, they use real opcodes of other instructions.
The substitution is handled at the assembly level and the disassembler probably won't infer the originally assembled alias. So don't be surprised if instructions like NOT or NEG assemble correctly, but the disassembly returns a different opcode.
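The following JavaScript sketch illustrates why an alias cannot be recovered by the disassembler. The alias names, the real opcode names and the numeric codes are all placeholders invented for this illustration; they are not taken from the YASEP opcode map.

```javascript
// Placeholder tables: none of these names or codes are real YASEP data.
const ALIASES = new Map([["ALIAS_X", "REAL_A"], ["ALIAS_Y", "REAL_B"]]);
const OPCODES = new Map([["REAL_A", 0x1], ["REAL_B", 0x2]]);

// Assembly: an alias is replaced by the real opcode it stands for.
function assembleMnemonic(name) {
  const real = ALIASES.has(name) ? ALIASES.get(name) : name;
  if (!OPCODES.has(real)) throw new Error("unknown mnemonic: " + name);
  return OPCODES.get(real);
}

// Disassembly: only the binary code is available, so only the real
// opcode name can be returned.
function disassembleOpcode(code) {
  for (const [name, value] of OPCODES) {
    if (value === code) return name;
  }
  throw new Error("unknown opcode: " + code);
}

console.log(disassembleOpcode(assembleMnemonic("ALIAS_X"))); // prints "REAL_A"
```

The alias name is lost as soon as the binary code is produced, which is why the round trip returns a different mnemonic.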
Despite the very simple instruction format, the assembly language instructions appear in several different forms. This is usually due to reasons like :
The various instruction forms are described in the forms page.
The flags are listed in their own page too.
Normally, all uninitialized data, fields or values are cleared (zero).
However, when certain instruction fields are not used (by FORM_ALONE or FORM_R...), these fields could be set to values chosen by the assembler, for the purpose of power reduction.
Toggle minimization and toggle spacing could help reduce power consumption and electromagnetic emissions. The YASEP's assembler will be able to compute proper values rather easily. Padding Imm16s and NOPs could be enhanced this way, too. The possible reduction is quite low, but it might reach a few percent when fetching instructions from external SDRAM. An equivalent gain can come from efficient instruction coding/packing and compiler/algorithm smartness.
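As an illustration of toggle minimization only (toggle spacing is not covered), here is a JavaScript sketch under stated assumptions: instruction words are 16 bits wide, the unused field is described by a bit mask, and the unused bits simply copy the previous word so that they never toggle. The function names and the mask value are hypothetical; the assembler does not necessarily work this way.

```javascript
// Number of bit positions that differ between two words.
function countToggles(a, b) {
  let x = (a ^ b) >>> 0;
  let n = 0;
  while (x !== 0) {
    x = (x & (x - 1)) >>> 0; // clear the lowest set bit
    n++;
  }
  return n;
}

// Fill the unused bits of `word` (selected by `unusedMask`) with the
// corresponding bits of the previously fetched word.
function fillUnused(previous, word, unusedMask) {
  return ((word & ~unusedMask) | (previous & unusedMask)) & 0xFFFF;
}

const previous = 0xA5C3;
const word = 0x0003;       // upper byte assumed unused, cleared by default
const filled = fillUnused(previous, word, 0xFF00);

console.log(countToggles(previous, word));   // prints 6
console.log(countToggles(previous, filled)); // prints 2
```

In this made-up case the fetch toggles 2 bus lines instead of 6, while the bits that carry meaning are unchanged.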
Since 2012-02, the tools include a "high level assembler" called YASMed. This is a graphical user interface that handles instructions line by line, with little regard for the underlying low-level CPU architecture. You can start a new instance with the ASM menu or by clicking on code zones like this:
NOP ; source code example
YASMed is not a classic multi-pass assembler: it resolves references at edit time, which can happen out of order. If some symbols remain unresolved, hit the "re-assemble" button to pop up a new window with an updated symbol table.
Currently, YASMed recognises certain keywords with a leading dot:
Here is a simple example that uses the above keywords:
.name Dumb_Example       ; this code will be saved to
                         ; a file named Dumb_Example.yas
.profile YASEP16         ; This program expects to run
                         ; on a generic 16-bit version of the YASEP
.subst Counter R1        ; substitute variable names
.subst tmp R2            ; with actual register names
.org 22                  ; locate the code at address 22
  mov 0 Counter
. LabelLoop              ; Loop entry label
                         ; Loop body of any size
.align 32                ; the next instruction
                         ; will be aligned to a 32-byte boundary
; Loop 65536 times :
  add 1 Counter          ; increments the counter
  mov LabelLoop tmp      ; load the loop address in R2
  mov tmp PC NZ Counter  ; Loop if the counter is not 0
  HALT                   ; End of program : hang the CPU
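The effect of .align on the current address can be sketched in JavaScript as rounding up to the next multiple of the boundary. This assumes byte addresses, as the "32-byte boundary" comment suggests; the function name alignUp is hypothetical and the address 24 is only an example value.

```javascript
// Hypothetical helper: round `address` up to the next multiple of
// `boundary`. An address that is already aligned is left unchanged.
function alignUp(address, boundary) {
  if (!Number.isInteger(boundary) || boundary <= 0) {
    throw new Error("boundary must be a positive integer");
  }
  return Math.ceil(address / boundary) * boundary;
}

console.log(alignUp(24, 32)); // prints 32
console.log(alignUp(32, 32)); // prints 32
console.log(alignUp(33, 32)); // prints 64
```

The bytes skipped between the old and the new address would have to be filled with padding, which this sketch does not model.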