yasep/test/boot_SPI.txt
created: 2008 08 17 by whygee
version Mon Mar 23 15:57:29 CET 2009


This file is a first draft concerning the boot sequence and the associated circuits
in the context of the Actel ProASIC3 Flash-based FPGA.


-------------------------------------------------------------------

* To reduce the pin count and the PCB surface, a serial EEPROM is used
to store the system's software.

* Complexity of the boot logic and state machines favors the SPI protocol
over the I2C protocol.

* Flexibility and a further reduction of the state machine count favor the use
of the integrated "Flash ROM" (a read-only 128-byte Flash area that can be programmed
through external JTAG) as preliminary boot software storage.

* The EEPROM (a 25xx512 or 25xx1024) 

-------------------------------------------------------------------

F-ROM : 128 bytes of instructions

It is still uncertain how the contents of the F-ROM can be transferred
to the external SSRAM. Maybe the F-ROM should be mapped to the executable
address range. Or the F-ROM is mapped to the SR area,
and accessed by "force-fed" instructions in the decoder.
Or a circuit that directly accesses the SSRAM could be used.

But none of these possible methods is satisfactory.


-------------------------------------------------------------------

SPI memories : there are two supported chip versions :
 - 64KB or less : 16 bits of address
 - up to 16MB : 24 bits of address
The protocol is the same, except for the number of address bytes to send
when initiating the memory read sequence.

This difference is handled by the preliminary bootstrap software.
For example : a weak pull-down or pull-up resistor tied to the SDI pin of the FPGA
will indicate the size of the memory array. When /CS is high, the EEPROM chip
leaves this pin floating (Hi-Z) and the configuration can be read.

OR : the EEPROMs <=64KB include an additional padding byte
so the stream is identical with what YASEP would receive
from a >64KB link.
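
The pull-resistor variant can be sketched as follows. This is only an illustration
in Python, not YASEP code : the function names are hypothetical, and the polarity
(strap bit 1 = 24 address bits) is an assumption taken from the draft INIT_SPI code
further down. The READ opcode (00000011) followed by 2 or 3 address bytes, MSB first,
is the usual 25xx read sequence.

```python
READ_OPCODE = 0x03  # 25xx READ instruction, 00000011 in binary


def address_bits(strap_level):
    """Return 16 or 24 from the sampled strap bit (assumed: 1 means 24 bits)."""
    return 16 + 8 * (strap_level & 1)


def read_command(address, strap_level):
    """Build the bytes that start a sequential read at `address`."""
    n_bytes = address_bits(strap_level) // 8
    return bytes([READ_OPCODE]) + address.to_bytes(n_bytes, "big")


if __name__ == "__main__":
    print(read_command(0, 0).hex())  # 030000
    print(read_command(0, 1).hex())  # 03000000
```

Only the number of zero address bytes changes between the two chip families,
which is why a single strap bit is enough.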

The FPGA controls the /CS, SCK, SDI and SDO lines.
/WP and /HOLD are tied inactive (high).

The operating frequency varies between 1 and 20MHz depending on the chip :
this must be taken into account in the initial clock configuration.
Variable frequency clocking is necessary for YASEP.

Initial EEPROM read occurs at low speed to load 4KB,
then YASEP jumps to this address and configures the rest,
incl. the faster clock speed.


-------------------------------------------------------------------

SR area :

The Special Registers could map the F-ROM directly.
However the F-ROM is 8 bits wide, which would make the bootstrap state machinery complex.

The SPI bus is also mapped in the Special Registers :
 - SPI_CTL writes the status of the /CS pin
 - SPI_DOUT writes a data bit to the EEPROM's SDI and generates a clock cycle
 - SPI_DIN reads the EEPROM's SDO, generates a clock cycle and shifts the data in the register

==> an uninterrupted sequence of 16 instructions "GET SPI_DIN, Rx" will fill Rx with
the 16 bits generated by the EEPROM.

==> only one dumb shift register (and no state machine) is required to read the EEPROM
(the state machine is implemented in software, stored in the preliminary boot F-ROM)
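
A small behavioural model of this "dumb" shift register, as a sketch only. The class
and method names are hypothetical. It assumes a 16-bit register that shifts left by
one position on each read, with the EEPROM sending its data MSB first.

```python
class SpiDinRegister:
    """Model of a 16-bit shift register fed by the EEPROM's SDO line."""

    def __init__(self, sdo_bits):
        self._bits = iter(sdo_bits)  # bits in the order the EEPROM sends them
        self.value = 0
        self.clocks = 0

    def get(self):
        """One 'GET SPI_DIN, Rx': one clock cycle, one bit shifted in."""
        bit = next(self._bits) & 1
        self.value = ((self.value << 1) | bit) & 0xFFFF
        self.clocks += 1
        return self.value


def read16(reg):
    """16 consecutive reads: the last one returns the complete word."""
    rx = 0
    for _ in range(16):
        rx = reg.get()
    return rx
```

The sequencing (how many reads, what to do with the result) stays entirely in
software : the hardware only shifts and clocks.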

-------------------------------------------------------------------

Contents of the serial EEPROM :

16 bits : first address to upload
16 bits : Byte count
16 bits : initial instruction pointer
16 bits : checksum ?
X*16 bits : data/code
The rest : can be read by the application code later.
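
The layout above can be illustrated with a short parser. This is a sketch under
assumptions : the byte order inside each 16-bit field is not specified here and is
taken as big-endian, the checksum field is still tentative, and all names
(BootHeader, parse_image) are hypothetical.

```python
import struct
from collections import namedtuple

BootHeader = namedtuple(
    "BootHeader", "first_address byte_count entry_point checksum"
)

HEADER_FORMAT = ">4H"  # four 16-bit fields, big-endian (assumption)


def parse_image(image):
    """Split an EEPROM image into header, boot payload and remaining data."""
    header = BootHeader(*struct.unpack_from(HEADER_FORMAT, image, 0))
    start = struct.calcsize(HEADER_FORMAT)  # 8 bytes of header
    end = start + header.byte_count
    return header, image[start:end], image[end:]
```

Everything after the payload is left untouched, so the application code can read
it later through the same SPI registers.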

-------------------------------------------------------------------

The first 128 bytes of code :

* INIT CPU :
 - init the clock
 - init the IRQ
 - init the thread register
 - init the first page address extension entries

 init Ax = 0xFFFE (so Dx can be used as scratch registers)

* INIT SPI :
 - SPI_CS=0
 - SPI_CS=1
 - read SPI_SDI => nb of address bits (2 or 3 bytes) ;;; beware : this triggers a clock cycle
 - SPI_CS=0
 - write READ opcode (00000011)
 - write 2(3)* 0h bytes

* READ SPI CONFIG :
 - read 16 bits => first address (current pointer : A4)
 - read 16 bits =>  last address (D5 ?)
 - read 16 bits =>  jump address (A3)
 - read 16 bits =>  checksum ? (D2 ?)

* READ SPI BLOCK :
 - read 16 bits ==> D4 (autoincrement Q4)
 - checksum -= D4
 - IF A4 < A5 loop  READ SPI BLOCK
 
 - IF checksum != 0  loop INIT CPU
 - SPI_CS=1 ; disable the EEPROM to save power
 - jump to A3
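
The block read and checksum test can be modelled as below. A sketch only, with
hypothetical names : it assumes that the loop is driven by the byte count of the
EEPROM header, that memory is written one 16-bit word at a time, and that the
checksum field holds the 16-bit sum of all data words, so that subtracting each
word leaves zero.

```python
def make_checksum(words):
    """Checksum field to store in the EEPROM: sum of the words, modulo 2^16."""
    return sum(words) & 0xFFFF


def load_block(words, first_address, byte_count, checksum, memory):
    """Copy byte_count bytes into memory; return True if the checksum ends at 0."""
    address = first_address
    remaining = byte_count
    stream = iter(words)
    while remaining > 0:
        word = next(stream) & 0xFFFF
        memory[address] = word                  # write, then post-increment
        address = (address + 2) & 0xFFFF
        checksum = (checksum - word) & 0xFFFF   # checksum -= word
        remaining -= 2
    return checksum == 0
```

A False result corresponds to the "loop INIT CPU" case : the whole sequence
restarts from the beginning.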

***********************************************************************

INIT_SPI:

  mov 0,R0            ; 2
  put R0, SPI_CS      ; 4
; ... waste some ns ?
  mov -2, A5          ; 2
  mov A5, A4          ; 2

  mov 1, R0           ; 2
  put R0, SPI_CS      ; 4
; ... waste some ns ?
  mov A5, A3          ; 2
  mov A5, A2          ; 2

  get SPI_SDI, D2     ; 4
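  and 1, D2           ; keep only the strap bit
  shl D2,3,D2         ; [0/1]*8 bits
  add D2,16,D2        ; D2 = 16 or 24 address bits
; note : the outline above sets SPI_CS=0 before the READ opcode; that step is missing here
; read command :
  xor R2,R2           ; R2 = 0 : value of the bits to write
  mov 6, R3
  call WriteR3Bits    ; 6 zero bits
  mov 1,R3
  PUT R3, SPI_OUT     ; two one bits
  PUT R3, SPI_OUT     ; ==> opcode 00000011 (READ)
  mov D2,R3
  call WriteR3Bits    ; 16 or 24 zero address bits (R2 is still 0)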
; ==> about 28 bytes


READ_SPI_CONFIG:
; read 16 bits => first address (current pointer : A4)
  call READ16bits
  mov R2,A4
; read 16 bits =>  byte count (D5)
  call READ16bits
  mov R2,D5
; read 16 bits =>  jump address (A3)
  call READ16bits
  mov R2,A3
; read 16 bits =>  checksum ? (D2 ?)
  call READ16bits
  mov R2,D2
; ==> 24 bytes

loop_block:
; read 16 bits ==> D4 (autoincrement Q4)
  call READ16bits
  mov R2,D4
; checksum -= the word just read (still in R2, D4 has already auto-incremented)
  SUB R2,D2
; loop while the remaining byte count (D5) is not zero
  ADD D5,-2,D5
  jnz D5, loop_block

; IF checksum != 0  loop INIT CPU
  JNZ D2, 0
; SPI_CS=1 ; disable the EEPROM to save power
  mov 1, R0
  put R0, SPI_CS
; jump to A3
  jmp A3
; ==> 32 bytes


READ16bits:
 mov 16,R1
L1:
 get SPI_DIN, R2   ; one clock cycle, one more bit shifted in
 add R1,-1,R1
 jnz R1, L1        ; loop until 16 bits are read
 return  ; ===> 20 bytes ?

WriteR3Bits :
 ; number of bits in R3,
 ; value of bit in R2
 PUT R2, SPI_OUT       ; 4
 ADD R3,-1,R3          ; 4
 jnz R3, WriteR3Bits   ; 4 ; loop until all bits are written
 return                ; 4
;   ===> 16 bytes
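
As a cross-check of the INIT_SPI read command, the bit sequence it is meant to put
on the wire can be modelled in a few lines. This is a sketch, not YASEP code : the
helper names are hypothetical and the bits are assumed to be sent MSB first.

```python
def write_bits(count, bit, line):
    """Model of WriteR3Bits: append `count` copies of `bit` to the line."""
    for _ in range(count):
        line.append(bit & 1)


def init_spi_bits(address_bits):
    """Bits sent after /CS goes low: READ opcode, then a zero address."""
    line = []
    write_bits(6, 0, line)             # 6 zero bits
    line.append(1)                     # two explicit writes of a one bit
    line.append(1)
    write_bits(address_bits, 0, line)  # 16 or 24 zero address bits
    return line


def to_bytes_msb_first(bits):
    """Pack a bit list into bytes, first bit = most significant."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for b in bits[i:i + 8]:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)
```

With 16 address bits this gives the bytes 03 00 00, and with 24 address bits
03 00 00 00.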

***********************************************************************

The above code needs 130 bytes, slightly more than the 128 bytes available.
It could be reduced a bit because the checksum takes 12 bytes and is not critical.
Detecting the EEPROM size also takes 12 bytes and could be removed in specific cases.
Choose either checksum or extended compatibility... or neither.
One can even skip the final EEPROM disable.

This code does not work as is : it was written to outline the functionality
and uses opcodes/instructions that are not (yet) part of the existing specification or opcode map.
The opcodes and encodings are not definitive and some bytes could be saved
here and there in the future. The YASEP was not designed to be very code-compact anyway.


**********************************************************************

note :
CQ : removed
A0 = instruction pointer (read only ?)
D0 = status word : auto-update + IRQen + skip?
(shadows/alias)

auto-update in 16 bits :
5 queues
A1/D1  post-inc/dec when accessed
A2/D2  post-inc/dec
A3/D3  post-inc/dec
A4/D4  stack-capable
A5/D5  stack-capable
____________
10 bits

IRQ_EN : 1

16/32 bits mode

0h: Q1dir
1h: Q1en
2h: Q2dir
3h: Q2en
4h: Q3dir
5h: Q3en
6h: Q4dir
7h: Q4stack
8h: Q4en
9h: Q5dir
Ah: Q5stack
Bh: Q5en
Ch: 
Dh: 
Eh: AddrTranslation
Fh: 16/32 bits (0=16-bit mode)
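
The list above can be read as bit positions in the status word (D0). A sketch of a
decoder under that assumption. The name Mode32 for bit Fh is hypothetical, and the
unassigned bits Ch and Dh are ignored.

```python
STATUS_BITS = {
    0x0: "Q1dir", 0x1: "Q1en",
    0x2: "Q2dir", 0x3: "Q2en",
    0x4: "Q3dir", 0x5: "Q3en",
    0x6: "Q4dir", 0x7: "Q4stack", 0x8: "Q4en",
    0x9: "Q5dir", 0xA: "Q5stack", 0xB: "Q5en",
    0xE: "AddrTranslation",
    0xF: "Mode32",  # 0 = 16-bit mode, 1 = 32-bit mode
}


def decode_status(word):
    """Return the names of the flags that are set in a 16-bit status word."""
    return [name for bit, name in sorted(STATUS_BITS.items())
            if word & (1 << bit)]


if __name__ == "__main__":
    print(decode_status(0x0102))  # ['Q1en', 'Q4en']
```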


******************************************************

Mon Mar 23 15:57:29 CET 2009


In order to speed up the code and reduce its size, the first
byte (in 16M mode, or the first 2 bytes in 64K mode) is a dummy,
so a single "read 16 bits" call is enough in both cases, without
the need to align the stream.
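
One possible reading of this note, sketched in Python : the boot code always sends
the opcode plus 16 zero address bits, then clocks 16 more bits and discards them.
A 24-bit chip takes the first 8 of those clocks as its third address byte and then
outputs 1 dummy byte; a 16-bit chip outputs 2 dummy bytes. The function names and
the 0xFF filler value are assumptions.

```python
def dummy_prefix_length(address_bits):
    """Dummy bytes at the start of the EEPROM: 2 for 16-bit, 1 for 24-bit."""
    if address_bits not in (16, 24):
        raise ValueError("address_bits must be 16 or 24")
    return 4 - address_bits // 8


def build_image(boot_data, address_bits, filler=0xFF):
    """Prepend the dummy bytes so the header follows the discarded 16 clocks."""
    prefix = bytes([filler]) * dummy_prefix_length(address_bits)
    return prefix + bytes(boot_data)
```

In both cases the useful data starts exactly 32 clock cycles after the opcode, so
the boot code does not need to know the chip size.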

Also, short instructions can now contain immediates,
which reduces the code size significantly.