yasep/test/boot_SPI.txt
created: 2008 08 17 by whygee
version Mon Mar 23 15:57:29 CET 2009

This file is a first draft concerning the boot sequence and the
associated circuits in the context of the Actel ProASIC3 Flash-based FPGA.

-------------------------------------------------------------------

* To reduce the pin count and the PCB surface, a serial EEPROM is used
  to store the system's software.
* The complexity of the boot logic and state machines favors the SPI
  protocol over the I2C protocol.
* Flexibility, and a further reduction of the number of state machines,
  favor the use of the integrated "Flash ROM" (a read-only 128-byte Flash
  area that can be programmed through external JTAG) as a preliminary
  boot software storage.
* The EEPROM : a 25xx512 or a 25xx1024.

-------------------------------------------------------------------

F-ROM : 128 bytes of instructions

It is still uncertain how the contents of the F-ROM can be transferred
to the external SSRAM :
 - the F-ROM could be mapped to the executable address range,
 - or the F-ROM could be mapped to the SR area and accessed by
   "force-fed" instructions in the decoder,
 - or a circuit that directly accesses the SSRAM could be used.
But none of these possible methods is satisfying.

-------------------------------------------------------------------

SPI memories : two chip versions are supported :
 - 64KB or less : 16 bits of address
 - up to 16MB : 24 bits of address

The protocol is the same, except for the number of address bytes to send
when initiating the memory read sequence. This difference is handled by
the preliminary bootstrap software. For example, a weak pull-down or
pull-up resistor tied to the SDI pin of the FPGA indicates the size of
the memory array : when /CS is high, the EEPROM chip leaves this pin
floating (Hi-Z) and the configuration can be read.
OR : the EEPROMs <=64KB include an additional padding byte, so that the
stream is identical to what YASEP would receive from a >64KB chip.
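The strap-resistor option can be sketched as a small Python model of the command that the boot code clocks out. It assumes the convention of the draft INIT_SPI code further down (address bits = (SDI & 1) * 8 + 16, so a pull-up selects 24-bit addressing) and the standard 25xx READ opcode 00000011; the function names are hypothetical and only illustrate the bit sequence, not any FPGA logic.

```python
READ_OPCODE = 0x03  # 25xx serial EEPROM READ instruction, 00000011


def address_bits(sdi_level: int) -> int:
    """Address width selected by the strap resistor sampled on SDI while /CS is high.

    Follows the draft INIT_SPI arithmetic (SDI & 1) * 8 + 16 (assumed convention):
    a pull-up (1) selects 24-bit addressing, a pull-down (0) selects 16-bit.
    """
    return (sdi_level & 1) * 8 + 16


def read_command_bits(sdi_level: int, start_addr: int = 0) -> list:
    """Bits clocked into the EEPROM's SDI after /CS goes low again, MSB first."""
    n = address_bits(sdi_level)
    opcode = [(READ_OPCODE >> (7 - i)) & 1 for i in range(8)]
    address = [(start_addr >> (n - 1 - i)) & 1 for i in range(n)]
    return opcode + address


if __name__ == "__main__":
    for level in (0, 1):
        bits = read_command_bits(level)
        print(f"SDI={level}: {address_bits(level)} address bits, "
              f"{len(bits)} bits sent, opcode {''.join(map(str, bits[:8]))}")
```

With a start address of 0, the whole command is six 0 bits, two 1 bits, then 16 or 24 more 0 bits, which is why the draft code can reuse a single "write N identical bits" routine.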
The FPGA drives /CS, SCK and the EEPROM's SDI, and reads the EEPROM's
SDO. /WP and /HOLD are tied inactive (high). The supported clock
frequency varies between 1 and 20MHz depending on the chip : this must be
taken into account in the initial clock configuration, so variable
frequency clocking is necessary for YASEP. The initial EEPROM read runs
at low speed to load 4KB, then YASEP jumps to the loaded code, which
configures the rest, including the faster clock speed.

-------------------------------------------------------------------

SR area :

The Special Registers could map the F-ROM directly. However, the F-ROM
is 8 bits wide and this would make the bootstrap state machinery complex.

The SPI bus is also mapped in the Special Registers :
 - SPI_CTL writes the state of the /CS pin
 - SPI_DOUT writes a data bit to the EEPROM's SDI and generates a clock cycle
 - SPI_DIN reads the EEPROM's SDO, generates a clock cycle and shifts the
   bit into the register
==> an uninterrupted sequence of 16 "GET SPI_DIN, Rx" instructions fills
    Rx with the 16 bits sent by the EEPROM.
==> only one dumb shift register (and no state machine) is required to
    read the EEPROM : the state machine is in software, stored in the
    preliminary boot F-ROM.

-------------------------------------------------------------------

Contents of the serial EEPROM :
 16 bits   : first address to upload
 16 bits   : byte count
 16 bits   : initial instruction pointer
 16 bits   : checksum ?
 X*16 bits : data/code
 The rest  : can be read later by the application code.
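The "dumb shift register" idea and the header layout can be checked together with a small Python model. It assumes that each SPI_DIN access shifts the register left and inserts the new bit at bit 0 (SPI EEPROMs send each byte MSB first, so the first byte ends up in the high half of the 16-bit word), and that the checksum word is the 16-bit sum of the data words, which is what "checksum -= D4 ... IF checksum != 0" in the boot outline implies. EepromModel, get_spi_din, read16 and read_header are hypothetical names.

```python
class EepromModel:
    """Hypothetical model of the EEPROM's SDO pin during a READ:
    the image is shifted out MSB first, one bit per SCK cycle."""

    def __init__(self, image: bytes):
        self.image = image
        self.bitpos = 0  # index of the next bit to be shifted out

    def next_bit(self) -> int:
        byte = self.image[self.bitpos // 8]
        bit = (byte >> (7 - self.bitpos % 8)) & 1
        self.bitpos += 1
        return bit


def get_spi_din(eeprom: EepromModel, rx: int) -> int:
    """One "GET SPI_DIN, Rx": one SCK cycle, the SDO bit enters at bit 0 (assumed)."""
    return ((rx << 1) | eeprom.next_bit()) & 0xFFFF


def read16(eeprom: EepromModel) -> int:
    rx = 0
    for _ in range(16):  # 16 uninterrupted GET SPI_DIN, Rx
        rx = get_spi_din(eeprom, rx)
    return rx


def read_header(eeprom: EepromModel) -> tuple:
    """The four 16-bit header words, in the order of the EEPROM layout."""
    first_addr = read16(eeprom)
    byte_count = read16(eeprom)
    entry_point = read16(eeprom)
    checksum = read16(eeprom)
    return first_addr, byte_count, entry_point, checksum


if __name__ == "__main__":
    # Hypothetical image: 4 bytes loaded at 0x0100, execution starts at 0x0100,
    # checksum = 0x1234 + 0x5678 (mod 2**16) = 0x68AC.
    image = bytes([0x01, 0x00, 0x00, 0x04, 0x01, 0x00, 0x68, 0xAC,
                   0x12, 0x34, 0x56, 0x78])
    eeprom = EepromModel(image)
    print([hex(w) for w in read_header(eeprom)], hex(read16(eeprom)))
```

No counter or state lives in the hardware model: the 16-iteration loop is the only "state machine", which matches the intent of keeping it in the F-ROM software.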
-------------------------------------------------------------------

The first 128 bytes of code :

* INIT CPU :
  - init the clock
  - init the IRQ
  - init the thread register
  - init the first page address extension entries
  - init Ax = 0xFFFE (so Dx can be used as scratch registers)
* INIT SPI :
  - SPI_CS=0
  - SPI_CS=1
  - read SPI_SDI => number of address bytes (2 or 3)
    ;;; beware : this triggers a clock cycle
  - SPI_CS=0
  - write READ opcode (00000011)
  - write 2 (or 3) 00h address bytes
* READ SPI CONFIG :
  - read 16 bits => first address (current pointer : A4)
  - read 16 bits => byte count (D5)
  - read 16 bits => jump address (A3)
  - read 16 bits => checksum ? (D2 ?)
* READ SPI BLOCK :
  - read 16 bits ==> D4 (autoincrement Q4)
  - checksum -= D4
  - byte count -= 2 ; IF byte count != 0 loop READ SPI BLOCK
  - IF checksum != 0 loop INIT CPU
  - SPI_CS=1 ; disable the EEPROM to save power
  - jump to A3

***********************************************************************

INIT_SPI:
    mov 0, R0           ; 2
    put R0, SPI_CS      ; 4
    ; ... waste some ns ?
    mov -2, A5          ; 2
    mov A5, A4          ; 2
    mov 1, R0           ; 2
    put R0, SPI_CS      ; 4
    ; ... waste some ns ?
    mov A5, A3          ; 2
    mov A5, A2          ; 2
    get SPI_SDI, D2     ; 4
    and 1, D2
    shl D2, 3, D2       ; [0/1]*8 bits
    add D2, 16, D2      ; D2 = 16 or 24 address bits
    mov 0, R0
    put R0, SPI_CS      ; SPI_CS=0 : select the EEPROM for the READ command
    ; read command : 6 zero bits, then 2 one bits = 00000011
    xor R2, R2
    mov 6, R3
    call WriteR3Bits
    mov 1, R3
    PUT R3, SPI_OUT
    PUT R3, SPI_OUT
    ; address : D2 zero bits (R2 is still 0)
    mov D2, R3
    call WriteR3Bits
    ; ==> about 28 bytes

READ_SPI_CONFIG:
    ; read 16 bits => first address (current pointer : A4)
    call READ16bits
    mov R2, A4
    ; read 16 bits => byte count (D5)
    call READ16bits
    mov R2, D5
    ; read 16 bits => jump address (A3)
    call READ16bits
    mov R2, A3
    ; read 16 bits => checksum ? (D2 ?)
    call READ16bits
    mov R2, D2
    ; ==> 24 bytes

loop_block:
    ; read 16 bits ==> D4 (autoincrement Q4)
    call READ16bits
    mov R2, D4
    ; checksum -= D4
    SUB D2, D4, D2       ; D2 = D2 - D4
    ; byte count -= 2 ; loop while not zero
    ADD D5, -2, D5
    jnz D5, loop_block
    ; IF checksum != 0 loop INIT CPU
    JNZ D2, 0
    ; SPI_CS=1 ; disable the EEPROM to save power
    mov 1, R0
    put R0, SPI_CS
    ; jump to A3
    jmp A3
    ; ==> 32 bytes

READ16bits:
    mov 16, R1
L1: get SPI_DIN, R2      ; one clock cycle, one bit shifted into R2
    add R1, -1, R1
    jnz R1, L1           ; loop until 16 bits are read
    return
    ; ===> 20 bytes ?

WriteR3Bits:
    ; number of bits in R3,
    ; value of the bit in R2
    PUT R2, SPI_OUT      ; 4
    ADD R3, -1, R3       ; 4
    jnz R3, WriteR3Bits  ; 4 : loop until R3 bits are sent
    return               ; 4
    ; ===> 16 bytes

***********************************************************************

The above code needs about 130 bytes, just over the 128 available. It
could be reduced a bit : the checksum takes 12 bytes and is not critical,
and detecting the EEPROM size also takes 12 bytes and could be removed in
specific cases. Choose either the checksum or the extended
compatibility... or none. One can even skip the final EEPROM disable.

This code is not working yet : it was written to show the functionality,
and it uses opcodes/instructions that are not (yet) part of the existing
specification or opcode map. The opcodes and encodings are not
definitive, and some bytes could be saved here and there in the future.
The YASEP was not designed to be very code-compact anyway.

**********************************************************************

note :
CQ : removed
A0 = instruction pointer (read only ?)
D0 = status word : auto-update + IRQen + skip?
     (shadows/alias)

auto-update, in 16 bits : 5 queues
 A1/D1 post-inc/dec when accessed
 A2/D2 post-inc/dec
 A3/D3 post-inc/dec
 A4/D4 stack-capable
 A5/D5 stack-capable
 ____________
 10 bits

IRQ_EN : 1
16/32 bits mode

 0h: Q1dir
 1h: Q1en
 2h: Q2dir
 3h: Q2en
 4h: Q3dir
 5h: Q3en
 6h: Q4dir
 7h: Q4stack
 8h: Q4en
 9h: Q5dir
 Ah: Q5stack
 Bh: Q5en
 Ch:
 Dh:
 Eh: AddrTranslation
 Fh: 16/32 bits (0=16-bit mode)

******************************************************

Mon Mar 23 15:57:29 CET 2009

In order to speed up the code and reduce its size, the first byte of the
EEPROM image (in 16M mode), or the first 2 bytes (in 64K mode), is a
dummy, so only one "read 16 bits" call is needed to skip it for both
memory sizes, without the need to align the stream.
Also, short instructions can now contain immediates, which reduces the
code size significantly.
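The dummy-byte trick can be checked with a bit-level Python model of both chip sizes. It is a sketch under assumptions: the master always sends the READ opcode plus 16 zero address bits, then discards one 16-bit word; the checksum is the 16-bit sum of the data words; memory is byte-addressed with 2 bytes per word (as "ADD D5,-2,D5" in the loop suggests). SpiEeprom, boot_words and load are hypothetical names, not YASEP code.

```python
READ_OPCODE = 0x03


class SpiEeprom:
    """Hypothetical bit-level model of a 25xx READ transaction.

    addr_bytes = 2 for <=64KB parts, 3 for larger parts. While the opcode and
    address are clocked in, SDO is Hi-Z: the model returns the strap level.
    """

    def __init__(self, image: bytes, addr_bytes: int, hiz_level: int = 0):
        self.image = image
        self.command_bits = 8 + 8 * addr_bytes
        self.hiz_level = hiz_level
        self.shift_in = 0  # opcode + address received so far
        self.count = 0     # SCK cycles since /CS went low

    def clock(self, sdi: int) -> int:
        """One SCK cycle: sample SDI, return the bit seen on SDO."""
        if self.count < self.command_bits:
            self.shift_in = (self.shift_in << 1) | (sdi & 1)
            self.count += 1
            return self.hiz_level
        addr_bits = self.command_bits - 8
        if self.shift_in >> addr_bits != READ_OPCODE:
            raise ValueError("only the READ instruction is modelled")
        start = self.shift_in & ((1 << addr_bits) - 1)
        bitpos = start * 8 + (self.count - self.command_bits)
        self.count += 1
        return (self.image[bitpos // 8] >> (7 - bitpos % 8)) & 1


def read16(eeprom: SpiEeprom) -> int:
    rx = 0
    for _ in range(16):  # SDI held at 0 while reading
        rx = ((rx << 1) | eeprom.clock(0)) & 0xFFFF
    return rx


def boot_words(eeprom: SpiEeprom, nwords: int) -> list:
    """READ opcode + 16 zero address bits whatever the chip size,
    one discarded word (Hi-Z and/or padding), then nwords words."""
    for i in range(8):
        eeprom.clock((READ_OPCODE >> (7 - i)) & 1)
    for _ in range(16):
        eeprom.clock(0)
    read16(eeprom)  # dummy word
    return [read16(eeprom) for _ in range(nwords)]


def load(words: list):
    """Store the data words and verify the checksum, as in the boot outline."""
    first_addr, byte_count, entry_point, checksum = words[:4]
    memory = {}
    addr = first_addr
    for word in words[4:4 + byte_count // 2]:
        memory[addr] = word
        addr += 2
        checksum = (checksum - word) & 0xFFFF
    return memory, entry_point, checksum == 0


if __name__ == "__main__":
    payload = bytes([0x01, 0x00, 0x00, 0x04, 0x01, 0x00, 0x68, 0xAC,
                     0x12, 0x34, 0x56, 0x78])
    for addr_bytes, padding in ((3, b"\x00"), (2, b"\x00\x00")):
        words = boot_words(SpiEeprom(padding + payload, addr_bytes), 6)
        print(addr_bytes, [hex(w) for w in words], load(words))
```

A 24-bit part uses the first 8 bits of the dummy read as its last address byte and then sends image byte 0, while a 16-bit part already sends image bytes 0 and 1; one padding byte versus two is exactly what makes both streams line up on the first header word.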