Address unit
Introduction
The address unit is one of the four execution units of VP1. It transfers data between the data store and registers, controls the DMA unit, and performs address calculations.
The data store
The data store is the working memory of VP1, 8kB in size. Data can be transferred between the data store and $r/$v registers using load/store instructions, or between the data store and main memory using the DMA engine. It’s often treated as two-dimensional, with row stride selectable between 0x10, 0x20, 0x40, and 0x80 bytes: there are “load vertical” instructions which gather consecutive bytes vertically rather than horizontally.
Because of its 2D capabilities, the data store is internally organized into 16 independently addressable 16-bit wide banks of 256 cells each, and the memory addresses are carefully spread between the banks so that both horizontal and vertical loads from any address will require at most one access to every bank. The bank assignments differ between the supported strides, so row stride is basically a part of the address, and an area of memory always has to be accessed with the same stride (unless you don’t care about its previous contents). Specifically, the translation of an (address, stride) pair into a (bank, cell index, high/low byte) triple is as follows, with stride given as the 2-bit stride field value (0-3) used by the address registers:
def address_xlat(addr, stride):
    bank = addr & 0xf
    hilo = addr >> 4 & 1
    cell = addr >> 5 & 0xff
    if stride == 0:
        # 0x10 bytes
        bank += (addr >> 5) & 7
    elif stride == 1:
        # 0x20 bytes
        bank += addr >> 5
    elif stride == 2:
        # 0x40 bytes
        bank += addr >> 6
    elif stride == 3:
        # 0x80 bytes
        bank += addr >> 7
    bank &= 0xf
    return bank, cell, hilo
In pseudocode, data store bytes are denoted by DS[bank, cell, hilo].
For a vertical access with a 0x10-byte stride, a 16-byte access uses both bytes (all 16 bits) of 8 banks. In all other cases, a 16-byte access uses one byte (8 bits) of each of the 16 banks. DMA transfers can make use of the full 256-bit width of the data store by transferring 0x20 consecutive bytes at a time.
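The bank-spreading claim can be checked exhaustively. The sketch below re-implements the translation above with the stride field encoded as 0-3, then verifies that every aligned horizontal and vertical 16-byte access asks each bank for at most one cell, and that each stride maps the 0x2000 byte addresses one-to-one onto the 16 x 256 x 2 byte slots. The helpers `conflict_free` and `check_stride` are hypothetical, written only for this illustration.

```python
def address_xlat(addr, stride):
    """Translate a byte address into (bank, cell, hilo).

    stride is the 2-bit field value: 0 = 0x10, 1 = 0x20, 2 = 0x40, 3 = 0x80 bytes.
    """
    bank = addr & 0xf
    hilo = addr >> 4 & 1
    cell = addr >> 5 & 0xff
    if stride == 0:
        bank += (addr >> 5) & 7
    elif stride == 1:
        bank += addr >> 5
    elif stride == 2:
        bank += addr >> 6
    elif stride == 3:
        bank += addr >> 7
    return bank & 0xf, cell, hilo


def conflict_free(addrs, stride):
    """True if no bank is asked for more than one cell by this access."""
    cells_per_bank = {}
    for a in addrs:
        bank, cell, _ = address_xlat(a, stride)
        cells_per_bank.setdefault(bank, set()).add(cell)
    return all(len(cells) == 1 for cells in cells_per_bank.values())


def check_stride(stride):
    row = 4 + stride  # log2 of the row stride in bytes
    # Horizontal: 16 consecutive bytes starting at any 0x10-aligned address.
    for base in range(0, 0x2000, 0x10):
        if not conflict_free([base | i for i in range(16)], stride):
            return False
    # Vertical: 16 rows; only bases with the row-index bits cleared are canonical.
    for base in range(0x2000):
        if base & (0xf << row):
            continue
        if not conflict_free([base | i << row for i in range(16)], stride):
            return False
    return True


for s in range(4):
    slots = {address_xlat(a, s) for a in range(0x2000)}
    print(f"stride {0x10 << s:#x}: conflict-free={check_stride(s)}, "
          f"distinct byte slots={len(slots)}")
```

Each of the four strides should report conflict-free=True and 8192 distinct byte slots, matching the 8kB size.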
The data store can be accessed by load/store instructions in one of four ways:

horizontal: 16 consecutive, naturally aligned addresses are used:

def addresses_horizontal(addr, stride):
    addr &= 0x1ff0
    return [address_xlat(addr | idx, stride) for idx in range(16)]

vertical: 16 addresses separated by the row stride are used, also naturally aligned:

def addresses_vertical(addr, stride):
    addr &= 0x1fff
    # clear the bits used for y coord
    addr &= ~(0xf << (4 + stride))
    return [address_xlat(addr | idx << (4 + stride), stride) for idx in range(16)]

scalar: like horizontal, but 4 bytes:

def addresses_scalar(addr, stride):
    addr &= 0x1ffc
    return [address_xlat(addr | idx, stride) for idx in range(4)]

raw: the raw data store coordinates are provided directly.
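To see the alignment rules in terms of plain byte addresses, here is a sketch of the three address generators that stops before the bank translation. The `*_bytes` names are hypothetical variants of the helpers above, and stride is again the 2-bit field value.

```python
def horizontal_bytes(addr):
    """16 consecutive bytes, aligned down to a 0x10 boundary."""
    addr &= 0x1ff0
    return [addr | idx for idx in range(16)]


def vertical_bytes(addr, stride):
    """16 bytes, one per row; the row size is 0x10 << stride bytes."""
    addr &= 0x1fff
    shift = 4 + stride
    addr &= ~(0xf << shift)  # clear the row-index (y coordinate) bits
    return [addr | idx << shift for idx in range(16)]


def scalar_bytes(addr):
    """4 consecutive bytes, aligned down to a 4-byte boundary."""
    addr &= 0x1ffc
    return [addr | idx for idx in range(4)]


print([hex(a) for a in horizontal_bytes(0x123)])
print([hex(a) for a in vertical_bytes(0x123, 1)])
print([hex(a) for a in scalar_bytes(0x123)])
```

For instance, a vertical access at 0x123 with a 0x20-byte stride reads column 3 of the 16-row block starting at address 0 (0x003, 0x023, ..., 0x1e3): the row bits of the original address are discarded.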
Address registers
The address unit has 32 address registers, $a0-$a31. These are used for address storage. If they’re used to store data store addresses (and not DMA command parameters), they have the following bitfields:
- bits 0-15: addr - the actual data store address
- bits 16-29: limit - can store the high boundary of an array, to assist in looping
- bits 30-31: stride - selects data store stride:
    - 0: 0x10 bytes
    - 1: 0x20 bytes
    - 2: 0x40 bytes
    - 3: 0x80 bytes
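Packing and unpacking these fields is plain bit manipulation. A minimal sketch on Python integers; `pack_areg`, `unpack_areg` and `STRIDE_BYTES` are hypothetical names used only here.

```python
# Stride field encoding from the list above: field value -> row stride in bytes.
STRIDE_BYTES = {0: 0x10, 1: 0x20, 2: 0x40, 3: 0x80}


def pack_areg(addr, limit, stride):
    """Build a 32-bit $a value from its addr, limit and stride fields."""
    if not (0 <= addr < 1 << 16 and 0 <= limit < 1 << 14 and 0 <= stride < 4):
        raise ValueError("field out of range")
    return addr | limit << 16 | stride << 30


def unpack_areg(value):
    """Split a 32-bit $a value into (addr, limit, stride)."""
    return value & 0xffff, value >> 16 & 0x3fff, value >> 30 & 3


reg = pack_areg(addr=0x100, limit=0x400, stride=1)
addr, limit, stride = unpack_areg(reg)
print(f"{reg:#010x}", hex(addr), hex(limit), STRIDE_BYTES[stride])
```

This prints 0x44000100 0x100 0x400 32.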
There are also 3 bits in each $c register belonging to the address unit. They are:
- bits 8-9: long address flags
    - bit 8: sign flag - set equal to bit 31 of the result
    - bit 9: zero flag - set if the result is 0
- bit 10: short address flag
    - bit 10: end flag - set if the addr field of the result is greater than or equal to limit
Some address instructions set either the long or the short flags of a given $c register according to the result.
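The flag placement can be sketched as two update helpers. This assumes, as the per-instruction pseudocode suggests but does not state outright, that setting one group of flags leaves the other $c bits unchanged; the helper and constant names are hypothetical.

```python
SIGN_FLAG = 1 << 8   # long address flag: bit 31 of the result
ZERO_FLAG = 1 << 9   # long address flag: result is 0
END_FLAG = 1 << 10   # short address flag: addr >= limit
LONG_MASK = SIGN_FLAG | ZERO_FLAG


def set_long_flags(c, result):
    """Return $c with bits 8-9 recomputed from a 32-bit result."""
    flags = 0
    if result & 1 << 31:
        flags |= SIGN_FLAG
    if result == 0:
        flags |= ZERO_FLAG
    # Assumption: all other $c bits are preserved.
    return (c & ~LONG_MASK) | flags


def set_short_flag(c, addr, limit):
    """Return $c with bit 10 recomputed from an addr/limit comparison."""
    return (c & ~END_FLAG) | (END_FLAG if addr >= limit else 0)


print(hex(set_long_flags(0, 0)), hex(set_short_flag(0, 0x40, 0x40)))
```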
Instruction format
The instruction word fields used in address instructions, in addition to the ones used in scalar instructions, are:
- bit 0: for opcode 0xd7, selects the subopcode:
    - 0: load raw: ldr
    - 1: store raw and add: star
- bits 3-13: UIMM: unsigned 11-bit immediate.
Todo: list me
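Extracting these two fields can be sketched as follows. Only the fields listed above are decoded; the opcode and the scalar-instruction fields are left out because their positions are not given in this section. The function names are hypothetical.

```python
def decode_address_fields(word):
    """Extract the address-unit-specific fields of an instruction word."""
    return {
        "subop": word & 1,          # bit 0; only meaningful for opcode 0xd7
        "uimm": word >> 3 & 0x7ff,  # bits 3-13: 11-bit unsigned immediate
    }


def encode_uimm(word, uimm):
    """Return word with bits 3-13 replaced by uimm."""
    if not 0 <= uimm <= 0x7ff:
        raise ValueError("UIMM must fit in 11 bits")
    return (word & ~(0x7ff << 3)) | uimm << 3


print(decode_address_fields(encode_uimm(1, 0x123)))
```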
Opcodes
The opcode range assigned to the address unit is 0xc0-0xdf. The opcodes are:
- 0xc0: load vector horizontal and add: ldavh
- 0xc1: load vector vertical and add: ldavv
- 0xc2: load scalar and add: ldas
- 0xc3: ??? (xdld)
- 0xc4: store vector horizontal and add: stavh
- 0xc5: store vector vertical and add: stavv
- 0xc6: store scalar and add: stas
- 0xc7: ??? (xdst)
- 0xc8: load extra horizontal and add: ldaxh
- 0xc9: load extra vertical and add: ldaxv
- 0xca: address addition: aadd
- 0xcb: addition: add
- 0xcc: set low bits: setlo
- 0xcd: set high bits: sethi
- 0xce: ??? (xdbar)
- 0xcf: ??? (xdwait)
- 0xd0: load vector horizontal and add: ldavh
- 0xd1: load vector vertical and add: ldavv
- 0xd2: load scalar and add: ldas
- 0xd3: bitwise operation: bitop
- 0xd4: store vector horizontal and add: stavh
- 0xd5: store vector vertical and add: stavv
- 0xd6: store scalar and add: stas
- 0xd7: depending on instruction bit 0:
    - 0: load raw: ldr
    - 1: store raw and add: star
- 0xd8: load vector horizontal: ldvh
- 0xd9: load vector vertical: ldvv
- 0xda: load scalar: lds
- 0xdb: ???
- 0xdc: store vector horizontal: stvh
- 0xdd: store vector vertical: stvv
- 0xde: store scalar: sts
- 0xdf: the canonical address nop opcode
Todo: complete the list
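For tooling such as a disassembler, the list above can be captured as a lookup table. This is a sketch only: it takes the opcode as an already-extracted number, because this section does not say where the opcode sits in the instruction word, and the strings for the unknown and nop entries are placeholders rather than official mnemonics.

```python
ADDRESS_OPCODES = {
    0xc0: "ldavh", 0xc1: "ldavv", 0xc2: "ldas", 0xc3: "??? (xdld)",
    0xc4: "stavh", 0xc5: "stavv", 0xc6: "stas", 0xc7: "??? (xdst)",
    0xc8: "ldaxh", 0xc9: "ldaxv", 0xca: "aadd", 0xcb: "add",
    0xcc: "setlo", 0xcd: "sethi", 0xce: "??? (xdbar)", 0xcf: "??? (xdwait)",
    0xd0: "ldavh", 0xd1: "ldavv", 0xd2: "ldas", 0xd3: "bitop",
    0xd4: "stavh", 0xd5: "stavv", 0xd6: "stas",
    0xd8: "ldvh", 0xd9: "ldvv", 0xda: "lds", 0xdb: "???",
    0xdc: "stvh", 0xdd: "stvv", 0xde: "sts", 0xdf: "(address nop)",
}


def address_mnemonic(opcode, word_bit0=0):
    """Mnemonic for an address unit opcode; 0xd7 is split by instruction bit 0."""
    if not 0xc0 <= opcode <= 0xdf:
        raise ValueError(f"{opcode:#x} is not an address unit opcode")
    if opcode == 0xd7:
        return "star" if word_bit0 else "ldr"
    return ADDRESS_OPCODES[opcode]


print(address_mnemonic(0xca), address_mnemonic(0xd7, 0), address_mnemonic(0xd7, 1))
```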
Instructions
Set low/high bits: setlo, sethi
Sets the low or high 16 bits of a register to an immediate value. The other half is unaffected.
- Instructions:

| Instruction | Operands | Opcode |
|---|---|---|
| setlo | $a[DST] IMM16 | 0xcc |
| sethi | $a[DST] IMM16 | 0xcd |

- Operation:

if op == 'setlo':
    $a[DST] = ($a[DST] & 0xffff0000) | IMM16
else:
    $a[DST] = ($a[DST] & 0xffff) | IMM16 << 16
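Since bits 16-31 of an address register hold limit and stride, a single sethi sets both at once. A small sketch on plain integers; the function names simply mirror the mnemonics.

```python
def setlo(reg, imm16):
    """Replace bits 0-15 of a 32-bit register value, keeping bits 16-31."""
    return (reg & 0xffff0000) | (imm16 & 0xffff)


def sethi(reg, imm16):
    """Replace bits 16-31 of a 32-bit register value, keeping bits 0-15."""
    return (reg & 0xffff) | (imm16 & 0xffff) << 16


# An arbitrary 32-bit constant takes one instruction of each kind.
print(hex(sethi(setlo(0, 0x5678), 0x1234)))

# For an address register, the high half is limit (14 bits) plus stride (2 bits):
# limit 0x400, stride field 1 (0x20 bytes), then addr 0x100.
areg = setlo(sethi(0, 0x400 | 1 << 14), 0x100)
print(hex(areg))
```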
Addition: add
Does what it says on the tin. The second source comes from a mangled register index. The long address flags are set.
- Instructions:

| Instruction | Operands | Opcode |
|---|---|---|
| add | [$c[CDST]] $a[DST] $a[SRC1] $a[SRC2S] | 0xcb |

- Operation:

res = $a[SRC1] + $a[SRC2S]
$a[DST] = res
cres = 0
if res & 1 << 31:
    cres |= 1
if res == 0:
    cres |= 2
if CDST < 4:
    $c[CDST].address.long = cres
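A sketch of add on plain Python integers, assuming (the pseudocode does not say so) that the 32-bit register wraps around on overflow. SRC2S mangling is described elsewhere, so the sketch takes both source values directly; cres uses the same bit 0 = sign, bit 1 = zero layout as the pseudocode.

```python
def add(src1, src2):
    """Return (result, cres) for the add instruction."""
    res = (src1 + src2) & 0xffffffff  # assumption: 32-bit wraparound
    cres = 0
    if res & 1 << 31:
        cres |= 1  # sign flag
    if res == 0:
        cres |= 2  # zero flag
    return res, cres


# Adding a two's complement -0x10 to 0x10 gives zero and sets the zero flag.
print(add(0x10, 0xfffffff0))
```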
Bit operations: bitop
Performs an arbitrary two-input bit operation on two registers, selected by SRC1 and SRC2. The long address flags are set.
- Instructions:

| Instruction | Operands | Opcode |
|---|---|---|
| bitop | BITOP [$c[CDST]] $a[DST] $a[SRC1] $a[SRC2] | 0xd3 |

- Operation:

res = bitop(BITOP, $a[SRC2], $a[SRC1]) & 0xffffffff
$a[DST] = res
cres = 0
if res & 1 << 31:
    cres |= 1
if res == 0:
    cres |= 2
if CDST < 4:
    $c[CDST].address.long = cres
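The BITOP encoding is not defined in this section. Purely as an assumption for illustration, the sketch treats it as a 4-bit truth table in which bit (x << 1 | y) gives the output for input bits x and y, where x comes from the first operand passed to `bitop` ($a[SRC2] in the pseudocode) and y from the second. The example calls use AND, OR, XOR and NOR, which come out the same whichever operand order the real encoding uses.

```python
MASK32 = 0xffffffff


def bitop(op, x, y):
    """Apply a 4-bit truth table to 32-bit x and y (assumed encoding, see above)."""
    res = 0
    for idx in range(4):
        if op >> idx & 1:
            # Minterm for input pair (x_bit, y_bit) = (idx >> 1, idx & 1).
            term_x = x if idx & 2 else ~x
            term_y = y if idx & 1 else ~y
            res |= term_x & term_y
    return res & MASK32


def bitop_insn(op, src1, src2):
    """Return (result, cres) for the bitop instruction."""
    res = bitop(op, src2, src1)  # operand order as in the pseudocode
    cres = (1 if res & 1 << 31 else 0) | (2 if res == 0 else 0)
    return res, cres


AND, OR, XOR, NOR = 0b1000, 0b1110, 0b0110, 0b0001
print(hex(bitop_insn(AND, 0xf0f0, 0xff00)[0]))
```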
Address addition: aadd
Adds the contents of a register to the addr field of another register. The short address flag is set.
- Instructions:

| Instruction | Operands | Opcode |
|---|---|---|
| aadd | [$c[CDST]] $a[DST] $a[SRC2S] | 0xca |

- Operation:

$a[DST].addr += $a[SRC2S]
if CDST < 4:
    $c[CDST].address.short = $a[DST].addr >= $a[DST].limit
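A sketch of aadd on full 32-bit register values, showing that limit and stride survive the update. It assumes the addr field wraps at 16 bits, so a negative increment can be given in two's complement; the function name mirrors the mnemonic but is otherwise hypothetical.

```python
def aadd(dst, src):
    """Model aadd on 32-bit $a values; returns (new dst, end flag)."""
    addr = ((dst & 0xffff) + src) & 0xffff  # assumption: addr wraps at 16 bits
    limit = dst >> 16 & 0x3fff
    new_dst = (dst & 0xffff0000) | addr     # limit and stride are preserved
    return new_dst, addr >= limit


# limit 0x400, addr 0x3e0, step 0x20: the new addr reaches limit.
new_dst, end = aadd(0x040003e0, 0x20)
print(hex(new_dst), end)
```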
Load: ldvh, ldvv, lds
Loads from the given address ORed with an unsigned 11-bit immediate. ldvh is a horizontal vector load, ldvv is a vertical vector load, and lds is a scalar load. Curiously, while the register is ORed with the immediate to form the address, the two are added to compute the $c output.
- Instructions:

| Instruction | Operands | Opcode |
|---|---|---|
| ldvh | $v[DST] [$c[CDST]] $a[SRC1] UIMM | 0xd8 |
| ldvv | $v[DST] [$c[CDST]] $a[SRC1] UIMM | 0xd9 |
| lds | $r[DST] [$c[CDST]] $a[SRC1] UIMM | 0xda |

- Operation:

if op == 'ldvh':
    addr = addresses_horizontal($a[SRC1].addr | UIMM, $a[SRC1].stride)
    for idx in range(16):
        $v[DST][idx] = DS[addr[idx]]
elif op == 'ldvv':
    addr = addresses_vertical($a[SRC1].addr | UIMM, $a[SRC1].stride)
    for idx in range(16):
        $v[DST][idx] = DS[addr[idx]]
elif op == 'lds':
    addr = addresses_scalar($a[SRC1].addr | UIMM, $a[SRC1].stride)
    for idx in range(4):
        $r[DST][idx] = DS[addr[idx]]
if CDST < 4:
    $c[CDST].address.short = (($a[SRC1].addr + UIMM) & 0xffff) >= $a[SRC1].limit
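The OR-versus-add quirk only matters when the addr field and UIMM have overlapping set bits. A minimal sketch of the two computations, using the hypothetical helper name `ld_address_and_flag`:

```python
def ld_address_and_flag(areg_addr, limit, uimm):
    """Return (address passed to the access helpers, $c end flag) for ldvh/ldvv/lds."""
    access_addr = areg_addr | uimm           # ORed: used for the actual access
    flag_addr = (areg_addr + uimm) & 0xffff  # added: used only for the $c flag
    return access_addr, flag_addr >= limit


# addr 0x30 and UIMM 0x10 overlap in bit 4: the access stays at 0x30,
# but the end flag is computed from 0x40.
print(ld_address_and_flag(0x30, 0x40, 0x10))
```

Here the end flag is reported as set even though the access itself used address 0x30.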
Load and add: ldavh, ldavv, ldas
Loads from the given address, then post-increments the address by the contents of a register (like the aadd instruction) or by an immediate. ldavh is a horizontal vector load, ldavv is a vertical vector load, and ldas is a scalar load.
- Instructions:

| Instruction | Operands | Opcode |
|---|---|---|
| ldavh | $v[DST] [$c[CDST]] $a[SRC1] $a[SRC2S] | 0xc0 |
| ldavv | $v[DST] [$c[CDST]] $a[SRC1] $a[SRC2S] | 0xc1 |
| ldas | $r[DST] [$c[CDST]] $a[SRC1] $a[SRC2S] | 0xc2 |
| ldavh | $v[DST] [$c[CDST]] $a[SRC1] IMM | 0xd0 |
| ldavv | $v[DST] [$c[CDST]] $a[SRC1] IMM | 0xd1 |
| ldas | $r[DST] [$c[CDST]] $a[SRC1] IMM | 0xd2 |

- Operation:

if op == 'ldavh':
    addr = addresses_horizontal($a[SRC1].addr, $a[SRC1].stride)
    for idx in range(16):
        $v[DST][idx] = DS[addr[idx]]
elif op == 'ldavv':
    addr = addresses_vertical($a[SRC1].addr, $a[SRC1].stride)
    for idx in range(16):
        $v[DST][idx] = DS[addr[idx]]
elif op == 'ldas':
    addr = addresses_scalar($a[SRC1].addr, $a[SRC1].stride)
    for idx in range(4):
        $r[DST][idx] = DS[addr[idx]]
if IMM is None:
    $a[SRC1].addr += $a[SRC2S]
else:
    $a[SRC1].addr += IMM
if CDST < 4:
    $c[CDST].address.short = $a[SRC1].addr >= $a[SRC1].limit
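Post-increment combined with the end flag is what makes these instructions convenient for loops. The sketch below assumes a loop that loads, increments, and stops once the short address flag becomes set; the generator name `ldav_walk` is hypothetical, and it only tracks addresses, not data.

```python
def ldav_walk(start, limit, step):
    """Yield the addresses a post-incrementing ldav* loop would load from."""
    addr = start
    while True:
        yield addr                     # the load uses the pre-increment address
        addr = (addr + step) & 0xffff  # assumption: addr wraps at 16 bits
        if addr >= limit:              # short address ($c end) flag
            break


# Four rows of 0x20 bytes: addr 0, increment 0x20, limit 0x80.
print([hex(a) for a in ldav_walk(0, 0x80, 0x20)])
```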
Store: stvh, stvv, sts
Like the corresponding ld* instructions, but store instead of load. The SRC1 and DST fields are exchanged.
- Instructions:

| Instruction | Operands | Opcode |
|---|---|---|
| stvh | $v[SRC1] [$c[CDST]] $a[DST] UIMM | 0xdc |
| stvv | $v[SRC1] [$c[CDST]] $a[DST] UIMM | 0xdd |
| sts | $r[SRC1] [$c[CDST]] $a[DST] UIMM | 0xde |

- Operation:

if op == 'stvh':
    addr = addresses_horizontal($a[DST].addr | UIMM, $a[DST].stride)
    for idx in range(16):
        DS[addr[idx]] = $v[SRC1][idx]
elif op == 'stvv':
    addr = addresses_vertical($a[DST].addr | UIMM, $a[DST].stride)
    for idx in range(16):
        DS[addr[idx]] = $v[SRC1][idx]
elif op == 'sts':
    addr = addresses_scalar($a[DST].addr | UIMM, $a[DST].stride)
    for idx in range(4):
        DS[addr[idx]] = $r[SRC1][idx]
if CDST < 4:
    $c[CDST].address.short = (($a[DST].addr + UIMM) & 0xffff) >= $a[DST].limit
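A store followed by a load through a simulated data store illustrates why an area has to be accessed with the stride it was written with. The sketch models DS as a dictionary keyed by (bank, cell, hilo); the names `horizontal`, `vertical`, `store`, `load` and `ds` are hypothetical, and the translation is a compact restatement of address_xlat with the stride field encoded as 0-3.

```python
def address_xlat(addr, stride):
    """(bank, cell, hilo) for a byte address; stride is the 2-bit field value."""
    bank = addr & 0xf
    hilo = addr >> 4 & 1
    cell = addr >> 5 & 0xff
    offset = addr >> {0: 5, 1: 5, 2: 6, 3: 7}[stride]
    if stride == 0:
        offset &= 7
    return (bank + offset) & 0xf, cell, hilo


def horizontal(addr, stride):
    addr &= 0x1ff0
    return [address_xlat(addr | idx, stride) for idx in range(16)]


def vertical(addr, stride):
    shift = 4 + stride
    addr &= 0x1fff & ~(0xf << shift)
    return [address_xlat(addr | idx << shift, stride) for idx in range(16)]


ds = {}  # simulated data store: (bank, cell, hilo) -> byte


def store(slots, data):
    for slot, byte in zip(slots, data):
        ds[slot] = byte


def load(slots):
    return [ds.get(slot, 0) for slot in slots]


data = list(range(16))

# stvv-like store and ldvv-like load with the same 0x40-byte stride round-trip.
store(vertical(0x1005, 2), data)
print(load(vertical(0x1005, 2)))

# Stored with stride field 0 (0x10 bytes), read back with stride field 1 (0x20 bytes).
store(horizontal(0x100, 0), data)
print(load(horizontal(0x100, 1)))
```

The second load returns the same 16 bytes rotated by 8 positions, because at address 0x100 the two strides assign the same cell to different banks.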
Store and add: stavh, stavv, stas
Like the corresponding lda* instructions, but store instead of load. The SRC1 and DST fields are exchanged.
- Instructions:

| Instruction | Operands | Opcode |
|---|---|---|
| stavh | $v[SRC1] [$c[CDST]] $a[DST] $a[SRC2S] | 0xc4 |
| stavv | $v[SRC1] [$c[CDST]] $a[DST] $a[SRC2S] | 0xc5 |
| stas | $r[SRC1] [$c[CDST]] $a[DST] $a[SRC2S] | 0xc6 |
| stavh | $v[SRC1] [$c[CDST]] $a[DST] IMM | 0xd4 |
| stavv | $v[SRC1] [$c[CDST]] $a[DST] IMM | 0xd5 |
| stas | $r[SRC1] [$c[CDST]] $a[DST] IMM | 0xd6 |

- Operation:

if op == 'stavh':
    addr = addresses_horizontal($a[DST].addr, $a[DST].stride)
    for idx in range(16):
        DS[addr[idx]] = $v[SRC1][idx]
elif op == 'stavv':
    addr = addresses_vertical($a[DST].addr, $a[DST].stride)
    for idx in range(16):
        DS[addr[idx]] = $v[SRC1][idx]
elif op == 'stas':
    addr = addresses_scalar($a[DST].addr, $a[DST].stride)
    for idx in range(4):
        DS[addr[idx]] = $r[SRC1][idx]
if IMM is None:
    $a[DST].addr += $a[SRC2S]
else:
    $a[DST].addr += IMM
if CDST < 4:
    $c[CDST].address.short = $a[DST].addr >= $a[DST].limit
Load raw: ldr
A raw load instruction. Loads one byte from each bank of the data store. The banks correspond directly to destination register components. Each component's address is formed by ORing the address register, shifted right by 4 bits, with the corresponding component of a vector register. Specifically, for each component, the byte to access is determined as follows:
- take the addr field of the address register
- shift it right by 4 bits (the low 4 bits are discarded)
- OR it with the corresponding component of the vector source register
- bit 0 of the result selects the low/high byte of the bank
- bits 1-8 of the result select the cell index in the bank
This instruction shares the 0xd7 opcode with star. They are differentiated by instruction word bit 0, which is 0 for ldr.
- Instructions:

| Instruction | Operands | Opcode |
|---|---|---|
| ldr | $v[DST] $a[SRC1] $v[SRC2] | 0xd7.0 |

- Operation:

for idx in range(16):
    addr = $a[SRC1].addr >> 4 | $v[SRC2][idx]
    $v[DST][idx] = DS[idx, addr >> 1 & 0xff, addr & 1]
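The per-component address computation can be sketched on its own, returning the (bank, cell, hilo) slot each component reads. The helper name `ldr_slots` is hypothetical.

```python
def ldr_slots(areg_addr, components):
    """(bank, cell, hilo) read by ldr for each of the 16 vector components."""
    slots = []
    for bank, comp in enumerate(components):
        a = areg_addr >> 4 | comp  # low 4 bits of the register are discarded
        slots.append((bank, a >> 1 & 0xff, a & 1))
    return slots


# Base addr 0x200 with components 0..15: consecutive components alternate
# between the low and high byte, and every pair moves one cell up.
for slot in ldr_slots(0x200, range(16))[:4]:
    print(slot)
```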
Store raw and add: star
A raw store instruction. Stores one byte to each bank of the data store. As opposed to raw load, the addresses aren’t controllable per component: the same byte and cell index is accessed in each bank, and it’s selected by the address register, which is then post-incremented as for sta*. $c output is not supported.
This instruction shares the 0xd7 opcode with ldr. They are differentiated by instruction word bit 0, which is 1 for star.
- Instructions:

| Instruction | Operands | Opcode |
|---|---|---|
| star | $v[SRC1] $a[DST] $a[SRC2S] | 0xd7.1 |

- Operation:

for idx in range(16):
    addr = $a[DST].addr >> 4
    DS[idx, addr >> 1 & 0xff, addr & 1] = $v[SRC1][idx]
$a[DST].addr += $a[SRC2S]
Load extra and add: ldaxh, ldaxv
Like ldav*, except the data is loaded to $vx. If a selected $c flag is set (the same one as used for SRC2S mangling), the same data is also loaded to a $v register selected by the DST field, mangled in the same way as in the vlrp2 family of instructions.
- Instructions:

| Instruction | Operands | Opcode |
|---|---|---|
| ldaxh | $v[DST]q [$c[CDST]] $a[SRC1] $a[SRC2S] | 0xc8 |
| ldaxv | $v[DST]q [$c[CDST]] $a[SRC1] $a[SRC2S] | 0xc9 |

- Operation:

if op == 'ldaxh':
    addr = addresses_horizontal($a[SRC1].addr, $a[SRC1].stride)
    for idx in range(16):
        $vx[idx] = DS[addr[idx]]
elif op == 'ldaxv':
    addr = addresses_vertical($a[SRC1].addr, $a[SRC1].stride)
    for idx in range(16):
        $vx[idx] = DS[addr[idx]]
if $c[COND] & 1 << SLCT:
    for idx in range(16):
        $v[(DST & 0x1c) | ((DST + ($c[COND] >> 4)) & 3)][idx] = $vx[idx]
$a[SRC1].addr += $a[SRC2S]
if CDST < 4:
    $c[CDST].address.short = $a[SRC1].addr >= $a[SRC1].limit
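The register selection can be sketched as follows, taking the $c[COND] value, SLCT and DST as plain integers exactly as they appear in the pseudocode above. The function name `ldax_targets` is hypothetical, and it returns only the names of the registers written.

```python
def ldax_targets(dst, cond_value, slct):
    """Names of the vector registers written by ldaxh/ldaxv."""
    targets = ["$vx"]
    if cond_value & 1 << slct:
        # Keep bits 2-4 of DST; add bits 4-5 of $c[COND] to its low 2 bits, mod 4.
        index = (dst & 0x1c) | ((dst + (cond_value >> 4)) & 3)
        targets.append(f"$v{index}")
    return targets


print(ldax_targets(5, 0x31, 0))
print(ldax_targets(5, 0x30, 0))
```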