Address unit

Introduction

The address unit is one of the four execution units of VP1. It transfers data between that data store and registers, controls the DMA unit, and performs address calculations.

The data store

The data store is the working memory of VP1, 8kB in size. Data can be transferred between the data store and $r/$v registers using load/store instructions, or between the data store and main memory using the DMA engine. It’s often treated as two-dimensional, with row stride selectable between 0x10, 0x20, 0x40, and 0x80 bytes: there are “load vertical” instructions which gather consecutive bytes vertically rather than horizontally.

Because of its 2D capabilities, the data store is internally organized into 16 independently addressable 16-bit wide banks of 256 cells each, and the memory addresses are carefully spread between the banks so that both horizontal and vertical loads from any address will require at most one access to every bank. The bank assignments differ between the supported strides, so row stride is basically a part of the address, and an area of memory always has to be accessed with the same stride (unless you don’t care about its previous contents). Specifially, the translation of (address, stride) pair into (bank, cell index, high/low byte) is as follows:

def address_xlat(addr, stride):
    bank = addr & 0xf
    hilo = addr >> 4 & 1
    cell = addr >> 5 & 0xff
    if stride == 0:
        # 0x10 bytes
        bank += (addr >> 5) & 7
    elif stride == 1:
        # 0x20 bytes
        bank += addr >> 5
    elif stride == 0x40:
        # 0x40 bytes
        bank += addr >> 6
    elif stride == 0x80:
        # 0x80 bytes
        bank += addr >> 7
    bank &= 0xf
    return bank, cell, hilo

In pseudocode, data store bytes are denoted by DS[bank, cell, hilo].

In case of vertical access with 0x10 bytes stride, all 16 bits of 8 banks will be used by a 16-byte access. In all other cases, 8 bits of all 16 banks will be used for such access. DMA transfers can make use of the full 256-bit width of the data store, by transmitting 0x20 consecutive bytes at a time.

The data store can be accessed by load/store instructions in one of four ways:

  • horizontal: 16 consecutive naturally aligned addresses are used:

    def addresses_horizontal(addr, stride):
        addr &= 0x1ff0
        return [address_xlat(addr | idx, stride) for idx in range(16)]
    
  • vertical: 16 addresses separated by stride bytes are used, also naturally aligned:

    def addresses_vertical(addr, stride):
        addr &= 0x1fff
        # clear the bits used for y coord
        addr &= ~(0xf << (4 + stride))
        return [address_xlat(addr | idx << (4 + stride)) for idx in range(16)]
    
  • scalar: like horizontal, but 4 bytes:

    def addresses_horizontal_short(addr, stride):
        addr &= 0x1ffc
        return [address_xlat(addr | idx, stride) for idx in range(4)]
    
  • raw: the raw data store coordinates are provided directly

Address registers

The address unit has 32 address registers, $a0-$a31. These are used for address storage. If they’re used to store data store addresses (and not DMA command parameters), they have the following bitfields:

  • bits 0-15: addr - the actual data store address
  • bits 16-29: limit - can store the high bounduary of an array, to assist in looping
  • bits 30-31: stride - selects data store stride:
    • 0: 0x10 bytes
    • 1: 0x20 bytes
    • 2: 0x40 bytes
    • 3: 0x80 bytes

There are also 3 bits in each $c register belonging to the address unit. They are:

  • bits 8-9: long address flags
    • bit 8: sign flag - set equal to bit 31 of the result
    • bit 9: zero flag - set if the result is 0
  • bit 10: short address flag
    • bit 10: end flag - set if addr field of the result is greater than or equal to limit

Some address instructions set either the long or short flags of a given $c register according to the result.

Instruction format

The instruction word fields used in address instructions in addition to the ones used in scalar instructions are:

Todo

list me

Instructions

Set low/high bits: setlo, sethi

Sets low or high 16 bits of a register to an immediate value. The other half is unaffected.

Instructions:
Instruction Operands Opcode
setlo $a[DST] IMM16 0xcc
sethi $a[DST] IMM16 0xcd
Operation:
if op == 'setlo':
    $a[DST] = ($a[DST] & 0xffff0000) | IMM16
else:
    $a[DST] = ($a[DST] & 0xffff) | IMM16 << 16

Addition: add

Does what it says on the tin. The second source comes from a mangled register index. The long address flags are set.

Instructions:
Instruction Operands Opcode
add [$c[CDST]] $a[DST] $a[SRC1] $a[SRC2S] 0xcb
Operation:
res = $a[SRC1] + $a[SRC2S]

$a[DST] = res

cres = 0
if res & 1 << 31:
    cres |= 1
if res == 0:
    cres |= 2
if CDST < 4:
    $c[CDST].address.long = cres

Bit operations: bitop

Performs an arbitrary two-input bit operation on two registers, selected by SRC1 and SRC2. The long address flags are set.

Instructions:
Instruction Operands Opcode
bitop BITOP [$c[CDST]] $a[DST] $a[SRC1] $a[SRC2] 0xd3
Operation:
res = bitop(BITOP, $a[SRC2], $a[SRC1]) & 0xffffffff

$a[DST] = res

cres = 0
if res & 1 << 31:
    cres |= 1
if res == 0:
    cres |= 2
if CDST < 4:
    $c[CDST].address.long = cres

Address addition: aadd

Adds the contents of a register to the addr field of another register. Short address flag is set.

Instructions:
Instruction Operands Opcode
aadd [$c[CDST]] $a[DST] $a[SRC2S] 0xca
Operation:
$a[DST].addr += $a[SRC2S]

if CDST < 4:
    $c[CDST].address.short = $a[DST].addr >= $a[DST].limit

Load: ldvh, ldvv, lds

Loads from the given address ORed with an unsigned 11-bit immediate. ldvh is a horizontal vector load, ldvv is a vertical vector load, and lds is a scalar load. Curiously, while register is ORed with the immdiate to form the address, they are added to make $c output.

Instructions:
Instruction Operands Opcode
ldvh $v[DST] [$c[CDST]] $a[SRC1] UIMM 0xd8
ldvv $v[DST] [$c[CDST]] $a[SRC1] UIMM 0xd9
lds $r[DST] [$c[CDST]] $a[SRC1] UIMM 0xda
Operation:
if op == 'ldvh':
    addr = addresses_horizontal($a[SRC1].addr | UIMM, $a[SRC1].stride)
    for idx in range(16):
        $v[DST][idx] = DS[addr[idx]]
elif op == 'ldvv':
    addr = addresses_vertical($a[SRC1].addr | UIMM, $a[SRC1].stride)
    for idx in range(16):
        $v[DST][idx] = DS[addr[idx]]
elif op == 'lds':
    addr = addresses_scalar($a[SRC1].addr | UIMM, $a[SRC1].stride)
    for idx in range(4):
        $r[DST][idx] = DS[addr[idx]]

if CDST < 4:
    $c[CDST].address.short = (($a[SRC1].addr + UIMM) & 0xffff) >= $a[SRC1].limit

Load and add: ldavh, ldavv, ldas

Loads from the given address, then post-increments the address by the contents of a register (like the aadd instruction) or an immediate. ldavh is a horizontal vector load, ldavv is a vertical vector load, and ldas is a scalar load.

Instructions:
Instruction Operands Opcode
ldavh $v[DST] [$c[CDST]] $a[SRC1] $a[SRC2S] 0xc0
ldavv $v[DST] [$c[CDST]] $a[SRC1] $a[SRC2S] 0xc1
ldas $r[DST] [$c[CDST]] $a[SRC1] $a[SRC2S] 0xc2
ldavh $v[DST] [$c[CDST]] $a[SRC1] IMM 0xd0
ldavv $v[DST] [$c[CDST]] $a[SRC1] IMM 0xd1
ldas $r[DST] [$c[CDST]] $a[SRC1] IMM 0xd2
Operation:
if op == 'ldavh':
    addr = addresses_horizontal($a[SRC1].addr, $a[SRC1].stride)
    for idx in range(16):
        $v[DST][idx] = DS[addr[idx]]
elif op == 'ldavv':
    addr = addresses_vertical($a[SRC1].addr, $a[SRC1].stride)
    for idx in range(16):
        $v[DST][idx] = DS[addr[idx]]
elif op == 'ldas':
    addr = addresses_scalar($a[SRC1].addr, $a[SRC1].stride)
    for idx in range(4):
        $r[DST][idx] = DS[addr[idx]]

if IMM is None:
    $a[SRC1].addr += $a[SRC2S]
else:
    $a[SRC1].addr += IMM

if CDST < 4:
    $c[CDST].address.short = $a[SRC1].addr >= $a[SRC1].limit

Store: stvh, stvv, sts

Like corresponding ld* instructions, but store instead of load. SRC1 and DST fields are exchanged.

Instructions:
Instruction Operands Opcode
stvh $v[SRC1] [$c[CDST]] $a[DST] UIMM 0xdc
stvv $v[SRC1] [$c[CDST]] $a[DST] UIMM 0xdd
sts $r[SRC1] [$c[CDST]] $a[DST] UIMM 0xde
Operation:
if op == 'stvh':
    addr = addresses_horizontal($a[DST].addr | UIMM, $a[DST].stride)
    for idx in range(16):
        DS[addr[idx]] = $v[SRC1][idx]
elif op == 'stvv':
    addr = addresses_vertical($a[DST].addr | UIMM, $a[DST].stride)
    for idx in range(16):
        DS[addr[idx]] = $v[SRC1][idx]
elif op == 'sts':
    addr = addresses_scalar($a[DST].addr | UIMM, $a[DST].stride)
    for idx in range(4):
        DS[addr[idx]] = $r[SRC1][idx]

if CDST < 4:
    $c[CDST].address.short = (($a[DST].addr + UIMM) & 0xffff) >= $a[DST].limit

Store and add: stavh, stavv, stas

Like corresponding lda* instructions, but store instead of load. SRC1 and DST fields are exchanged.

Instructions:
Instruction Operands Opcode
stavh $v[SRC1] [$c[CDST]] $a[DST] $a[SRC2S] 0xc4
stavv $v[SRC1] [$c[CDST]] $a[DST] $a[SRC2S] 0xc5
stas $r[SRC1] [$c[CDST]] $a[DST] $a[SRC2S] 0xc6
stavh $v[SRC1] [$c[CDST]] $a[DST] IMM 0xd4
stavv $v[SRC1] [$c[CDST]] $a[DST] IMM 0xd5
stas $r[SRC1] [$c[CDST]] $a[DST] IMM 0xd6
Operation:
if op == 'stavh':
    addr = addresses_horizontal($a[DST].addr, $a[DST].stride)
    for idx in range(16):
        DS[addr[idx]] = $v[SRC1][idx]
elif op == 'stavv':
    addr = addresses_vertical($a[DST].addr, $a[DST].stride)
    for idx in range(16):
        DS[addr[idx]] = $v[SRC1][idx]
elif op == 'stas':
    addr = addresses_scalar($a[DST].addr, $a[DST].stride)
    for idx in range(4):
        DS[addr[idx]] = $r[SRC1][idx]

if IMM is None:
    $a[DST].addr += $a[SRC2S]
else:
    $a[DST].addr += IMM

if CDST < 4:
    $c[CDST].address.short = $a[DST].addr >= $a[DST].limit

Load raw: ldr

A raw load instruction. Loads one byte from each bank of the data store. The banks correspond directly to destination register components. The addresses are composed from ORing an address register with components of a vector register shifted left by 4 bits. Specifically, for each component, the byte to access is determined as follows:

  • take address register value
  • shift it right 4 bits (they’re discarded)
  • OR with the corresponding component of vector source register
  • bit 0 of the result selects low/high byte of the bank
  • bits 1-8 of the result select the cell index in the bank

This instruction shares the 0xd7 opcode with star. They are differentiated by instruction word bit 0, set to 0 in case of ldr.

Instructions:
Instruction Operands Opcode
ldr $v[DST] $a[SRC1] $v[SRC2] 0xd7.0
Operation:
for idx in range(16):
    addr = $a[SRC1].addr >> 4 | $v[SRC2][idx]
    $v[DST][idx] = DS[idx, addr >> 1 & 0xff, addr & 1]

Store raw and add: star

A raw store instruction. Stores one byte to each bank of the data store. As opposed to raw load, the addresses aren’t controllable per component: the same byte and cell index is accessed in each bank, and it’s selected by post-incremented address register like for sta*. $c output is not supported.

This instruction shares the 0xd7 opcode with lda. They are differentiated by instruction word bit 0, set to 1 in case of star.

Instructions:
Instruction Operands Opcode
star $v[SRC1] $a[DST] $a[SRC2S] 0xd7.1
Operation:
for idx in range(16):
    addr = $a[DST].addr >> 4
    DS[idx, addr >> 1 & 0xff, addr & 1] = $v[SRC1][idx]

$a[DST].addr += $a[SRC2S]

Load extra and add: ldaxh, ldaxv

Like ldav*, except the data is loaded to $vx. If a selected $c flag is set (the same one as used for SRC2S mangling), the same data is also loaded to a $v register selected by DST field mangled in the same way as in vlrp2 family of instructions.

Instructions:
Instruction Operands Opcode
ldaxh $v[DST]q [$c[CDST]] $a[SRC1] $a[SRC2S] 0xc8
ldaxv $v[DST]q [$c[CDST]] $a[SRC1] $a[SRC2S] 0xc9
Operation:
if op == 'ldaxh':
    addr = addresses_horizontal($a[SRC1].addr, $a[SRC1].stride)
    for idx in range(16):
        $vx[idx] = DS[addr[idx]]
elif op == 'ldaxv':
    addr = addresses_vertical($a[SRC1].addr, $a[SRC1].stride)
    for idx in range(16):
        $vx[idx] = DS[addr[idx]]

if $c[COND] & 1 << SLCT:
    for idx in range(16):
        $v[(DST & 0x1c) | ((DST + ($c[COND] >> 4)) & 3)][idx] = $vx[idx]

$a[SRC1].addr += $a[SRC2S]

if CDST < 4:
    $c[CDST].address.short = $a[SRC1].addr >= $a[SRC1].limit