Address unit ¶

Contents

Address unit

Introduction ¶

The address unit is one of the four execution units of VP1. It transfers data between that data store and registers, controls the DMA unit, and performs address calculations.

The data store is the working memory of VP1, 8kB in size. Data can be transferred between the data store and $r/$v registers using load/store instructions, or between the data store and main memory using the DMA engine. It’s often treated as two-dimensional, with row stride selectable between 0x10, 0x20, 0x40, and 0x80 bytes: there are “load vertical” instructions which gather consecutive bytes vertically rather than horizontally.

Because of its 2D capabilities, the data store is internally organized into 16 independently addressable 16-bit wide banks of 256 cells each, and the memory addresses are carefully spread between the banks so that both horizontal and vertical loads from any address will require at most one access to every bank. The bank assignments differ between the supported strides, so row stride is basically a part of the address, and an area of memory always has to be accessed with the same stride (unless you don’t care about its previous contents). Specifially, the translation of (address, stride) pair into (bank, cell index, high/low byte) is as follows:

def address_xlat(addr, stride):
    bank = addr & 0xf
    hilo = addr >> 4 & 1
    cell = addr >> 5 & 0xff
    if stride == 0:
        # 0x10 bytes
        bank += (addr >> 5) & 7
    elif stride == 1:
        # 0x20 bytes
        bank += addr >> 5
    elif stride == 0x40:
        # 0x40 bytes
        bank += addr >> 6
    elif stride == 0x80:
        # 0x80 bytes
        bank += addr >> 7
    bank &= 0xf
    return bank, cell, hilo

In pseudocode, data store bytes are denoted by DS[bank, cell, hilo].

In case of vertical access with 0x10 bytes stride, all 16 bits of 8 banks will be used by a 16-byte access. In all other cases, 8 bits of all 16 banks will be used for such access. DMA transfers can make use of the full 256-bit width of the data store, by transmitting 0x20 consecutive bytes at a time.

The data store can be accessed by load/store instructions in one of four ways:

horizontal: 16 consecutive naturally aligned addresses are used:

def addresses_horizontal(addr, stride):
    addr &= 0x1ff0
    return [address_xlat(addr | idx, stride) for idx in range(16)]

vertical: 16 addresses separated by stride bytes are used, also naturally aligned:

def addresses_vertical(addr, stride):
    addr &= 0x1fff
    # clear the bits used for y coord
    addr &= ~(0xf << (4 + stride))
    return [address_xlat(addr | idx << (4 + stride)) for idx in range(16)]

scalar: like horizontal, but 4 bytes:

def addresses_horizontal_short(addr, stride):
    addr &= 0x1ffc
    return [address_xlat(addr | idx, stride) for idx in range(4)]

raw: the raw data store coordinates are provided directly

Address registers ¶

The address unit has 32 address registers, $a0-$a31. These are used for address storage. If they’re used to store data store addresses (and not DMA command parameters), they have the following bitfields:

bits 0-15: addr - the actual data store address
bits 16-29: limit - can store the high bounduary of an array, to assist in looping
bits 30-31: stride - selects data store stride:
- 0: 0x10 bytes
- 1: 0x20 bytes
- 2: 0x40 bytes
- 3: 0x80 bytes

There are also 3 bits in each $c register belonging to the address unit. They are:

bits 8-9: long address flags
- bit 8: sign flag - set equal to bit 31 of the result
- bit 9: zero flag - set if the result is 0
bit 10: short address flag
- bit 10: end flag - set if addr field of the result is greater than or equal to limit

Some address instructions set either the long or short flags of a given $c register according to the result.

Instruction format ¶

The instruction word fields used in address instructions in addition to the ones used in scalar instructions are:

bit 0: for opcode 0xd7, selects the subopcode:
- 0: load raw: ldr
- 1: store raw and add: star
bits 3-13: UIMM: unsigned 13-bit immediate.

Todo

list me

Opcodes ¶

The opcode range assigned to the address unit is 0xc0-0xdf. The opcodes are:

0xc0: load vector horizontal and add: ldavh
0xc1: load vector vertical and add: ldavv
0xc2: load scalar and add: ldas
0xc3: ??? (xdld)
0xc4: store vector horizontal and add: stavh
0xc5: store vector vertical and add: stavv
0xc6: store scalar and add: stas
0xc7: ??? (xdst)
0xc8: load extra horizontal and add: ldaxh
0xc9: load extra vertical and add: ldaxv
0xca: address addition: aadd
0xcb: addition: add
0xcc: set low bits: setlo
0xcd: set high bits: sethi
0xce: ??? (xdbar)
0xcf: ??? (xdwait)
0xd0: load vector horizontal and add: ldavh
0xd1: load vector vertical and add: ldavv
0xd2: load scalar and add: ldas
0xd3: bitwise operation: bitop
0xd4: store vector horizontal and add: stavh
0xd5: store vector vertical and add: stavv
0xd6: store scalar and add: stas
0xd7: depending on instruction bit 0:
- 0: load raw: ldr
- 1: store raw and add: star
0xd8: load vector horizontal: ldvh
0xd9: load vector vertical: ldvv
0xda: load scalar: lds
0xdb: ???
0xdc: store vector horizontal: stvh
0xdd: store vector vertical: stvv
0xde: store scalar: sts
0xdf: the canonical address nop opcode

Todo

complete the list

Instructions ¶

Set low/high bits: setlo, sethi ¶

Sets low or high 16 bits of a register to an immediate value. The other half is unaffected.

Instructions:

Instruction	Operands	Opcode
`setlo`	`$a[DST] IMM16`	`0xcc`
`sethi`	`$a[DST] IMM16`	`0xcd`

Operation:

if op == 'setlo':
    $a[DST] = ($a[DST] & 0xffff0000) | IMM16
else:
    $a[DST] = ($a[DST] & 0xffff) | IMM16 << 16

Addition: add ¶

Does what it says on the tin. The second source comes from a mangled register index. The long address flags are set.

Instructions:

Instruction	Operands	Opcode
`add`	`[$c[CDST]] $a[DST] $a[SRC1] $a[SRC2S]`	`0xcb`

Operation:

res = $a[SRC1] + $a[SRC2S]

$a[DST] = res

cres = 0
if res & 1 << 31:
    cres |= 1
if res == 0:
    cres |= 2
if CDST < 4:
    $c[CDST].address.long = cres

Bit operations: bitop ¶

Performs an arbitrary two-input bit operation on two registers, selected by SRC1 and SRC2. The long address flags are set.

Instructions:

Instruction	Operands	Opcode
`bitop`	`BITOP [$c[CDST]] $a[DST] $a[SRC1] $a[SRC2]`	`0xd3`

Operation:

res = bitop(BITOP, $a[SRC2], $a[SRC1]) & 0xffffffff

$a[DST] = res

cres = 0
if res & 1 << 31:
    cres |= 1
if res == 0:
    cres |= 2
if CDST < 4:
    $c[CDST].address.long = cres

Address addition: aadd ¶

Adds the contents of a register to the addr field of another register. Short address flag is set.

Instructions:

Instruction	Operands	Opcode
`aadd`	`[$c[CDST]] $a[DST] $a[SRC2S]`	`0xca`

Operation:

$a[DST].addr += $a[SRC2S]

if CDST < 4:
    $c[CDST].address.short = $a[DST].addr >= $a[DST].limit

Load: ldvh, ldvv, lds ¶

Loads from the given address ORed with an unsigned 11-bit immediate. ldvh is a horizontal vector load, ldvv is a vertical vector load, and lds is a scalar load. Curiously, while register is ORed with the immdiate to form the address, they are added to make $c output.

Instructions:

Instruction	Operands	Opcode
`ldvh`	`$v[DST] [$c[CDST]] $a[SRC1] UIMM`	`0xd8`
`ldvv`	`$v[DST] [$c[CDST]] $a[SRC1] UIMM`	`0xd9`
`lds`	`$r[DST] [$c[CDST]] $a[SRC1] UIMM`	`0xda`

Operation:

if op == 'ldvh':
    addr = addresses_horizontal($a[SRC1].addr | UIMM, $a[SRC1].stride)
    for idx in range(16):
        $v[DST][idx] = DS[addr[idx]]
elif op == 'ldvv':
    addr = addresses_vertical($a[SRC1].addr | UIMM, $a[SRC1].stride)
    for idx in range(16):
        $v[DST][idx] = DS[addr[idx]]
elif op == 'lds':
    addr = addresses_scalar($a[SRC1].addr | UIMM, $a[SRC1].stride)
    for idx in range(4):
        $r[DST][idx] = DS[addr[idx]]

if CDST < 4:
    $c[CDST].address.short = (($a[SRC1].addr + UIMM) & 0xffff) >= $a[SRC1].limit

Load and add: ldavh, ldavv, ldas ¶

Loads from the given address, then post-increments the address by the contents of a register (like the aadd instruction) or an immediate. ldavh is a horizontal vector load, ldavv is a vertical vector load, and ldas is a scalar load.

Instructions:

Instruction	Operands	Opcode
`ldavh`	`$v[DST] [$c[CDST]] $a[SRC1] $a[SRC2S]`	`0xc0`
`ldavv`	`$v[DST] [$c[CDST]] $a[SRC1] $a[SRC2S]`	`0xc1`
`ldas`	`$r[DST] [$c[CDST]] $a[SRC1] $a[SRC2S]`	`0xc2`
`ldavh`	`$v[DST] [$c[CDST]] $a[SRC1] IMM`	`0xd0`
`ldavv`	`$v[DST] [$c[CDST]] $a[SRC1] IMM`	`0xd1`
`ldas`	`$r[DST] [$c[CDST]] $a[SRC1] IMM`	`0xd2`

Operation:

if op == 'ldavh':
    addr = addresses_horizontal($a[SRC1].addr, $a[SRC1].stride)
    for idx in range(16):
        $v[DST][idx] = DS[addr[idx]]
elif op == 'ldavv':
    addr = addresses_vertical($a[SRC1].addr, $a[SRC1].stride)
    for idx in range(16):
        $v[DST][idx] = DS[addr[idx]]
elif op == 'ldas':
    addr = addresses_scalar($a[SRC1].addr, $a[SRC1].stride)
    for idx in range(4):
        $r[DST][idx] = DS[addr[idx]]

if IMM is None:
    $a[SRC1].addr += $a[SRC2S]
else:
    $a[SRC1].addr += IMM

if CDST < 4:
    $c[CDST].address.short = $a[SRC1].addr >= $a[SRC1].limit

Store: stvh, stvv, sts ¶

Like corresponding ld* instructions, but store instead of load. SRC1 and DST fields are exchanged.

Instructions:

Instruction	Operands	Opcode
`stvh`	`$v[SRC1] [$c[CDST]] $a[DST] UIMM`	`0xdc`
`stvv`	`$v[SRC1] [$c[CDST]] $a[DST] UIMM`	`0xdd`
`sts`	`$r[SRC1] [$c[CDST]] $a[DST] UIMM`	`0xde`

Operation:

if op == 'stvh':
    addr = addresses_horizontal($a[DST].addr | UIMM, $a[DST].stride)
    for idx in range(16):
        DS[addr[idx]] = $v[SRC1][idx]
elif op == 'stvv':
    addr = addresses_vertical($a[DST].addr | UIMM, $a[DST].stride)
    for idx in range(16):
        DS[addr[idx]] = $v[SRC1][idx]
elif op == 'sts':
    addr = addresses_scalar($a[DST].addr | UIMM, $a[DST].stride)
    for idx in range(4):
        DS[addr[idx]] = $r[SRC1][idx]

if CDST < 4:
    $c[CDST].address.short = (($a[DST].addr + UIMM) & 0xffff) >= $a[DST].limit

Store and add: stavh, stavv, stas ¶

Like corresponding lda* instructions, but store instead of load. SRC1 and DST fields are exchanged.

Instructions:

Instruction	Operands	Opcode
`stavh`	`$v[SRC1] [$c[CDST]] $a[DST] $a[SRC2S]`	`0xc4`
`stavv`	`$v[SRC1] [$c[CDST]] $a[DST] $a[SRC2S]`	`0xc5`
`stas`	`$r[SRC1] [$c[CDST]] $a[DST] $a[SRC2S]`	`0xc6`
`stavh`	`$v[SRC1] [$c[CDST]] $a[DST] IMM`	`0xd4`
`stavv`	`$v[SRC1] [$c[CDST]] $a[DST] IMM`	`0xd5`
`stas`	`$r[SRC1] [$c[CDST]] $a[DST] IMM`	`0xd6`

Operation:

if op == 'stavh':
    addr = addresses_horizontal($a[DST].addr, $a[DST].stride)
    for idx in range(16):
        DS[addr[idx]] = $v[SRC1][idx]
elif op == 'stavv':
    addr = addresses_vertical($a[DST].addr, $a[DST].stride)
    for idx in range(16):
        DS[addr[idx]] = $v[SRC1][idx]
elif op == 'stas':
    addr = addresses_scalar($a[DST].addr, $a[DST].stride)
    for idx in range(4):
        DS[addr[idx]] = $r[SRC1][idx]

if IMM is None:
    $a[DST].addr += $a[SRC2S]
else:
    $a[DST].addr += IMM

if CDST < 4:
    $c[CDST].address.short = $a[DST].addr >= $a[DST].limit

Load raw: ldr ¶

A raw load instruction. Loads one byte from each bank of the data store. The banks correspond directly to destination register components. The addresses are composed from ORing an address register with components of a vector register shifted left by 4 bits. Specifically, for each component, the byte to access is determined as follows:

take address register value
shift it right 4 bits (they’re discarded)
OR with the corresponding component of vector source register
bit 0 of the result selects low/high byte of the bank
bits 1-8 of the result select the cell index in the bank

This instruction shares the 0xd7 opcode with star. They are differentiated by instruction word bit 0, set to 0 in case of ldr.

Instructions:

Instruction	Operands	Opcode
`ldr`	`$v[DST] $a[SRC1] $v[SRC2]`	`0xd7.0`

Operation:

for idx in range(16):
    addr = $a[SRC1].addr >> 4 | $v[SRC2][idx]
    $v[DST][idx] = DS[idx, addr >> 1 & 0xff, addr & 1]

Store raw and add: star ¶

A raw store instruction. Stores one byte to each bank of the data store. As opposed to raw load, the addresses aren’t controllable per component: the same byte and cell index is accessed in each bank, and it’s selected by post-incremented address register like for sta*. $c output is not supported.

This instruction shares the 0xd7 opcode with lda. They are differentiated by instruction word bit 0, set to 1 in case of star.

Instructions:

Instruction	Operands	Opcode
`star`	`$v[SRC1] $a[DST] $a[SRC2S]`	`0xd7.1`

Operation:

for idx in range(16):
    addr = $a[DST].addr >> 4
    DS[idx, addr >> 1 & 0xff, addr & 1] = $v[SRC1][idx]

$a[DST].addr += $a[SRC2S]

Load extra and add: ldaxh, ldaxv ¶

Like ldav*, except the data is loaded to $vx. If a selected $c flag is set (the same one as used for SRC2S mangling), the same data is also loaded to a $v register selected by DST field mangled in the same way as in vlrp2 family of instructions.

Instructions:

Instruction	Operands	Opcode
`ldaxh`	`$v[DST]q [$c[CDST]] $a[SRC1] $a[SRC2S]`	`0xc8`
`ldaxv`	`$v[DST]q [$c[CDST]] $a[SRC1] $a[SRC2S]`	`0xc9`

Operation:

if op == 'ldaxh':
    addr = addresses_horizontal($a[SRC1].addr, $a[SRC1].stride)
    for idx in range(16):
        $vx[idx] = DS[addr[idx]]
elif op == 'ldaxv':
    addr = addresses_vertical($a[SRC1].addr, $a[SRC1].stride)
    for idx in range(16):
        $vx[idx] = DS[addr[idx]]

if $c[COND] & 1 << SLCT:
    for idx in range(16):
        $v[(DST & 0x1c) | ((DST + ($c[COND] >> 4)) & 3)][idx] = $vx[idx]

$a[SRC1].addr += $a[SRC2S]

if CDST < 4:
    $c[CDST].address.short = $a[SRC1].addr >= $a[SRC1].limit

Address unit ¶

Introduction ¶

The data store ¶

Address registers ¶

Instruction format ¶

Opcodes ¶

Instructions ¶

Set low/high bits: setlo, sethi ¶

Addition: add ¶

Bit operations: bitop ¶

Address addition: aadd ¶

Load: ldvh, ldvv, lds ¶

Load and add: ldavh, ldavv, ldas ¶

Store: stvh, stvv, sts ¶

Store and add: stavh, stavv, stas ¶

Load raw: ldr ¶

Store raw and add: star ¶

Load extra and add: ldaxh, ldaxv ¶

Table of Contents

Previous topic

Next topic

This Page