Overview of VP1 video processor

Introduction

VP1 is a vector processor with a PFIFO interface and a custom ISA designed for video decoding. It is made of the following units:

the processor proper:
8kB of dedicated data RAM
instruction cache
memory interface
DMA engine
FIFO interface

There are 3 variants of VP1:

the original NV41 variant [NV41:NV44]
the NV44 variant, with improved context switching and method processing capabilities [NV44:G80]
the G80 variant, with changes to match G80 memory model [G80:G84]

MMIO registers

8-bit space pvp1 [0x1000]
nv3-mmio 0xf000: PVP1 [NV41:G80]
g80-mmio 0xf000: PVP1 [VP1]

Todo

write me

Instruction format

On VP1, all instructions are stored as 32-bit little-endian words. The top 8 bits of the instruction word determine the opcode, and the highest bits of the opcode determine its execution unit:

0x00-0x7f: scalar instructions
0x80-0xbf: vector instructions
0xc0-0xdf: address instructions
0xe0-0xff: branch instructions
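As a minimal sketch of these decoding rules, the following Python assumes the code is available as a raw byte string. The helper names `words_from_bytes` and `unit_of`, and the numeric unit constants, are hypothetical and chosen only for illustration:

```python
from __future__ import annotations

import struct

# Hypothetical unit numbering, in the order the units appear within a bundle.
ADDRESS, SCALAR, VECTOR, BRANCH = 0, 1, 2, 3


def words_from_bytes(code: bytes) -> list[int]:
    """Split raw code into 32-bit little-endian instruction words."""
    if len(code) % 4:
        raise ValueError("code length must be a multiple of 4 bytes")
    return [word for (word,) in struct.iter_unpack("<I", code)]


def unit_of(word: int) -> int:
    """Return the execution unit selected by the opcode (top 8 bits)."""
    opcode = (word >> 24) & 0xFF
    if opcode < 0x80:  # 0x00-0x7f
        return SCALAR
    if opcode < 0xC0:  # 0x80-0xbf
        return VECTOR
    if opcode < 0xE0:  # 0xc0-0xdf
        return ADDRESS
    return BRANCH      # 0xe0-0xff
```

For example, the bytes `78 56 34 12` decode to the word 0x12345678, whose opcode 0x12 selects the scalar unit.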

The executed instruction stream is divided into so-called instruction bundles, which are groups of up to 4 instructions targeting distinct execution units. The instructions of a bundle are executed in parallel: they don’t see each other’s changes to register state. In some cases, however, the scalar instruction of a bundle computes data to be used by the vector instruction of the same bundle (the so-called s2v path).

The instructions are grouped into bundles as follows:

bundles cannot cross aligned 4-word (16-byte) boundaries
a bundle can contain an arbitrary nonempty subset of (address, scalar, vector, branch) instructions, in that order
bundles are as long as possible, subject to the previous restrictions

In other words, an instruction word starts a new bundle iff:

the instruction starts on an aligned 4-word boundary, or
the current bundle already contains an instruction of this kind, or a higher kind (where branch > vector > scalar > address)

For example, the following splits happen:

|A|A|A|A|A|A|A|A| - if only instructions of a single kind are fetched, each executes as its own bundle
|A S V B|A S V B| - the perfect case, 4-instruction bundles
|A V|S B|S|A V B| - if instructions are in the wrong order, they won’t make a bundle
|A|A|A S|V B|B|B| - the 4 middle instructions can’t be in a single bundle, because there is a 4-word boundary in the middle
|B|V|S|A|B|V|S|A| - worst case, all instructions in the wrong order
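The grouping rule can be expressed as a short Python sketch that reproduces the five examples above. It works on execution-unit kinds rather than real instruction words, and `split_bundles`, `show` and the rank numbering are hypothetical names chosen for this illustration:

```python
# Hypothetical unit ranks in bundle order: address < scalar < vector < branch.
ADDRESS, SCALAR, VECTOR, BRANCH = 0, 1, 2, 3
LETTERS = "ASVB"  # display letter for each rank


def split_bundles(units: list, start_word: int = 0) -> list:
    """Group per-word unit ranks into bundles.

    start_word is the word index of the first instruction; 0 means the
    stream starts on an aligned 4-word (16-byte) boundary.
    """
    bundles = []
    for i, unit in enumerate(units):
        # Start a new bundle at an aligned 4-word boundary, or when the current
        # bundle already holds this kind or a later one. Bundles are kept in
        # increasing rank order, so their last entry is the highest kind.
        if not bundles or (start_word + i) % 4 == 0 or bundles[-1][-1] >= unit:
            bundles.append([unit])
        else:
            bundles[-1].append(unit)
    return bundles


def show(pattern: str) -> str:
    """Render a space-separated pattern such as 'A S V B' in the notation above."""
    units = [LETTERS.index(c) for c in pattern.split()]
    parts = [" ".join(LETTERS[u] for u in b) for b in split_bundles(units)]
    return "|" + "|".join(parts) + "|"
```

For instance, `show("A A A S V B B B")` returns `|A|A|A S|V B|B|B|`, matching the fourth example. Checking only the last entry of the current bundle is enough because the ordering rule keeps each bundle sorted by kind.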

Registers

VP1 has 15 distinct register files:

$r0-$r30 (with $r31 hardwired to 0): 32-bit scalar registers, sometimes treated as groups of 4 bytes for SIMD instructions
$v0-$v31: 128-bit vector registers, treated as groups of 16 bytes for SIMD instructions
$a0-$a31: 32-bit address registers, they have funny bitfields used for memory addressing, looping, and mode selection
$c0-$c3: 16-bit condition code registers, split into individual bits belonging to one of the four execution units, used for branching and conditionally selecting inputs
$vc0-$vc3: 32-bit vector condition code registers, like $c, but with different fields, and each bit is duplicated 16 times (one for each vector component)
$va: 448-bit vector accumulator, split into 16 components, each with 12 integer and 16 fractional bits
$vx: 128-bit vector extra register, split into 16 components like normal $v
$l0-$l3: 16-bit loop registers, split into 8-bit loop counter and 8-bit loop total count
$m0-$m63: 32-bit method registers
$x0-$x15: 32-bit extra registers (G80 only)
$d0-$d7: 17-bit DMA object registers (G80 only)
$f0-$f1: FIFO special registers
$sr0-$sr31: misc special registers
$mi0-$mi31: memory interface special registers
$uc0-$uc31: processor control special registers
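The $va width follows from its component layout: 16 components of 12 + 16 = 28 bits give 448 bits. The sketch below extracts one component as a real number. The packing order (component 0 in the lowest bits) and the two's-complement signedness are assumptions made for illustration, not documented facts, and `va_component` is a hypothetical helper:

```python
COMPONENTS = 16
INT_BITS, FRAC_BITS = 12, 16
COMP_BITS = INT_BITS + FRAC_BITS  # 28 bits per component
assert COMPONENTS * COMP_BITS == 448  # matches the documented $va width


def va_component(va: int, i: int) -> float:
    """Return component i of a packed 448-bit $va value as a real number.

    ASSUMPTIONS (not from the documentation): component 0 occupies the
    lowest 28 bits, and each component is two's-complement signed.
    """
    raw = (va >> (i * COMP_BITS)) & ((1 << COMP_BITS) - 1)
    if raw & (1 << (COMP_BITS - 1)):  # sign bit set: sign-extend
        raw -= 1 << COMP_BITS
    return raw / (1 << FRAC_BITS)     # 16 fractional bits
```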

Todo

incomplete for <G80

Condition code registers

There are 4 condition code registers, $c0-$c3. Each of them has the following bitfields:

bits 0-7: scalar flags
bits 8-10: address flags
bits 11-12: unused, always 0
bit 13: branch flag
bit 14: always 0
bit 15: always 1
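A decoder for this layout might look like the following minimal Python sketch. `decode_c` and `has_fixed_bits` are hypothetical helper names, and the dictionary keys are labels chosen here rather than official field names:

```python
ALWAYS_0 = (1 << 11) | (1 << 12) | (1 << 14)  # bits 11, 12 and 14 read as 0
ALWAYS_1 = 1 << 15                             # bit 15 reads as 1


def decode_c(value: int) -> dict:
    """Split a 16-bit $c value into its documented bitfields."""
    return {
        "scalar": value & 0xFF,         # bits 0-7: scalar flags
        "address": (value >> 8) & 0x7,  # bits 8-10: address flags
        "branch": (value >> 13) & 0x1,  # bit 13: branch flag
    }


def has_fixed_bits(value: int) -> bool:
    """Check that a value fits in 16 bits and has the constant bits set as documented."""
    return (value >> 16) == 0 and (value & ALWAYS_0) == 0 and (value & ALWAYS_1) != 0
```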

Instruction conflicts

Since the instructions of a bundle execute in parallel, they can conflict with each other in the following ways:

- one instruction of a bundle writes a register and another reads it: in this case, the old value is read (every instruction reads all its inputs as they were before the whole bundle started executing).
- more than one instruction of a bundle writes the same register:
  - for an $r register written by more than one instruction, the first of the following wins:
    - scalar instruction, except mov from $c, $v, $a, $l (but including mov from $m and $x)
    - address instruction (load to $r)
    - scalar mov from $c, $v, $a, $l
  - for $v:
    - vector instruction
    - address instruction (load to $v)
    - scalar mov to $v
  - for $a:
    - scalar mov to $a
    - address instruction
  - for $l:
    - branch instruction
    - scalar mov to $l

  Relying on any of that is probably a very bad idea.
- two instructions of a bundle fight for a shared read port, and one of them wins:
  - if both the scalar instruction (mov from $v) and the address instruction (store $v or ldr) read from $v registers, the $v register read by the address instruction is forced to be the one used by the scalar instruction
  - if the scalar instruction is a mov to another register file, and the address instruction reads from an $r register (store $r), the $r register read by the scalar instruction is forced to be the one used by the address instruction

  Relying on that is also a very bad idea. Avoid issuing such bundles.
- if the scalar instruction is a mov from $l and the branch instruction is exit, the $r register won’t be written.

Needless to say, don’t do that.
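An emulator that wants to model this behaviour could encode the write-priority lists above as a lookup table. This is a minimal sketch: the writer labels and the `winning_writer` helper are hypothetical, and the table only transcribes the documented orders:

```python
from __future__ import annotations

# Documented write priority when several instructions of one bundle write the
# same register file entry; earlier entries win. Labels are hypothetical names.
WRITE_PRIORITY = {
    "$r": ["scalar (other than mov from $c/$v/$a/$l)",
           "address (load to $r)",
           "scalar mov from $c/$v/$a/$l"],
    "$v": ["vector", "address (load to $v)", "scalar mov to $v"],
    "$a": ["scalar mov to $a", "address"],
    "$l": ["branch", "scalar mov to $l"],
}


def winning_writer(reg_file: str, writers: set[str]) -> str:
    """Return which of the given writers' results ends up in the register."""
    for candidate in WRITE_PRIORITY[reg_file]:
        if candidate in writers:
            return candidate
    raise ValueError(f"no documented writer for {reg_file} among {writers}")
```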

Todo

mov from $sr, $uc, $mi, $f, $d
